Slack digest for #general - 2020-02-22

Apache Pulsar Slack Sat, 22 Feb 2020 01:11:50 -0800

2020-02-21 09:21:00 UTC - Steven Op de beeck: @Devin G. Bost There's a way 
around it, but I fear that's a debezium-kafka thing, and not supported in 
Pulsar. 
<https://debezium.io/documentation/reference/1.0/configuration/outbox-event-router.html>
----
2020-02-21 10:16:56 UTC - Jon Bennett: @Jon Bennett has joined the channel
----
2020-02-21 10:19:57 UTC - Jon Bennett: hi - anyone using the official (cgo) Go 
client in Docker-deployed apps?
----
2020-02-21 13:31:45 UTC - Manuel Mueller: hello,
we are currently looking into functions - especially "state" functions. We are 
experiencing some weird results, where the system is running but after a couple 
of hours it starts freezing and breaks the whole system. Logs start to show 
that the health check goes bad - it starts to refuse the connections and such. 
It would be great if any of you could share your feedback on "how to enable 
state functions" or your experience in general, maybe we missed something 
crucial.
We already ran into the python3 vs python2 "bug" which we fixed (symlinked). 
The current tests are being run in a kubernetes setup as well as standalone.
In addition, the REST API goes down as well and becomes unresponsive (still not 
sure how they are connected)
----
2020-02-21 13:32:13 UTC - Manuel Mueller: 
----
2020-02-21 13:32:45 UTC - Roman Popenov: Are those function logs? Are you 
running out of heap or memory?
----
2020-02-21 13:36:35 UTC - Manuel Mueller: currently it does not feel like it 
would be a memory related issue, so far the logs did not indicate anything 
specific
----
2020-02-21 13:36:56 UTC - Roman Popenov: Perhaps stale client connections?
----
2020-02-21 13:37:12 UTC - Roman Popenov: jps/ps showing any hanging processes?
----
2020-02-21 13:38:27 UTC - Roman Popenov: Do you have grafana enabled to see 
some metrics?
----
2020-02-21 13:39:59 UTC - Manuel Mueller: in the kubernetes setup we have it - 
would you recommend us to check for memory things or just in general to check 
overall system performance?
----
2020-02-21 13:40:29 UTC - Roman Popenov: I would check to see what is happening 
with the Pulsar cluster itself first
----
2020-02-21 13:40:48 UTC - Mikhail Veygman: Is there a way to configure Pulsar 
to allow message forwarding before it is stored to disk on a persistent topic?
----
2020-02-21 13:46:12 UTC - Manuel Mueller: so far the grafana dashboard is 
rather inconclusive for us. After the function is deployed, it seems to 
communicate on its port where at one point it stops with the message "serving 
on port .." at the same time, we can not use the REST API any more, which hangs 
as well
----
2020-02-21 13:48:06 UTC - Roman Popenov: Sounds like a networking issue 
:thinking_face:
----
2020-02-21 13:48:20 UTC - Manuel Mueller: My hunch tells me - it is somehow the 
"table / state service" bugging out - but I am not sure how to debug this
----
2020-02-21 13:51:38 UTC - Roman Popenov: And what is the status of the function 
if you check through admin-cli?
----
2020-02-21 13:52:40 UTC - Roman Popenov: There are also functions logs somewhere
----
2020-02-21 13:53:06 UTC - Roman Popenov: `workerId/logs/functions/` don’t 
remember where it is exactly now
----
2020-02-21 13:54:41 UTC - Roman Popenov: 
<http://pulsar.apache.org/docs/en/functions-debugging/#debug-with-localrun-mode>
----
2020-02-21 14:03:56 UTC - Chris Bartholomew: In k8s, the function logs are 
under `/pulsar/logs/functions` where the functions are running (broker or 
function worker). If the function is having trouble connecting to the state 
server (bookkeeper) then you will likely see that in the logs for the function. 
The log message from above (in broker or function worker) indicates the health 
check could not connect to the function, which probably means it is not running 
(crashed). The function log should shed some light on that.
----
2020-02-21 14:18:34 UTC - Ming: We have apps using both cgo and native go 
library deployed in docker/k8s. We'll migrate cgo to native go. Native go may 
lack some features that cgo and java client provide.
----
2020-02-21 14:27:16 UTC - Jon Bennett: hi @Ming sorry for the delay!
----
2020-02-21 14:27:53 UTC - Jon Bennett: I’m trying to build the container, but 
keep getting missing header file errors, is there a trick I’m missing?

```
# github.com/apache/pulsar/pulsar-client-go/pulsar
In file included from ../pkg/mod/github.com/apache/pulsar/[email protected]/pulsar/c_client.go:24:0:
./c_go_pulsar.h:22:29: fatal error: pulsar/c/client.h: No such file or directory
#include <pulsar/c/client.h>
```

----
2020-02-21 14:28:33 UTC - Jon Bennett: I’ve tried vendoring on/off, and using
the 3rd party `vend` application, same error each time.
----
2020-02-21 14:31:08 UTC - Ming: You need to add the C library. Here is our Dockerfile
<https://github.com/kafkaesque-io/pulsar-beam/blob/master/Dockerfile>
----
2020-02-21 14:31:46 UTC - Jon Bennett: @Ming for native, are you using the one
from Comcast? I was looking at that, with the cpp official client, you set a
`message.Key` and `message.Properties`, which I’ve not been able to see a way
to do with the native client.
Without a key, it’s unclear if things like topic compaction would work, how
would Pulsar know that a message is a newer version?
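
A toy model of why the key matters here, assuming only that compaction keeps the most recent message for each key. The `compact` function and the `(key, payload)` tuple layout are hypothetical and are not part of any Pulsar client or broker API. Leaving keyless messages untouched is also an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical message shape: (key, payload). key is None for unkeyed messages.
Message = Tuple[Optional[str], str]


def compact(log: List[Message]) -> List[Message]:
    """Return the log with only the latest message per key, in log order.

    Assumption: a message without a key cannot be matched to an older
    version of itself, so this sketch leaves such messages untouched.
    """
    latest_index: Dict[str, int] = {}
    for index, (key, _payload) in enumerate(log):
        if key is not None:
            latest_index[key] = index  # a later entry replaces an earlier one
    keep = set(latest_index.values())
    return [
        message
        for index, message in enumerate(log)
        if message[0] is None or index in keep
    ]


log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2"), (None, "no key")]
compacted = compact(log)
```

In this model the older `("user-1", "v1")` entry is the only one dropped, because it is the only entry that a later message with the same key supersedes.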
----
2020-02-21 14:32:25 UTC - Jon Bennett: @Ming thanks for Dockerfile, I’ll read
and be back to you shortly.
----
2020-02-21 14:37:29 UTC - Devin G. Bost: You could easily create a sink that
writes the data to a desired location before storing on a persistent topic.
What exactly is meant by "forwarding"?
----
2020-02-21 14:39:04 UTC - Devin G. Bost: Functions with state are in developer
preview and are not ready for production use. There are a few tests that fail
intermittently and are currently under investigation.
----
2020-02-21 14:41:05 UTC - Devin G. Bost: My team uses a cache layer (Apache
Ignite) or stateful compute (Akka or Apache Flink, depending on the workload)
to work around the issue.
----
2020-02-21 14:43:49 UTC - Ming: The example I gave uses pulsar's cgo library.
The native go library supports ProducerMessage's key and properties.
<https://github.com/apache/pulsar-client-go/blob/fc390a6a37f3cbd94ac46b3b5e4239b3ca5df875/pulsar/message.go#L31>
----
2020-02-21 14:44:41 UTC - Devin G. Bost: It would be extremely helpful if the
logs and details here could be put into a Github Issue so we can have a
permanent record and track progress on the issue and link your experience to
any current issues.
----
2020-02-21 14:55:30 UTC - Chris Bartholomew: @Devin G. Bost in the case where
you use Apache Ignite for state storage, do you package Ignite client in with
the Pulsar function?
----
2020-02-21 15:05:31 UTC - Devin G. Bost: We created an Ignite Sink for that.
----
2020-02-21 15:05:47 UTC - Devin G. Bost: I'm working on getting it open sourced.
+1 : Chris Bartholomew, David Kjerrumgaard
----
2020-02-21 16:38:04 UTC - Mikhail Veygman: Message is sent to Pulsar. Pulsar
forwards it then records it.
----
2020-02-21 16:38:24 UTC - Mikhail Veygman: So that if you look to replay topic
from the beginning you can do it.
----
2020-02-21 17:07:49 UTC - David Kjerrumgaard: @Pushkar Sawant it depends on the
underlying issue for the write quorum failure. Did you lose a bookie? Is one of
the bookie disks full, etc?
----
2020-02-21 17:58:54 UTC - Pushkar Sawant: I have a cluster with Write Quorum
set to 2. One of the bookies had its journal directory fill up, and I was in
the process of recovering the data from that node. While that node was
recovering, another node went down with its ledger directory full. It could
not come back up, failing with the error “Exception while replaying journals,
shutting down”. I effectively lost all ledgers that were shared between these
two nodes. To recover the topics that were stored on these two nodes, I tried
to delete the topics but always received a 500 internal server error. The only
way I could recover the cluster was to create a new cluster and migrate to it.
----
2020-02-21 18:30:10 UTC - David Kjerrumgaard: So you lost two bookies in
succession which left you with one active bookie and a write quorum of 2.
----
2020-02-21 18:34:29 UTC - David Kjerrumgaard: In such a scenario, exceptions
will be raised to the producers in order to make them stop sending messages,
which is what we want in order to prevent data loss. As far as "fixing" the
issue there are a few options; first was/is increasing the size of the journal
directory if you are using expandable storage such as Logical Volume Management
(LVM) or Amazon EBS.
----
2020-02-21 18:34:31 UTC - Rolf Arne Corneliussen: @Antti Kaikkonen Yes, you are
right, I can imagine concurrency could be problematic with a heap.

Anyway, I just tried out subscribing to a partitioned topic with the Pulsar
Java client, subscription type = Failover, and then the partitions were
distributed among the different consumers (with same subscription), resembling
a Kafka consumer group. If you register a `ConsumerEventListener` when building
the `Consumer`, you will get callbacks when partitions get active/inactive for
a consumer. The callbacks will be on a listener thread (e.g.
'pulsar-external-listener-3-4'). So I was wrong - 'consumer groups' with
callbacks can be run on Pulsar.
ok : Antti Kaikkonen
+1 : Antti Kaikkonen
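
A toy model of the behaviour described above: one active consumer per partition, with callbacks fired when ownership changes. The selection rule used here (sort consumers by name, then assign partition `p` to consumer `p % n`) is an assumption for illustration, and the broker's real logic may differ. `assign_partitions`, `rebalance` and the callback parameters are hypothetical names, not the Java client's API.

```python
from typing import Callable, Dict, List, Tuple


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[int, str]:
    """Pick one active consumer per partition (assumed rule, see lead-in)."""
    if not consumers:
        return {}
    ordered = sorted(consumers)
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}


def rebalance(
    old: Dict[int, str],
    new: Dict[int, str],
    on_active: Callable[[str, int], None],
    on_inactive: Callable[[str, int], None],
) -> None:
    """Fire callbacks only for partitions whose active consumer changed."""
    for partition in sorted(set(old) | set(new)):
        before, after = old.get(partition), new.get(partition)
        if before == after:
            continue
        if before is not None:
            on_inactive(before, partition)
        if after is not None:
            on_active(after, partition)


events: List[Tuple[str, str, int]] = []
first = assign_partitions(4, ["consumer-a"])
second = assign_partitions(4, ["consumer-a", "consumer-b"])
rebalance(
    first,
    second,
    on_active=lambda consumer, p: events.append(("active", consumer, p)),
    on_inactive=lambda consumer, p: events.append(("inactive", consumer, p)),
)
```

When `consumer-b` joins, partitions 1 and 3 move to it in this model, so `consumer-a` receives two "inactive" events and `consumer-b` two "active" events, which is the consumer-group-like behaviour the message describes.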
----
2020-02-21 18:36:33 UTC - David Kjerrumgaard: If you fail to do that in time,
you can/should introduce a new bookie into the cluster to provide additional
storage capacity (this is especially true when you lost the second bookie).
That would really be your only course of action to prevent any further downtime.
----
2020-02-21 18:40:01 UTC - David Kjerrumgaard: once you got into a single bookie
state, you would need to add more bookies, such that `EnsembleSize >= Write
Quorum >= Ack Quorum`. Then you could start decommissioning the "lost"
bookie nodes to offload the data onto the newly added bookie(s).
<https://bookkeeper.apache.org/docs/latest/admin/decomission/>
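
The quorum constraint above can be sketched as a small check. This is a simplified model: `validate_quorums` and `can_create_ledger` are hypothetical helpers, not BookKeeper functions, and the model ignores placement policies and read-only bookies.

```python
def validate_quorums(ensemble_size: int, write_quorum: int, ack_quorum: int) -> None:
    """Raise ValueError unless EnsembleSize >= Write Quorum >= Ack Quorum >= 1."""
    if not (ensemble_size >= write_quorum >= ack_quorum >= 1):
        raise ValueError(
            f"invalid quorums: ensemble={ensemble_size}, "
            f"write={write_quorum}, ack={ack_quorum}"
        )


def can_create_ledger(writable_bookies: int, ensemble_size: int) -> bool:
    """Assumption: a new ledger needs at least ensemble_size writable bookies."""
    return writable_bookies >= ensemble_size


# The settings from this thread: ensemble, write quorum and ack quorum all 2.
validate_quorums(ensemble_size=2, write_quorum=2, ack_quorum=2)

one_bookie_left = can_create_ledger(writable_bookies=1, ensemble_size=2)
after_adding_bookies = can_create_ledger(writable_bookies=3, ensemble_size=2)
```

With a single writable bookie the model refuses new ledgers, which matches the advice to add bookies before attempting to decommission the failed ones.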
----
2020-02-21 19:03:30 UTC - Jon Bennett: @Ming ahh, great. Do you have an example
of creating a ProducerMessage with Key/Props using the native client?
----
2020-02-21 19:25:13 UTC - Devin G. Bost: You could write the message to two
topics, one that sends to your desired location and one that writes to
persistent storage via a sink. You could also setup retention with tiered
storage.
----
2020-02-21 21:13:05 UTC - Ruwen: @Ruwen has joined the channel
----
2020-02-21 21:24:04 UTC - Ruwen: Hi. Are there any plans for a Java 11 Docker
image? Asking because I tried to upload a function/jar compiled with Java 11
which (obviously) failed
----
2020-02-21 21:27:33 UTC - Ali Ahmed: it’s a matter of time, we will probably
move to default java11 base image and deprecate java8, we can consider this for
the 2.7 release
----
2020-02-21 21:28:03 UTC - Roman Popenov: Are there instructions on how to build
with Java 11? I am trying to do this now
----
2020-02-21 21:29:34 UTC - Roman Popenov: Anyone successfully built a docker
image using Java 11?
----
2020-02-21 21:29:50 UTC - Ruwen: if you try it anyway:
<https://github.com/apache/pulsar/blob/master/docker/pulsar/Dockerfile> swap
out the base image
----
2020-02-21 21:29:58 UTC - Ali Ahmed: we use jdk11 internally
----
2020-02-21 21:30:05 UTC - Ruwen: and let me know how it turned out :wink:
----
2020-02-21 21:33:35 UTC - Jon Bennett: Are there plans to patch the Homebrew
formula for local MacOS installs?
----
2020-02-21 21:34:40 UTC - Ali Ahmed: Which Homebrew formula, this one?
<https://github.com/streamlio/homebrew-formulae/blob/master/pulsar.rb>
----
2020-02-21 21:38:46 UTC - Devin G. Bost: What's up with it? Is it not working
again?
----
2020-02-21 22:06:28 UTC - Roman Popenov: I keep seeing the following error
while trying to run build.sh in
<https://github.com/apache/pulsar/blob/master/docker/build.sh>
----
2020-02-22 00:04:39 UTC - Roman Popenov: Seems like this cannot be run:
<https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/docker/build-wheel-file-within-docker.sh#L25>
```Permission denied```
Do I need to set privileged to `docker run` command or is it something else?
----
2020-02-22 00:17:37 UTC - Jon Bennett: no, apologies, the libpulsar package
<https://formulae.brew.sh/formula/libpulsar>
----
2020-02-22 00:21:40 UTC - Ali Ahmed: will try to get it updated
----
2020-02-22 00:40:24 UTC - Mikhail Veygman: That flies in the face of what I am
doing. I am trying to make it faster instead of doing double the work.
----
2020-02-22 00:56:47 UTC - Pushkar Sawant: I have 6 bookies in total. EnsembleSize,
Write Quorum and Ack Quorum are all 2. I expanded the storage on the nodes, but
the bookies on these two nodes didn’t start, failing with the error “Exception
while replaying journals, shutting down”. Also, 3 new bookies were added to the
cluster. After the expansion, a small subset of topics has a “bookie handle
not available” error; I believe those topics were on the two nodes that
couldn’t start. As both copies were not accessible, the decommission
command did not work for me.
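
A rough sketch of why only a subset of topics is affected in this situation, assuming every ledger stores its entries on exactly two bookies (ensemble size 2). The ledger IDs, bookie names and the `classify_ledgers` helper are made up for illustration and are not BookKeeper metadata or API.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def classify_ledgers(
    ledger_ensembles: Dict[int, Tuple[str, ...]], failed: Set[str]
) -> Tuple[List[int], List[int]]:
    """Split ledgers into (unavailable, under_replicated).

    unavailable: every bookie holding the ledger has failed.
    under_replicated: at least one, but not all, of its bookies have failed.
    """
    unavailable, under_replicated = [], []
    for ledger_id, ensemble in sorted(ledger_ensembles.items()):
        lost = set(ensemble) & failed
        if lost == set(ensemble):
            unavailable.append(ledger_id)
        elif lost:
            under_replicated.append(ledger_id)
    return unavailable, under_replicated


# Hypothetical cluster: 6 bookies, one ledger per possible pair of bookies.
bookies = [f"bookie-{n}" for n in range(1, 7)]
ledgers = dict(enumerate(combinations(bookies, 2)))

unavailable, under_replicated = classify_ledgers(ledgers, {"bookie-1", "bookie-2"})
```

In this model only the ledger whose two copies sat on both failed bookies is unreadable. The other affected ledgers still have one surviving copy that re-replication can work from.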
----
2020-02-22 00:57:48 UTC - Roman Popenov: It appears that my `Permission denied`
was caused because I manually built the python and cpp client and the files
were created with different permissions/users. Running `git clean -fdx` solved
the issue
+1 : Devin G. Bost
----
2020-02-22 01:03:33 UTC - Pushkar Sawant: I was trying to delete and recreate
the topics but that also resulted in 500 internal server error because of
“bookie handle not available”
----
2020-02-22 02:44:29 UTC - Joe Francis: What are you doing?
:slightly_smiling_face: What happens if the message is forwarded and Pulsar
cannot persist it? Pulsar guarantees delivery. That is, Pulsar acknowledges
that a message is published only after it is persisted.
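
A toy model of the question raised here, contrasting "persist, then deliver and acknowledge" with "forward first". `ToyTopic` and its methods are hypothetical and are not broker code. The `storage_ok` flag simply simulates a failed write.

```python
from typing import List


class ToyTopic:
    """Minimal in-memory stand-in for a persistent topic (illustration only)."""

    def __init__(self, storage_ok: bool = True) -> None:
        self.storage_ok = storage_ok
        self.log: List[str] = []        # what a replay from the beginning would see
        self.delivered: List[str] = []  # what live consumers have received

    def publish(self, payload: str) -> bool:
        """Persist first; deliver and acknowledge only if the write succeeded."""
        if not self.storage_ok:
            return False  # no ack and nothing delivered, so the producer can retry
        self.log.append(payload)
        self.delivered.append(payload)
        return True

    def publish_forward_first(self, payload: str) -> bool:
        """Hypothetical variant: deliver before attempting to persist."""
        self.delivered.append(payload)
        if not self.storage_ok:
            return False  # consumers saw a message that can never be replayed
        self.log.append(payload)
        return True


safe = ToyTopic(storage_ok=False)
safe_ack = safe.publish("m1")

eager = ToyTopic(storage_ok=False)
eager_ack = eager.publish_forward_first("m1")
```

In the forward-first variant a failed write leaves `delivered` and `log` out of sync, so a later replay would not reproduce what consumers originally saw.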
----
2020-02-22 06:06:58 UTC - Devin G. Bost: @Mikhail Veygman

By "faster," are you talking about latency? If so, my recommendation still
holds. We have long pipelines in production with many steps that are processing
tens of thousands of messages per second, and adding a sink to a function like
that will hardly add 15 milliseconds to your total latency. If your application
is really that latency sensitive, then you are going to have a lot more to
rework beyond adding a sink.

There are many good blog posts and videos about Pulsar's architecture. I
recommend that you study them. It should help your understanding a lot.
----
2020-02-22 06:07:48 UTC - Devin G. Bost: @Joe Francis "Persistence" could refer
to two different things in this context.
----
2020-02-22 06:12:38 UTC - Devin G. Bost: @Mikhail Veygman

It seems like you're expecting all of these operations to be synchronous and
thread blocking. Pulsar is designed with parallelism and asynchronous I/O in
mind. Think of it like a tree of dominoes falling. You can kick off many
operations in parallel from a single event.
----
2020-02-22 06:18:28 UTC - Devin G. Bost: If you want to replay, there are
multiple ways of doing that. For example, you could persist messages in
external storage like Apache Ignite. Or, you could setup data retention with
Apache Bookkeeper. You can also tier storage, but it sounds like you need all
your data storage to be very fast. At the latencies it sounds like you're
needing, I'd think that disk I/O would be too slow for you and that you'd need
hot storage in a memory-only cache.
----
