2019-08-29 09:11:30 UTC - geal: IIRC there’s some kind of copyright assignment 
with the foundation, right?
----
2019-08-29 09:11:33 UTC - Sijie Guo: You can submit a PIP and contribute the 
project back to ASF. Upon the contribution, you might need to sign an SGA 
(Software Grant Agreement) to transfer the repo to Apache.
----
2019-08-29 09:11:54 UTC - geal: right, I see
----
2019-08-29 09:12:50 UTC - Sijie Guo: Once it lands in ASF, the Pulsar PMC is 
responsible for making sure the repo follows the principles of the ASF and for 
doing releases the Apache way.
----
2019-08-29 09:12:51 UTC - geal: on the feature side, what’s the priority in a 
client?
----
2019-08-29 09:12:56 UTC - Ali Ahmed: I don’t think the Apache foundation can 
have official projects associated with alternate licenses
----
2019-08-29 09:14:00 UTC - Ali Ahmed: I’d say most of the work right now is 
around the Node.js and Go clients; there hasn’t been any discussion of a Rust 
client
----
2019-08-29 09:17:12 UTC - Sijie Guo: as an MVP of a new language client, making 
sure the client has the basic pub/sub messaging features is the top priority 
(see the sketch below).

- producer: batched, non-batched, compressed, non-compressed
- consumer: support all subscription types, ack individually, ack cumulatively
- partitioned and non-partitioned topics
- reader features
- common logic: retry
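A rough illustration of that surface using the existing Python client (just a 
sketch of the MVP shape, not a spec; topic and subscription names are made up):

```python
import pulsar

# Hypothetical names throughout; this only illustrates the MVP surface.
client = pulsar.Client('pulsar://localhost:6650')

producer = client.create_producer('my-topic')      # produce (non-batched here)
producer.send(b'hello')                            # synchronous send

consumer = client.subscribe('my-topic', 'my-sub')  # Exclusive subscription by default
msg = consumer.receive()
consumer.acknowledge(msg)                          # individual ack

reader = client.create_reader('my-topic', pulsar.MessageId.earliest)  # reader
client.close()
```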
----
2019-08-29 09:18:45 UTC - geal: right, that’s about what I had in mind
----
2019-08-29 09:18:52 UTC - geal: (it’s very limited right now)
----
2019-08-29 09:19:37 UTC - geal: there are 4 contributors for the project, so a 
relicensing would be doable
----
2019-08-29 09:19:54 UTC - Sijie Guo: :+1:
----
2019-08-29 09:19:58 UTC - geal: or just, starting from a specific commit, 
dropping MIT
----
2019-08-29 09:21:26 UTC - Sijie Guo: @geal I think that licensing stuff can be 
done when you contribute a repo to ASF.
----
2019-08-29 09:22:18 UTC - Sijie Guo: if you contribute a repo as an ASF 
incubator project, the Incubator PMC will mentor you through it; if you 
contribute the repo as a sub-project of Pulsar, the Pulsar PMC will help you do 
so.
----
2019-08-29 09:24:50 UTC - geal: I see
----
2019-08-29 13:13:02 UTC - bsideup: Is there anyone I can talk to about this?
<https://github.com/apache/pulsar/blob/83365ebd536af3b3575026a06d0b83cd506898d3/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2729>
Trying to understand why `+1` here
+1 : Kirill Merkushev
----
2019-08-29 13:36:29 UTC - Ivan Kelly: @bsideup @Matteo Merli would have to 
confirm, as it came with the initial commit, but it seems that isValidPosition 
returns true if a position could be valid at some point, not whether it 
currently exists
----
2019-08-29 13:36:39 UTC - Ivan Kelly: you seeing a bug with it?
----
2019-08-29 13:38:57 UTC - bsideup: I do :slightly_smiling_face:

Context:
<https://github.com/bsideup/liiklus/pull/176/files#diff-ce123c80dd3e12bcfbf6b16c9dba6dfdR61>

We sometimes observe errors when a seek is performed. Apparently that’s due to 
that “+1” (which is not necessarily right for regular seeks)
----
2019-08-29 13:39:53 UTC - bsideup: Scenario:

Every newly created consumer seeks to the latest known offset (stored *outside* 
of Pulsar, hence manual seek)
----
2019-08-29 13:40:22 UTC - Ivan Kelly: Reader or Consumer?
----
2019-08-29 13:44:35 UTC - bsideup: Consumer
----
2019-08-29 13:45:27 UTC - bsideup: `Failover` sub. type
----
2019-08-29 13:46:58 UTC - bsideup: Full code, if it helps:
<https://github.com/bsideup/liiklus/blob/49de3a30bbe5de4db7451bce898d953aa6ed93d7/plugins/pulsar-records-storage/src/main/java/com/github/bsideup/liiklus/pulsar/PulsarRecordsStorage.java#L163>
----
2019-08-29 13:47:12 UTC - Ivan Kelly: hmm, could be a bug then. Have you tried 
removing +1 and running the pulsar tests?
----
2019-08-29 13:48:33 UTC - bsideup: I haven’t (don’t have the environment set 
up) :slightly_smiling_face: Would you recommend doing so?
----
2019-08-29 13:49:26 UTC - Ivan Kelly: I would expect whatever breaks would be 
the reason it is like that :slightly_smiling_face:
----
2019-08-29 13:54:11 UTC - Quentin ADAM: ping me if you need help on it
----
2019-08-29 13:59:08 UTC - Matteo Merli: The reason for the +1 is that the read 
position always points to the “next message to be read”. When the cursor is at 
the end of the topic, it will be positioned at a non-existing entry (and 
typically the cursor gets notified when the next entry is persisted)
----
2019-08-29 14:00:31 UTC - Matteo Merli: Need to get at least one coffee before 
looking at the other code ;)
----
2019-08-29 14:02:12 UTC - bsideup: but the consumer never knows whether it is 
at the end of the topic or not, and the exception seems a bit too defensive 
here
heavy_plus_sign : Kirill Merkushev
----
2019-08-29 14:02:30 UTC - Kim Christian Gaarder: Using the java client, I get 
an error after using seek(timestamp). If I attempt to seek again, the consumer 
used to seek is closed. Is this a known problem?
----
2019-08-29 14:03:31 UTC - bsideup: Are there any examples of externally managed 
offsets maybe?
----
2019-08-29 14:38:45 UTC - Kim Christian Gaarder: Bug reported here: 
<https://github.com/apache/pulsar/issues/5073>
----
2019-08-29 14:52:04 UTC - bsideup: Another question: why does Pulsar disallow 
seeking to an offset pointing to some old ledger?
<https://github.com/apache/pulsar/blob/83365ebd536af3b3575026a06d0b83cd506898d3/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L2735-L2736>
----
2019-08-29 14:53:44 UTC - bsideup: Use case: rewind processing to the beginning 
or just to some old offset
----
2019-08-29 15:02:11 UTC - bsideup: And one more: according to the user report ( 
<https://github.com/bsideup/liiklus/pull/176#discussion_r319117674> ), manual 
seek of one partitioned consumer seems to affect other consumers.

Have you observed something like that before? Or is it “by design”? (although 
that would be a very unfortunate design for us :slightly_smiling_face:)
----
2019-08-29 16:36:57 UTC - Ryan Samo: Is there a way for a Pulsar producer to 
change the schema type? I know the first message produced sets the initial 
schema but how can they update the schema?
----
2019-08-29 16:39:48 UTC - Addison Higham: so two things we just noticed with 
the golang client:
- the consumer stats aren't exposed on the consumer interface; they are plumbed 
through in the C++ API, so it seems like it should be fairly straightforward
- the redelivery count isn't exposed in Go OR in libpulsar; that seems like a 
bit bigger of a change: the value needs to be plumbed through from the 
`CommandMessage` struct down to where the `Message` gets created, to match the 
design of the Java API
----
2019-08-29 16:57:12 UTC - Matteo Merli: I see, so you’re using the subscription 
but always resetting it to a particular message id. When that message id is at 
the end of the topic, you’re getting an exception.

Is that the right description?
----
2019-08-29 16:58:49 UTC - Matteo Merli: > Are there any examples of externally 
managed offsets maybe?

The typical scenario would be to use a Reader instead of a Consumer, so that 
the consumption position is always determined by the app.

(That doesn’t mean that seek() shouldn’t work correctly as well…)
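For instance, a minimal sketch with the Python client (topic name, `process`, 
and `store_offset` are hypothetical stand-ins):

```python
import pulsar

def process(payload: bytes):
    print(payload)  # stand-in for real application logic

def store_offset(message_id):
    pass  # stand-in: persist message_id.serialize() outside Pulsar

client = pulsar.Client('pulsar://localhost:6650')
# Resume from a position the app persisted itself (here: the beginning).
reader = client.create_reader('my-topic', pulsar.MessageId.earliest)

while True:
    msg = reader.read_next()
    process(msg.data())
    store_offset(msg.message_id())
```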
----
2019-08-29 17:07:17 UTC - Poule: is there a Python function example somewhere 
that uses PickleSerde?
----
2019-08-29 17:09:37 UTC - Matteo Merli: @bsideup the check there is for a 
ledger that is missing.

eg: `ledgers = [ 1, 3, 9, 10 ]`

`seek( ledgerId: 5, entryId: 2 )` -> fail
`seek( ledgerId: 3, entryId: 2 )` -> success
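To tie that together with the “+1” Matteo described earlier, a rough Python 
sketch of the check (illustrative only; the broker’s actual logic is the Java 
`ManagedLedgerImpl.isValidPosition()` linked above):

```python
def is_valid_position(pos, last_confirmed, ledgers):
    """Illustrative sketch of the checks discussed in this thread, NOT the
    broker code. `ledgers` maps ledgerId -> entry count of sealed ledgers."""
    if pos.ledger_id == last_confirmed.ledger_id:
        # The read position always means "next message to be read", so it may
        # point one entry past the last persisted entry: hence the "+1".
        return pos.entry_id <= last_confirmed.entry_id + 1
    if pos.ledger_id not in ledgers:
        # e.g. ledgers = [1, 3, 9, 10] and seek(ledgerId=5) -> fail
        return False
    # For an older, sealed ledger the entry must already exist.
    return pos.entry_id < ledgers[pos.ledger_id]
```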
+1 : bsideup
----
2019-08-29 17:10:36 UTC - Poule: or better, an example of a Python Function 
that outputs to an Avro topic
----
2019-08-29 17:11:09 UTC - Poule: do I need to use Pickle to output to an Avro 
topic?
----
2019-08-29 17:11:27 UTC - Matteo Merli: That shouldn’t be the case. I don’t 
understand the claim in that report that the “consumer does not reconnect”.

With the failover subscription type, whenever a consumer gets disconnected, 
another available consumer will take over
----
2019-08-29 17:12:37 UTC - Matteo Merli: Regarding the disconnect: the current 
behavior is indeed to disconnect all consumers, perform the seek and let 
everyone reconnect.

The disconnect is ultimately unnecessary and was already removed in master.
+1 : Kirill Merkushev, bsideup
----
2019-08-29 17:13:33 UTC - Matteo Merli: At this point, the Python functions are 
not yet integrated with the schema, so you would have to deal with the 
serialization directly
----
2019-08-29 17:14:34 UTC - Poule: how can I do this?
----
2019-08-29 17:19:36 UTC - Matteo Merli: uhm.. sorry, my answer above was not 
precise.

In a Python function, the serialization is indeed defined by the Serde that is 
configured.

What it does not (yet) do is use that as the “schema” for the topic (to be 
validated and enforced by brokers).

You can still define an Avro Serde and specify it when you create the function 
(`--output-serde-classname MyAvroSerde`)
----
2019-08-29 17:20:07 UTC - Matteo Merli: however that won’t automatically set 
the topic schema to Avro with a specific schema
----
2019-08-29 17:20:35 UTC - Retardust: @Anonymitaet cool, any idea when? Tag me 
here when it happens, ok? We will probably test that in the near future for our 
legacy-system replication needs. It’s hard to use the Java or any other 
language’s Pulsar client there :(
----
2019-08-29 17:21:18 UTC - Sijie Guo: I think what you are looking for is close 
to the issue described here: <https://github.com/apache/pulsar/issues/4806>
----
2019-08-29 17:21:39 UTC - Sijie Guo: See the discussion in 
<https://github.com/apache/pulsar/pull/5056#issuecomment-525519313>
----
2019-08-29 17:22:24 UTC - Ryan Samo: Thanks @Sijie Guo I’ll check it out
----
2019-08-29 18:29:49 UTC - Poule: not sure what to put in `serialize()` and 
`deserialize()` though
Do I need to go find a python lib that translates avro-to-python_dict and 
python_dict-to-avro?
----
2019-08-29 18:30:09 UTC - Poule: is that the idea?
----
2019-08-29 18:30:59 UTC - Poule: because there is no `AvroSerDe` in 
<https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/pulsar/functions/serde.py>
----
2019-08-29 18:31:28 UTC - Matteo Merli: yes, you would have to define the Avro 
serde
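Something along these lines might do (an untested sketch: the record schema is 
made up, the base-class import path follows the serde.py linked above, and 
depending on your avro package version `parse` may be spelled `Parse`):

```python
import io
import avro.io
import avro.schema
from pulsar.functions.serde import SerDe  # base class from serde.py above

# Hypothetical record schema -- replace with your own.
SCHEMA = avro.schema.parse('''
{"type": "record", "name": "Example",
 "fields": [{"name": "value", "type": "string"}]}
''')

class MyAvroSerDe(SerDe):
    def serialize(self, input):
        # python dict -> Avro binary
        buf = io.BytesIO()
        avro.io.DatumWriter(SCHEMA).write(input, avro.io.BinaryEncoder(buf))
        return buf.getvalue()

    def deserialize(self, input_bytes):
        # Avro binary -> python dict
        decoder = avro.io.BinaryDecoder(io.BytesIO(input_bytes))
        return avro.io.DatumReader(SCHEMA).read(decoder)
```

You would then reference the class when creating the function, via the 
`--output-serde-classname` option Matteo mentioned above.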
----
2019-08-29 18:32:01 UTC - Poule: ok I begin to understand
----
2019-08-29 18:35:03 UTC - Retardust: is there any built-in pulsar function for 
this? I have a very simple case: two topics, "buffer" and "target". "buffer" 
fills with changes and should be connected to "target" only after some external 
event. So I want to start a function that sinks "buffer" into "target" once the 
event has occurred. Is there any simple bridge function in Pulsar, or do I need 
to write my own and deploy it?)
----
2019-08-29 18:37:37 UTC - Poule: @Retardust i think you need to write it.
----
2019-08-29 19:25:30 UTC - Ming Fang: I’m looking to use NiFi with Pulsar.  NiFi 
processors seem to fulfill the same role as Pulsar Functions/Sources/Sinks.  
Are there any advantages to using Pulsar Functions over NiFi processors?
----
2019-08-29 19:26:18 UTC - Ali Ahmed: @Ming Fang You have to run NiFi instances 
yourself; the Pulsar function worker can run the instances for you and scale 
out as needed
----
2019-08-29 19:28:18 UTC - Ming Fang: Yes that’s true. Although I actually like 
the idea of not running processing on the broker nodes to avoid resource 
contention.  If I use Pulsar Functions I would run them as separate Kubernetes 
pods.
----
2019-08-29 19:29:31 UTC - Ming Fang: The nice thing about NiFi is that it has 
hundreds of connectors, has NiFi Registry + git integration for deployment, and 
has a visual tool for development and debugging
----
2019-08-29 19:31:14 UTC - Ali Ahmed: sure, if your compute is stateless that 
could work; with functions you have a replicated state store built in. It all 
depends on your specific needs.
----
2019-08-29 19:34:45 UTC - Ming Fang: You’re absolutely right! State is the key 
advantage. I think I’m going to build a new NiFi processor called 
PulsarFunction to complement the existing Consumer/Publisher processors
----
2019-08-29 19:35:48 UTC - David Kjerrumgaard: @Ming Fang In addition to the 
lack of state, NiFi only provides you with a limited number of processors with 
which to work. So if you are able to perform the necessary processing using one 
or more of those processors, then you are fine. With Pulsar functions you can 
implement any Java, Python, or Go code that you like (including using 
third-party libraries).
----
2019-08-29 19:38:16 UTC - Ming Fang: That’s an excellent point. It would be 
perfect if NiFi ran its processors inside a Docker container.  That way I could 
package anything I want and avoid this limitation
----
2019-08-29 19:38:18 UTC - David Kjerrumgaard: @Ming Fang When I wrote the NiFi 
processors, I envisioned a use case where you can leverage the existing suite 
of NiFi connectors to feed data into a Pulsar topic, have some Pulsar functions 
perform the processing you require and publish to a "results" topic. Then, if 
you like, you can consume from the results topic in another NiFi flow.  Sort 
of a hybrid and complementary approach.
----
2019-08-29 19:40:40 UTC - David Kjerrumgaard: This is a common pattern with 
NiFi, to use it to push data into a messaging system such as Pulsar or Kafka 
for downstream processing. Again, one of the biggest limitations of NiFi is the 
lack of disk space for storing data. It wasn't designed to hold incoming data 
for days, it was designed to move data from system A to system B quickly.
----
2019-08-29 19:43:15 UTC - Ming Fang: @David Kjerrumgaard Thanks for your 
insight. It was very helpful.  I’m going to play around to see if it’s feasible 
to run Pulsar Functions inside NiFi.
----
2019-08-29 19:45:18 UTC - David Kjerrumgaard: It might be possible, but bear in 
mind that if you use a processor that has high latency (such as a pulsar 
function) it will introduce backpressure to the entire flow, which might result 
in data loss from the source systems.
----
2019-08-29 19:47:24 UTC - David Kjerrumgaard: I would recommend looking at the 
ExecuteGroovyScript processor in NiFi as an example of a "wrapper" processor 
that wraps arbitrary code.
+1 : Ming Fang
----
2019-08-29 20:25:24 UTC - Kirill Merkushev: > Is that the right description?
Yes, that’s what happens (I’m aware of the case described)
----
2019-08-29 21:49:30 UTC - bsideup: > The typical scenario would be by using a 
Reader

but it doesn’t support per-partition readers, does it?
----
2019-08-29 23:34:25 UTC - jialin liu: Hi, how can I change the max message size?
----
2019-08-29 23:35:00 UTC - jialin liu: I tested with pulsar-perf with 1MB msg 
size: `./pulsar-perf produce -r 100 -s 1024000 -n 1 -t 1 -time 120 
persistent://public/functions/test1_topic1`
----
2019-08-29 23:35:10 UTC - jialin liu: run into an error like: 23:27:07.713 
[pulsar-timer-5-1] WARN  org.apache.pulsar.client.impl.ProducerImpl - 
[persistent://public/functions/test1_topic1_msg1m] [local-3-0] error while 
create opSendMsg by batch message container -- 
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 
byte(s) of direct memory (used: 956301312, max: 960954368)
----
2019-08-29 23:35:17 UTC - jialin liu: can anybody advise ?
----
2019-08-29 23:42:50 UTC - David Kjerrumgaard: @jialin liu Use the 
`maxMessageSize` property in the `broker.conf` file
----
2019-08-29 23:47:16 UTC - David Kjerrumgaard: @jialin liu The error above seems 
to indicate that you are running out of direct memory, which Netty uses to 
store incoming messages. So you might want to increase the amount of direct 
memory (if possible) using the `-XX:MaxDirectMemorySize` JVM switch
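For reference, a sketch of both knobs (values are only examples, not 
recommendations; the broker JVM’s direct-memory limit is typically set via 
`PULSAR_MEM` in `conf/pulsar_env.sh`):

```
# broker.conf -- raise the per-message limit (example: 10 MB)
maxMessageSize=10485760

# conf/pulsar_env.sh -- give Netty more direct memory (example values)
PULSAR_MEM="-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g"
```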
----
2019-08-30 00:29:39 UTC - jialin liu: Thanks @David Kjerrumgaard
----
2019-08-30 00:40:38 UTC - jialin liu: @David Kjerrumgaard we have the max 
direct memory size set to 2G, but still face the same error
----
2019-08-30 01:35:13 UTC - Chris DiGiovanni: Running Pulsar 2.3.0 and having 
issues with autorecovery not being able to finish ledger replication.  On my 
bookies I'm seeing errors like the below:
```
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L3388 E301524-E301524, Sent to [chhq-vuppulbk01.us.drwholdings.com:3181, chhq-vuppulbk04.us.drwholdings.com:3181, chhq-vuppulbk03.us.drwholdings.com:3181], Heard from [] : bitset = {}, Error = 'Too many requests to the same Bookie'. First unread entry is (-1, rc = null)
2019-08-29 20:33:12.858 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Timed-out 123 operations to channel [id: 0xf6a3fc7d, L:/10.8.53.81:45124 - R:chhq-vuppulbk01.us.drwholdings.com/10.8.53.66:3181] for chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300040 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300190 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300202 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300118 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300199 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E300067 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
2019-08-29 20:33:12.858 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout while reading L3388 E302679 from bookie: chhq-vuppulbk01.us.drwholdings.com:3181
```
----
2019-08-30 01:37:43 UTC - Anonymitaet: @Retardust will document ASAP and keep u 
updated 
----
2019-08-30 01:37:45 UTC - Chris DiGiovanni: Any ideas would be helpful.  
Disabling auto recovery and re-enabling sometimes helps but it has been getting 
stuck at 1 unreplicated ledger
----
2019-08-30 01:52:16 UTC - Sijie Guo: use the ledger command in `bin/bookkeeper 
shell` to check the metadata for ledger `3388`?
----
2019-08-30 01:54:10 UTC - Chris DiGiovanni: What exactly am I looking for in 
this output?
----
2019-08-30 01:56:11 UTC - Chris DiGiovanni: Filled up my scrollback buffer in 
tmux :shrug:
----
2019-08-30 02:05:00 UTC - Chris DiGiovanni: Well I'm stuck at a single ledger 
that is unreplicated.  I ended up running ledgermetadata on the unreplicated 
ledger id 9780 and below is the output:

```
ledgerID: 9780
LedgerMetadata{formatVersion=2, ensembleSize=3, writeQuorumSize=3, ackQuorumSize=2, state=CLOSED, length=2147599016, lastEntryId=27861, digestType=CRC32C, password=base64:, ensembles={0=[chhq-vuppulbk06.us.drwholdings.com:3181, chhq-vuppulbk01.us.drwholdings.com:3181, chhq-vuppulbk08.us.drwholdings.com:3181], 19516=[chhq-vuppulbk06.us.drwholdings.com:3181, chhq-vuppulbk04.us.drwholdings.com:3181, chhq-vuppulbk08.us.drwholdings.com:3181]}, customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:ZmlvL2RlZmF1bHQvcGVyc2lzdGVudC9jbWU=, application=base64:cHVsc2Fy}}
```
----
2019-08-30 02:06:40 UTC - Chris DiGiovanni: Looks like the ledger is 2Gi 
large...  Not a lot of data.
----
2019-08-30 02:07:04 UTC - Chris DiGiovanni: Assuming length size in bytes...
----
2019-08-30 02:19:43 UTC - Chris DiGiovanni: `bookkeeper shell 
listunderreplicated -printreplicationworkerid -printmissingreplica` produces 
the following output.

```
9780
        Ctime : 1567094832243
        MissingReplica : chhq-vuppulbk01.us.drwholdings.com:3181
        MissingReplica : chhq-vuppulbk04.us.drwholdings.com:3181
        MissingReplica : chhq-vuppulbk08.us.drwholdings.com:3181
        MissingReplica : chhq-vuppulbk06.us.drwholdings.com:3181
```
----
2019-08-30 02:43:19 UTC - Ali Ahmed: I am thinking of adding a second message 
type to pulsar-client; this one will separate messages on semicolons instead of 
commas, so the pulsar CLI can work with JSON-type strings.
----
2019-08-30 02:43:27 UTC - Ali Ahmed: any opinions on the matter?
----
2019-08-30 02:45:57 UTC - Igor Zubchenok: I researched the memory allocations 
in the Pulsar broker that cause a lot of GC, especially in my case, and here 
are my findings:
Image from Java Flight Recorder: 42GB of memory allocated per 5 minutes:
----
2019-08-30 02:46:09 UTC - Igor Zubchenok: 1. In the BookKeeper client there is 
a ConcurrentSkipListMap that allocates a lot when iterating.
----
2019-08-30 02:46:25 UTC - Igor Zubchenok: 2. PositionImpl - this class looks 
immutable, yet it is copied here and generates 6GB: 
<https://github.com/apache/pulsar/blob/e78beaaf4815509dc906838aaf4057a6c1445d0b/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1849>
----
2019-08-30 02:46:37 UTC - Igor Zubchenok: 3. Is this serialization really 
needed? This allocates 820MB. 
<https://github.com/apache/pulsar/blob/7be1ee1fdb59421ac858b38840d3baf8c9073a5c/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractBaseDispatcher.java#L77>
----
2019-08-30 02:47:02 UTC - Igor Zubchenok: 4. I have 'expose topic metrics' 
disabled, but there are still allocations for topic stats when collecting data 
for Prometheus. `NamespaceStatsAggregator.getTopicStats` is called and 
allocates 200MB/5min; it's not much, but why?
<https://github.com/apache/pulsar/blob/e78beaaf4815509dc906838aaf4057a6c1445d0b/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/NamespaceStatsAggregator.java#L64>
----
2019-08-30 02:51:22 UTC - Chris DiGiovanni: Well I'm able to read the ledger 
just fine using bookkeeper shell readledger.
----
2019-08-30 02:52:28 UTC - Chris DiGiovanni: If I run bookkeeper shell recover 
on any of those bookkeepers, I get those bookie operation timeouts.
----
2019-08-30 04:08:38 UTC - tuteng: @Retardust  You can refer to this document 
first, and we will gradually improve it later. 
<https://github.com/apache/pulsar/blob/4ddb51ff8c1b200f329cc70ca24fb8e02c0abbc4/site2/docs/io-netty.md>
----
2019-08-30 06:20:24 UTC - Retardust: I see, thanks! Are there at-most-once 
guarantees only for TCP? Or could I receive acks somehow?
----
2019-08-30 06:33:12 UTC - tuteng: You can set the semantics through the 
`--processing-guarantees` option. Possible values: [ATLEAST_ONCE, ATMOST_ONCE, 
EFFECTIVELY_ONCE]
----
2019-08-30 06:36:04 UTC - bsideup: right… I missed the null check, sorry :+1:
----
2019-08-30 06:37:00 UTC - bsideup: Great, thank you! Any ETA for this change? I 
assume this is a broker-side change, not client-side?
----
2019-08-30 06:38:18 UTC - Retardust: cool, and how will it work with TCP? I 
suppose I need to receive an ack on the client side for at-least-once? I will 
experiment later)
----
2019-08-30 06:41:40 UTC - bsideup: How does rebalancing work with partitioned 
topics?
Consider: we have 2 instances (services, on different machines) of our consumer 
(failover sub. type). The topic has 32 partitions.
The consumers subscribe to each sub-topic, so that every instance has 32 
consumers (1 per partition, for per-partition processing).

How do we ensure that the load is distributed equally between the *instances*?
----
2019-08-30 06:58:01 UTC - tuteng: These semantics are applied to the source, 
here the Netty source, so it can work here. The client receives and then acks 
according to the normal logic
----
2019-08-30 07:01:15 UTC - Retardust: I mean on the producer side, sorry. "our 
legacy" -> "netty source" -> "pulsar topic". How do I ensure that the netty 
source sent the message to Pulsar, and receive an ack on the "our legacy" side?)
----
2019-08-30 07:02:11 UTC - Retardust: I mean "our legacy" is producer:)
----
2019-08-30 07:33:38 UTC - tuteng: I see, I think you can add an ack here 
<https://github.com/apache/pulsar/blob/master/pulsar-io/netty/src/main/java/org/apache/pulsar/io/netty/server/NettyServerHandler.java#L60>
after successfully sending the data to the topic. Similar to this function:
@Override
public void ack() {
    log.info("netty record ack id is {}", this.id);
    connector.ack(this.id);
}
----
2019-08-30 08:24:30 UTC - heathkang: @heathkang has joined the channel
----
2019-08-30 08:35:33 UTC - heathkang: Hi, I use Pulsar in Docker and tried a 
test write to a topic with the Go client, but get a `BrokerPersistenceError`:
```
2019/08/30 16:22:13.601 c_client.go:68: [info] WARN  | ClientConnection:902 | 
[[::1]:52026 -> [::1]:6650] Received error response from server: 11 -- 
req_id: 0
2019/08/30 16:22:13.601 c_client.go:68: [info] ERROR | ProducerImpl:216 | 
[persistent://public/default/my-topic, ] Failed to create producer: 
BrokerPersistenceError
```
I am wondering what `BrokerPersistenceError` is? Thanks for your help.
----
2019-08-30 08:36:47 UTC - Rowanto: @Rowanto has joined the channel
wave : bsideup
----
2019-08-30 08:49:20 UTC - Rowanto: Actually the issue is more at line 2729 and 
2744.

I checked out in Zookeeper on a ledger
`get /ledgers/00/0000/L2242`
which resulted in

```
...
ensembleSize: 2
length: 12066
lastEntryId: 20
state: CLOSED
...
```

doing a `seek( ledgerId: 2242, entryId: 21 )`  will only succeed if it's the 
latest ledger, because in line 2729, the condition is:
`return position.getEntryId() &lt;= (last.getEntryId() + 1);`

and in line 2724 (case when it's an old ledger), the condition is:
`return position.getEntryId() &lt; ls.getEntries()`
+1 : bsideup, Kirill Merkushev
100 : bsideup
----
2019-08-30 08:52:04 UTC - Rowanto: hence, our manual seeks for older consumers 
are triggering a lot of InvalidCursorPositionException (in an endless loop)
----
