2020-07-06 11:58:05 UTC - yuanbo: Any idea on how to improve KOP performance in a 
large-scale cluster?
----
2020-07-06 11:59:42 UTC - David Lanouette: Could you be a bit more specific?  
What size cluster are you talking about?  What performance issues are you 
seeing? etc.
----
2020-07-06 12:00:15 UTC - David Lanouette: Also, you might get a better 
response over in <#C0106RFKAHM|kop>
----
2020-07-06 12:07:20 UTC - yuanbo: Thanks for your response, I will add more 
information later.
----
2020-07-06 12:24:25 UTC - yuanbo: Seems my colleague has provided details in 
<#C0106RFKAHM|kop>. There is a lot of traffic in our cluster, and KOP doesn't 
fit our case very well. We will try to figure out more ideas about how to improve 
it and hopefully feed them back to the community.
----
2020-07-06 12:27:02 UTC - David Lanouette: Who is your colleague?  I don't see 
anything over there that looks similar to your comment/question.
----
2020-07-06 12:30:40 UTC - yuanbo: 
<https://apache-pulsar.slack.com/archives/C0106RFKAHM/p1585829956017300>
----
2020-07-06 12:32:08 UTC - yuanbo: about 3 months ago
----
2020-07-06 12:32:32 UTC - David Lanouette: OK.  I didn't look back that far.
----
2020-07-06 12:38:18 UTC - David Lanouette: I also see that you didn't really 
get an answer to your problem.
----
2020-07-06 12:40:39 UTC - David Lanouette: Do you know if your colleague tried 
increasing the number of threads, as @Sijie Guo suggested?
----
2020-07-06 12:47:45 UTC - yuanbo: Yeah, it's a bit complicated. When Kafka 
messages are written into Pulsar by KOP, the broker needs to unpack those messages 
and put them in order. It turns out it's not a CPU-intensive case; most of the 
time is spent on the ordering.
----
2020-07-06 12:52:39 UTC - David Lanouette: Have you looked to see if there's an 
open issue in the [kop](<https://github.com/streamnative/kop>) project?  That 
may be the best place for it to get addressed.
----
2020-07-06 12:53:35 UTC - David Lanouette: Note: if you do create a new issue, 
it will be *very* helpful to have precise steps to reproduce the issue.
----
2020-07-06 12:54:31 UTC - yuanbo: Sure, I will try it, thanks a lot
----
2020-07-06 12:56:39 UTC - David Lanouette: I took a quick look, and I don't see 
an existing issue concerning performance.
----
2020-07-06 12:56:50 UTC - David Lanouette: You'll need to create a new issue.
----
2020-07-06 12:57:10 UTC - David Lanouette: Good luck.  Hopefully the team can 
look at it and find a simple fix for you.
----
2020-07-06 13:00:34 UTC - yuanbo: They have opened a discussion issue there 
:sweat_smile:.
<https://github.com/streamnative/kop/issues/134>
----
2020-07-06 13:07:05 UTC - David Lanouette: Interesting.  Looks like it has been 
fixed.  Have you tried to re-test after that fix was put in?
----
2020-07-06 13:14:08 UTC - yuanbo: It wasn't fixed just closed.
----
2020-07-06 13:15:14 UTC - David Lanouette: It looks like [PR 
#150](<https://github.com/streamnative/kop/pull/150>) was added to address it.
----
2020-07-06 13:15:47 UTC - David Lanouette: But, maybe that didn't fix it.
----
2020-07-06 13:15:51 UTC - yuanbo: Oh, sorry my mistake.
----
2020-07-06 13:16:23 UTC - David Lanouette: if that didn't fix the issue, I'd 
suggest requesting that it gets re-opened.
----
2020-07-06 13:16:47 UTC - David Lanouette: but, hopefully that fixed the issue 
for you.
----
2020-07-06 13:20:04 UTC - yuanbo: this pr comes from another colleague of mine. I'm 
new to this project. Sadly this pr doesn't bring much performance improvement.
----
2020-07-06 13:50:46 UTC - Vaibhav Aiyar: @Vaibhav Aiyar has joined the channel
----
2020-07-06 14:07:41 UTC - Raphael Enns: Hi, how many active WebSocket 
connections can we have to the WebSocket API? Is there any limit?
----
2020-07-06 14:38:23 UTC - Aaron: Does the acknowledgement at batch index level 
work with cumulative acknowledgements from the consumer?
----
2020-07-06 14:56:30 UTC - Kirill Kosenko: Hello
I'm playing with Pulsar and Flink.
In the video "Query Pulsar Streams using Apache Flink - Sijie Guo", @Sijie Guo 
said that it would be possible to have end-to-end "exactly-once" between 
multiple Flink jobs once the Pulsar Transactions API is supported.
So my understanding is that it will be possible to consume events using 
FlinkPulsarSource, process them with multiple jobs, merge the results, and produce 
the result to FlinkPulsarSink exactly once?
Please confirm
Hope it was clear
Thank you
----
2020-07-06 15:02:04 UTC - vali: ty!
----
2020-07-06 16:23:01 UTC - Viktor: @Viktor has joined the channel
----
2020-07-06 17:08:56 UTC - Matteo Merli: Yes, we plan to address this by hiding 
the partitions concept in the future
----
2020-07-06 17:10:35 UTC - Matteo Merli: There's no predefined hard-limit (just 
check file-descriptor limit for the process)
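That limit is easy to check. A minimal sketch for a Linux-like shell (the PID below is a placeholder):

```shell
# Each active WebSocket connection holds at least one file descriptor,
# so the process's fd limit is the practical cap on connections.
ulimit -n
# For an already-running broker/proxy process (replace 12345 with its PID):
# grep "open files" /proc/12345/limits
```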
----
2020-07-06 17:11:30 UTC - Raphael Enns: Thanks
----
2020-07-06 17:23:30 UTC - Viktor: Hello. I am trying to benchmark Pulsar using 
the open messaging benchmark framework, since it was mentioned in a few blogs.
I am unable to push pulsar beyond 100-150MB/sec

```17:16:49.708 [main] INFO - Pub rate 101306.6 msg/s / 98.9 Mb/s | Cons rate 
96507.6 msg/s / 94.2 Mb/s | Backlog: 48.6 K | Pub Latency (ms) avg: 573.0 - 
50%: 30.9 - 99%: 10043.6 - 99.9%: 10083.8 - Max: 10106.4
17:16:59.818 [main] INFO - Pub rate 101131.5 msg/s / 98.8 Mb/s | Cons rate 
87866.5 msg/s / 85.8 Mb/s | Backlog: 182.6 K | Pub Latency (ms) avg: 562.7 - 
50%: 31.2 - 99%: 10039.9 - 99.9%: 10070.1 - Max: 10145.3
17:17:09.924 [main] INFO - Pub rate 152701.6 msg/s / 149.1 Mb/s | Cons rate 
84343.7 msg/s / 82.4 Mb/s | Backlog: 873.7 K | Pub Latency (ms) avg: 423.6 - 
50%: 28.9 - 99%: 10013.2 - 99.9%: 10279.0 - Max: 10323.5
17:17:20.023 [main] INFO - Pub rate 55590.7 msg/s / 54.3 Mb/s | Cons rate 
110058.3 msg/s / 107.5 Mb/s | Backlog: 323.5 K | Pub Latency (ms) avg: 984.2 - 
50%: 35.2 - 99%: 10032.8 - 99.9%: 10060.3 - Max: 10143.7
17:17:20.171 [main] INFO - ----- Aggregated Pub Latency (ms) avg: 485.8 - 50%: 
31.5 - 95%: 902.8 - 99%: 10033.4 - 99.9%: 10109.2 - 99.99%: 10280.8 - Max: 
10323.5```
It seems like some queuing is happening on the producer side. The servers are very 
lightly loaded on disk, CPU, and network, so I am sure this is client side. See the 
very high tail latency.

I already tried setting `maxPendingMessagesAcrossPartitions()` and 
`maxPendingMessages()` to 100K and gave it more producers; it's always capped there. 
Any suggestions?
----
2020-07-06 17:34:01 UTC - Kiran Chitturi: I enabled auth in Pulsar and I'm seeing 
this stacktrace when I hit the admin API with a token

```17:33:15.781 [pulsar-io-22-3] WARN  
org.apache.pulsar.broker.service.ServerCnx - [/10.8.85.9:52796] Got exception 
TooLongFrameException : Adjusted frame length exceeds 5253120: 1195725860 - 
discarded
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
5253120: 1195725860 - discarded
        at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:503)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:489)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:376)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:419)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:332)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
 ~[io.netty-netty-codec-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]
        at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
 [io.netty-netty-transport-4.1.48.Final.jar:4.1.48.Final]```
----
2020-07-06 17:42:30 UTC - Kiran Chitturi: sasl/ssl is not enabled
----
2020-07-06 17:52:42 UTC - Sijie Guo: Correct. Currently the source connector is 
exactly-once; however, the sink connector is not. It will become exactly-once 
once Pulsar transactions are supported.
----
2020-07-06 17:53:13 UTC - Sijie Guo: Yes. It should work by design.
----
2020-07-06 17:55:02 UTC - Sijie Guo: @yuanbo - I think @jia zhai has been 
working with @aloyszhang on this. He might have more context to bring here.
----
2020-07-06 17:59:14 UTC - David Lanouette: Have you tried running `pulsar-perf 
produce`?  Is it giving you similar results?
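For reference, a minimal invocation might look like this (flag names as in recent 2.x releases; check `pulsar-perf produce --help` for your version, and the topic and service URL are placeholders):

```
bin/pulsar-perf produce persistent://public/default/perf-test \
    --rate 100000 \
    --size 1024 \
    --service-url pulsar://localhost:6650
```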
----
2020-07-06 18:00:36 UTC - David Lanouette: Also, are you sure the network 
between the broker and the tool can handle more than 100Mb/s?
----
2020-07-06 18:04:06 UTC - Viktor: yes.. sure about the network.. it has enough 
room..

I did not run `pulsar-perf`.. but any idea why that would behave differently? 
Or is that the supported benchmark tool? OMB is nice in that it lets us compare 
different things, so I'd like to stick with it if possible.
----
2020-07-06 18:05:04 UTC - David Lanouette: That is definitely a supported 
benchmark tool.  It comes "out of the box".  But it only tests pulsar, not 
other types of brokers.
----
2020-07-06 18:05:14 UTC - David Lanouette: But, it would give you another data 
point.
----
2020-07-06 18:06:03 UTC - David Lanouette: If it's fast, but the OMB tool is 
slow, then you might have one issue.  If pulsar-perf is also slow, then it 
might be your setup.
----
2020-07-06 18:06:33 UTC - David Lanouette: You've also hit the limit of my 
knowledge of configuring and testing pulsar :slightly_smiling_face:
----
2020-07-06 18:17:34 UTC - Joshua Decosta: Did you ever manage to get this 
working? I’ve been trying to verify my broker tlsKeystore setup via OpenSSL and 
I’m having a similar issue
----
2020-07-06 18:23:23 UTC - Viktor: Thanks. let me play with pulsar-perf and get 
back
----
2020-07-06 18:23:54 UTC - David Lanouette: :thumbsup: 
----
2020-07-06 19:13:27 UTC - Devin G. Bost: It looks like the debian package for 
Pulsar 2.5.0 is no longer available on the Apache mirror. Does anyone know 
where I can find it?
----
2020-07-06 19:14:02 UTC - Devin G. Bost: I’m working on a Dockerfile for 
building Pulsar’s Go client with 2.5.2.
----
2020-07-06 19:14:05 UTC - Addison Higham: archives
----
2020-07-06 19:14:26 UTC - Devin G. Bost: Thanks. I’ll check there.
----
2020-07-06 19:14:31 UTC - Addison Higham: 
<https://archive.apache.org/dist/pulsar/pulsar-2.5.0/apache-pulsar-2.5.0-bin.tar.gz> 
<- direct link
----
2020-07-06 19:14:45 UTC - Devin G. Bost: That works! Thanks!
----
2020-07-06 19:15:03 UTC - Addison Higham: only most recent minor and patch 
versions are on the mirrors, anything older goes to archives
+1 : Devin G. Bost
----
2020-07-06 19:27:03 UTC - Devin G. Bost: I’m having trouble getting the Go 
client to work in my dockerfile.
It blows up at this line:
```Step 17/24 : RUN go get -u github.com/apache/pulsar/pulsar-client-go/pulsar
 ---> Running in 81c32d040c9d
# github.com/apache/pulsar/pulsar-client-go/pulsar
/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib/libpulsar.so: undefined 
reference to `std::thread::_State::~_State()@GLIBCXX_3.4.22'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib/libpulsar.so: undefined 
reference to `typeinfo for std::thread::_State@GLIBCXX_3.4.22'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib/libpulsar.so: undefined 
reference to 
`std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, 
std::default_delete<std::thread::_State> >, void (*)())@GLIBCXX_3.4.22'
. . . 
[many more lines]```
Does anyone recognize this exception?
----
2020-07-06 19:34:02 UTC - Devin G. Bost: It looks like there’s a version 
mismatch. Maybe I need to build from source.
----
2020-07-06 21:01:20 UTC - Devin G. Bost: How do I determine what versions are 
in use?
----
2020-07-06 21:07:34 UTC - Devin G. Bost: e.g.
----
2020-07-06 21:09:12 UTC - Addison Higham: that is the old c++ based golang, the 
newer client, <https://github.com/apache/pulsar-client-go>, is pure golang, no 
C++ bindings involved
----
2020-07-06 21:14:34 UTC - Jon Featherstone: :wave: In the docs, it says
> When subscribing to multiple topics, the Pulsar client will automatically 
make a call to the Pulsar API to discover the topics that match the regex 
pattern/list and then subscribe to all of them. If any of the topics don't 
currently exist, the consumer will auto-subscribe to them once the topics are 
created.
How quickly does the consumer see the new topics? Is this dependent on each 
client implementation?
----
2020-07-06 21:16:34 UTC - Addison Higham: by default, 1 minute, (see 
<https://pulsar.apache.org/docs/en/client-libraries-java/> 
`patternAutoDiscoveryPeriod`). It is client dependent, but most follow the java 
as a guide
----
2020-07-06 21:17:06 UTC - Addison Higham: golang property: 
<https://github.com/apache/pulsar-client-go/blob/master/pulsar/consumer.go#L87>
----
2020-07-06 21:17:53 UTC - Addison Higham: and the default is set here: 
<https://github.com/apache/pulsar-client-go/blob/6edc8f4ef954477eb36eed92e88464f56b4c48ee/pulsar/consumer_regex.go#L37>
----
2020-07-06 21:18:18 UTC - Devin G. Bost: Thanks. When I try using that one, I 
get a different error at the go get line.

```In file included from 
go/src/github.com/apache/pulsar/pulsar-client-go/pulsar/c_client.go:24:0:
./c_go_pulsar.h:22:29: fatal error: pulsar/c/client.h: No such file or directory
compilation terminated.```
I put my Dockerfile in the latest comment here: 
<https://github.com/apache/pulsar/issues/7463>
----
2020-07-06 21:22:50 UTC - Jon Featherstone: Awesome. Thanks addison!
----
2020-07-06 21:23:24 UTC - Addison Higham: oh, so you were previously using the 
old client? Well then you will need to change your code (the API is very 
similar though...). In your follow-on comment, though, you aren't referencing 
the right golang client. The new client should be `go get 
github.com/apache/pulsar-client-go/pulsar`
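The old and new module paths differ only slightly, which makes this easy to miss (the old CGo-based client lived inside the main Pulsar repo):

```
# Old, C++-backed client (deprecated):
#   go get github.com/apache/pulsar/pulsar-client-go/pulsar
# New, pure-Go client:
go get github.com/apache/pulsar-client-go/pulsar
```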
----
2020-07-06 21:23:32 UTC - Addison Higham: :bow:
----
2020-07-06 21:24:50 UTC - Devin G. Bost: Ah, thanks! That’s a very subtle path 
difference
----
2020-07-06 21:31:36 UTC - Devin G. Bost: That worked! You’re a life saver.
----
2020-07-06 21:38:09 UTC - Joshua Decosta: If the command `pulsar-admin brokers 
list use` returns `localhost:8080`, would that be a sign that TLS transport 
encryption is misconfigured? I'm trying to verify that I've configured it 
correctly 
----
2020-07-06 21:53:55 UTC - Addison Higham: That wouldn't indicate a problem; by 
default, a cluster registers itself in the list of clusters, but that address 
is only used in the case of geo-replication.
+1 : Joshua Decosta
----
2020-07-06 23:48:40 UTC - Viktor: @David Lanouette qq.. I don't see a way to 
control the number of partitions for each topic using `pulsar-perf`. Am I 
missing something?
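For reference, `pulsar-perf` publishes to an existing topic, so the partition count is usually fixed beforehand with `pulsar-admin` (topic name and partition count here are examples):

```
bin/pulsar-admin topics create-partitioned-topic \
    persistent://public/default/perf-test --partitions 8
bin/pulsar-perf produce persistent://public/default/perf-test --rate 100000
```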
----
2020-07-07 02:46:46 UTC - Hiroyuki Yamada: @Penghui Li Sorry, I reported wrong 
results. Both auto-split and consistent hashing can produce inconsistently 
ordered messages.
I updated the issue. <https://github.com/apache/pulsar/issues/7455>
----
2020-07-07 02:55:08 UTC - Penghui Li: Were there any messages replayed in your 
testcase? The current implementation allows replayed messages to be dispatched 
to a new consumer.
----
2020-07-07 02:59:16 UTC - Hiroyuki Yamada: @Penghui Li Honestly I'm not sure 
when the replay happens. Will adding consumers make it happen?

As you can see, you can reproduce it with a super simple consumer.
I start the consumers after the producer, so the number of consumers is changing 
during consumption, which is very normal in Pulsar, I assume.
<https://github.com/feeblefakie/misc/blob/master/pulsar/src/main/java/MyConsumer.java>
----
2020-07-07 03:02:53 UTC - Penghui Li: Ok
----
2020-07-07 06:38:27 UTC - Hiroyuki Yamada: I've started looking at the code to 
see if I can do anything about that, and I'm checking the behavior by adding 
debug prints in the code.

Is there any better way other than running the below every time the code is updated?
```mvn install -DskipTests```
It takes too long, so I'm wondering what others are doing.
(Sorry, I'm new to Maven)
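A common way to shorten that loop is to rebuild only the module that changed (standard Maven flags; `pulsar-broker` is just an example module name):

```
# Build one module and the modules it depends on (-am), skipping tests:
mvn install -DskipTests -pl pulsar-broker -am
# Once dependencies are cached locally, -o (offline) avoids remote checks:
mvn -o install -DskipTests -pl pulsar-broker
```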
----
2020-07-07 07:46:19 UTC - Kirill Kosenko: Thank you @Sijie Guo
One more question
There're two connectors:

1. <https://github.com/streamnative/pulsar-flink>
2. 
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-72%3A+Introduce+Pulsar+Connector>

Which one should we use?
Is there any difference in functionality?

As I understand it, we should use the first one (the StreamNative connector) 
until the Flink community merges these pull requests:
<https://github.com/apache/flink/pull/10875>
<https://github.com/apache/flink/pull/10455>
Please advise
Thank you
----
2020-07-07 08:48:09 UTC - Rahul Vashishth: It seems the proxy is required for 
broker lookup. But as mentioned in 
<https://www.youtube.com/watch?v=wGgEx1M17O0&list=PLqRma1oIkcWhWAhKgImEeRiQi5vMlqTc-&index=13|TGI
 Pulsar 002: Proxy and Kubernetes Deployment>, the client still does 
the broker lookup via the proxy.

@Sijie Guo Is it important to publish both the Pulsar binary port 6650 and the 
HTTP port 80 on the same DNS name? As we only mention the broker URL in the 
Pulsar client, does the client then do a lookup on the HTTP URL on the same DNS 
name?

Can we create two different DNS names for the binary port and the admin API 
HTTP port?
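For reference, the client already takes the two endpoints separately, so they don't have to share a DNS name. A sketch of `conf/client.conf` with placeholder hostnames:

```
# Binary-protocol endpoint (produce/consume and topic lookups)
brokerServiceUrl=pulsar://broker.example.com:6650
# HTTP endpoint (admin API)
webServiceUrl=http://admin.example.com:8080
```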
----
