Slack digest for #general - 2019-08-23

Apache Pulsar Slack Fri, 23 Aug 2019 02:11:25 -0700

2019-08-22 09:14:46 UTC - Bon: I think you can read the geo-replication 
documentation again.
----
2019-08-22 10:08:12 UTC - Jan-Pieter George: Nice, checking it out. Do you use 
the websocket connection or a raw socket? I'm interested in having a 
multi-threaded message pump (especially interesting with the nack and delayed 
delivery capabilities, incredibly amazing).
----
2019-08-22 11:49:39 UTC - JoeyDeng: @JoeyDeng has joined the channel
----
2019-08-22 12:33:50 UTC - Alexandre DUVAL: Can't Pulsar SQL workers be reached 
through the Pulsar proxy?
----
2019-08-22 13:40:14 UTC - Sijie Guo: the latter
----
2019-08-22 14:53:09 UTC - Addison Higham: they directly interact with bookkeeper
----
2019-08-22 14:59:04 UTC - Richard Sherman: Is there any documentation on 
setting up bookies to be rack aware?
----
2019-08-22 15:01:39 UTC - Alexandre DUVAL: What do you mean?
----
2019-08-22 15:03:38 UTC - Alexandre DUVAL: Does pulsar-proxy read its SSL 
configuration periodically, or does it watch the files? I updated my certificate 
files without reloading or restarting it, and the certificate is the new one.
----
2019-08-22 15:03:50 UTC - Addison Higham: question about the configuration 
store (global zk): Has there been any consideration for making that pluggable 
with different storage backends? I looked through the code and AFAICT, it isn't 
using any more advanced ZK features (locks, leader election, only uses watches 
for cache invalidation) for the global zk and it is mostly just pretty 
straightforward storage of config data.


The reason I ask: I wonder how reasonable it would be if you could implement 
the config store on top of something like gcp cloud spanner or dynamodb global 
tables. For dynamodb global tables, you could have conflicting writes, but for 
the config store, I wonder if that would be okay?
----
2019-08-22 15:05:31 UTC - Addison Higham: maybe I misunderstood your question, 
are you asking if you can have sql workers connect to pulsar via the proxy? Or 
are you asking if you can interact with the workers API via the proxy
----
2019-08-22 15:06:22 UTC - Addison Higham: in 2.4.0 they added that yes
----
2019-08-22 15:08:42 UTC - Alexandre DUVAL: Second part :slightly_smiling_face:
----
2019-08-22 15:09:21 UTC - Addison Higham: ah, not sure of that
----
2019-08-22 15:09:44 UTC - Alexandre DUVAL: It watches? Do you have more 
information on the process used?
----
2019-08-22 15:10:27 UTC - Alexandre DUVAL: From the proxy.conf I'd say no (not 
yet :p). But okay.
----
2019-08-22 15:12:39 UTC - Addison Higham: trying to find the code, but IIRC, it 
polls periodically, will post a link to code here in a minute
----
2019-08-22 15:14:14 UTC - Alexandre DUVAL: Thanks. Also, is it the same behavior 
for the Pulsar SQL worker SSL conf? :smile:
----
2019-08-22 15:14:38 UTC - Alexandre DUVAL: Because if pulsar-proxy can't handle 
the SQL workers, I need to have this on the Pulsar SQL workers too.
----
2019-08-22 15:15:13 UTC - Addison Higham: 
<https://github.com/apache/pulsar/blob/d3643a072c6dfd444974e0f8b864fc053cfdb4f8/pulsar-common/src/main/java/org/apache/pulsar/common/util/SslContextAutoRefreshBuilder.java>
----
2019-08-22 15:16:16 UTC - Alexandre DUVAL: Oh it's a generic one, so probably 
used in pulsar sql worker too :slightly_smiling_face:
----
2019-08-22 15:21:10 UTC - Addison Higham: was trying to track down its usage, 
haven't yet, but I would hope so!
----
2019-08-22 16:22:22 UTC - Raman Gupta: Can I "copy" the list of acked messages 
from one consumer to another? The use case would be creating a consumer that 
starts from where an existing consumer has left off. In Kafka, this would be 
done simply by setting the offsets of the new consumer to the offsets of the 
old one.
----
2019-08-22 16:23:25 UTC - Addison Higham: @Raman Gupta 
<https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Consumer.html#seek-org.apache.pulsar.client.api.MessageId->
----
2019-08-22 16:24:33 UTC - Addison Higham: the seek basically resets the 
position on the server, so you just restart from there
----
2019-08-22 16:24:49 UTC - Raman Gupta: Yeah, that covers the equivalent of what 
Kafka does with offset, but it doesn't handle the per-message ack semantics of 
Pulsar.
----
2019-08-22 16:27:25 UTC - Addison Higham: not sure I understand, once you seek 
to that position, your subscription behaves pretty much like a normal 
subscription
----
2019-08-22 16:29:11 UTC - Raman Gupta: As I understand Pulsar consumer 
semantics (and I'm only researching at this point, so my understanding could be 
wrong), a consumer could receive messages 1, 2, 3, ack 2, and die. That leaves 
1 and 3 unacked. At this point, if I use the seek method above to seek to ~2~ 
3, I will miss message 1 right?
----
2019-08-22 16:31:00 UTC - Addison Higham: yes, but if you are just trying to 
reconnect to an existing subscription, why would you seek? pulsar will 
redeliver any unacknowledged messages when you reconnect
----
2019-08-22 16:31:50 UTC - Addison Higham: (this would be shared subscription 
behavior, with an exclusive or failover subscription, cumulative acking is 
used, so acking 2 would also ack 1)
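
For illustration, the difference between individual and cumulative acking 
described above can be modelled in a few lines of plain Java. This is only a toy 
sketch: `AckModel` and its methods are hypothetical names, message ids are 
simplified to consecutive longs, and none of this is the Pulsar client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of a subscription's ack state (hypothetical, not the Pulsar API).
final class AckModel {
    // Ids of delivered messages that have not been acknowledged yet.
    private final TreeSet<Long> unacked = new TreeSet<>();

    AckModel(long firstId, long lastId) {
        for (long id = firstId; id <= lastId; id++) {
            unacked.add(id);
        }
    }

    // Individual ack: only this one message is marked as consumed.
    void ackIndividual(long id) {
        unacked.remove(id);
    }

    // Cumulative ack: this message and every earlier one is marked as consumed.
    void ackCumulative(long id) {
        unacked.headSet(id, true).clear();
    }

    // Messages that would be redelivered when a consumer reconnects.
    List<Long> redelivered() {
        return new ArrayList<>(unacked);
    }

    public static void main(String[] args) {
        AckModel individual = new AckModel(1, 3);
        individual.ackIndividual(2);
        System.out.println("individual ack of 2: " + individual.redelivered()); // [1, 3]

        AckModel cumulative = new AckModel(1, 3);
        cumulative.ackCumulative(2);
        System.out.println("cumulative ack of 2: " + cumulative.redelivered()); // [3]
    }
}
```

In this model, the "receive 1, 2, 3, ack 2, die" scenario leaves 1 and 3 to be 
redelivered under individual acks, but only 3 under a cumulative ack.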
----
2019-08-22 16:32:56 UTC - Raman Gupta: I'm just thinking through operational 
concerns. For example, I might want to rename a subscription. There might be 
other use cases as well for creating a new subscription that is a "copy" of an 
existing one.
----
2019-08-22 16:44:03 UTC - Raman Gupta: Seek works, just have to make sure that 
all the messages prior to the seeked one have been consumed/acked, and that 
none of the messages after it have been, by the original subscription. Is there 
an easy way to verify that?
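
The condition stated here (everything before the seek point acked, nothing after 
it acked) amounts to the acked ids forming a gap-free prefix. A minimal sketch 
of that check follows, assuming you can obtain the set of acked ids somehow and 
that ids are consecutive longs. Both are simplifying assumptions, `SeekCheck` is 
a hypothetical helper, and the sketch takes no position on whether a real seek 
is inclusive of the target message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.OptionalLong;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: decides whether a single position can represent the
// ack state of a subscription, given a simplified set of acked message ids.
final class SeekCheck {
    // Returns the id of the first unacked message when the acked ids form a
    // gap-free prefix starting at firstId; returns empty when there are holes.
    static OptionalLong safeSeekPosition(long firstId, Set<Long> acked) {
        long next = firstId;
        for (long id : new TreeSet<>(acked)) {
            if (id != next) {
                return OptionalLong.empty(); // a hole: some earlier message is unacked
            }
            next++;
        }
        return OptionalLong.of(next);
    }

    public static void main(String[] args) {
        // Acked 1 and 2 out of 1..3: a single position describes the state.
        Set<Long> prefix = new HashSet<>(Arrays.asList(1L, 2L));
        System.out.println(safeSeekPosition(1, prefix).isPresent()); // true

        // Acked only 2 out of 1..3: seeking past 2 would skip message 1.
        Set<Long> withHole = new HashSet<>(Arrays.asList(2L));
        System.out.println(safeSeekPosition(1, withHole).isPresent()); // false
    }
}
```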
----
2019-08-22 17:12:45 UTC - jialin liu: Hi, what is the typical message size that 
pulsar can handle? ~KB or ~MB?
----
2019-08-22 17:34:57 UTC - Jon Bock: Both are possible with the right 
configuration, KBs are by far the most common but up to 1MB message sizes are 
regularly tested.
----
2019-08-22 17:35:39 UTC - jialin liu: Is it designed to handle image or video? 
if not, do you have any suggestions?
----
2019-08-22 17:46:14 UTC - Matteo Merli: @Addison Higham Yes, the plan was to 
make it pluggable. The only feature we use (other than pure get/put) is the 
notifications. The watches are used to make sure the policy caches are updated 
in all brokers.

That part needs to be abstracted out, since most key-value stores won’t offer 
that.
----
2019-08-22 17:48:38 UTC - Jon Bock: Pulsar is agnostic to the message content 
type.  For very large message payloads, you can either break up the payload 
into smaller pieces (e.g. break up a video into frames) or have the message 
body include a reference to the external location of the object.
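
The first option (breaking a payload into smaller pieces) could look roughly 
like the sketch below. It is plain Java with no Pulsar dependency: 
`PayloadChunker`, `Chunk`, and the `index`/`total` fields are hypothetical names 
chosen for illustration, and a real producer would still need to carry this 
metadata in each message and handle missing or duplicated pieces.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper that splits a large payload into bounded-size pieces.
final class PayloadChunker {
    // One piece of the payload plus the metadata needed to reassemble it.
    static final class Chunk {
        final int index;
        final int total;
        final byte[] data;

        Chunk(int index, int total, byte[] data) {
            this.index = index;
            this.total = total;
            this.data = data;
        }
    }

    static List<Chunk> split(byte[] payload, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        // Ceiling division; an empty payload still produces one empty chunk.
        int total = Math.max(1, (payload.length + maxChunkBytes - 1) / maxChunkBytes);
        List<Chunk> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * maxChunkBytes;
            int to = Math.min(payload.length, from + maxChunkBytes);
            chunks.add(new Chunk(i, total, Arrays.copyOfRange(payload, from, to)));
        }
        return chunks;
    }

    // Assumes every chunk is present exactly once; arrival order does not matter.
    static byte[] join(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt(c -> c.index));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : sorted) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        List<Chunk> chunks = split(payload, 4);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(join(chunks), payload));
    }
}
```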
----
2019-08-22 17:49:27 UTC - Addison Higham: :thinking_face: what are the 
consistency needs for that data? I couldn't see any immediate reason why 
dynamo global tables (multi-master eventual consistency with last-write-wins 
conflict resolution) wouldn't be okay, as most changes are made to the local 
table and then just followed by other regions
----
2019-08-22 17:49:43 UTC - Matteo Merli: eventual consistency is perfectly fine
----
2019-08-22 17:50:04 UTC - Addison Higham: and the last-write-wins? is there any 
data that would be contended between different regions?
----
2019-08-22 17:50:21 UTC - Addison Higham: I couldn't see anything obvious in 
the geo-replication use case
----
2019-08-22 17:51:06 UTC - Matteo Merli: uhm, ideally there you’d want 
last-write-wins, to ensure that everyone will eventually reach the same state
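
As a rough illustration of why last-write-wins lets every region converge, here 
is a minimal sketch of an LWW merge in plain Java. `LwwRegister` is a 
hypothetical type, and the tie-break on region name is an assumption made here 
so that the merge is deterministic; it is not a description of how any 
particular store resolves ties.

```java
// Hypothetical last-write-wins register for one config entry.
final class LwwRegister {
    final String value;
    final long timestampMillis;
    final String region;

    LwwRegister(String value, long timestampMillis, String region) {
        this.value = value;
        this.timestampMillis = timestampMillis;
        this.region = region;
    }

    // The write with the larger timestamp wins. Equal timestamps are broken by
    // region name (an assumption of this sketch) so the result does not depend
    // on argument order.
    static LwwRegister merge(LwwRegister a, LwwRegister b) {
        if (a.timestampMillis != b.timestampMillis) {
            return a.timestampMillis > b.timestampMillis ? a : b;
        }
        return a.region.compareTo(b.region) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        LwwRegister west = new LwwRegister("retention=7d", 1000L, "us-west");
        LwwRegister east = new LwwRegister("retention=3d", 1005L, "us-east");
        // Both regions end up with the same value regardless of merge order.
        System.out.println(merge(west, east).value); // retention=3d
        System.out.println(merge(east, west).value); // retention=3d
    }
}
```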
----
2019-08-22 17:52:04 UTC - Matteo Merli: the advantage for ZK there is that we 
have a global quorum that is writable when 1 region is out, yet still 
strong-consistent on writes
----
2019-08-22 17:57:12 UTC - Addison Higham: :thinking_face: so dynamo global 
tables would seem to work fine from a consistency perspective, you could use 
dynamodb streams for notifications to get the cache updates as well
----
2019-08-22 18:00:06 UTC - Addison Higham: and you could mostly just change the 
current globalZk to a new interface of something like:
- put(fullPath, data)
- get(fullPath)
- listChildren(path)
- watch(path, callback)
?

Or would you want to make more of a DAO/Model pattern and move most of the 
state changes into more explicit calls?
Like `configStore.addNamespace(namespace)` etc?
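
To make the first shape concrete, here is a minimal in-memory sketch of such an 
interface in plain Java. `ConfigStore` and `InMemoryConfigStore` are hypothetical 
names, not existing Pulsar types, and the return types are guesses since the 
message above does not give them. Unlike ZooKeeper watches, which fire once, the 
callbacks here stay registered and fire only on writes to the exact watched 
path; that is a simplification of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical storage-agnostic interface mirroring the four calls listed above.
interface ConfigStore {
    void put(String fullPath, byte[] data);
    Optional<byte[]> get(String fullPath);
    List<String> listChildren(String path);
    void watch(String path, Consumer<String> callback);
}

// In-memory stand-in; a real backend would sit behind the same interface.
final class InMemoryConfigStore implements ConfigStore {
    private final TreeMap<String, byte[]> data = new TreeMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    @Override
    public synchronized void put(String fullPath, byte[] value) {
        data.put(fullPath, value.clone());
        // Notify watchers of this exact path so they can invalidate caches.
        for (Consumer<String> cb : watchers.getOrDefault(fullPath, Collections.emptyList())) {
            cb.accept(fullPath);
        }
    }

    @Override
    public synchronized Optional<byte[]> get(String fullPath) {
        byte[] value = data.get(fullPath);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    @Override
    public synchronized List<String> listChildren(String path) {
        String prefix = path.endsWith("/") ? path : path + "/";
        TreeSet<String> children = new TreeSet<>();
        // Keys sharing a prefix are contiguous in the sorted map.
        for (String key : data.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            String rest = key.substring(prefix.length());
            if (rest.isEmpty()) {
                continue;
            }
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? rest : rest.substring(0, slash));
        }
        return new ArrayList<>(children);
    }

    @Override
    public synchronized void watch(String path, Consumer<String> callback) {
        watchers.computeIfAbsent(path, k -> new ArrayList<>()).add(callback);
    }

    public static void main(String[] args) {
        InMemoryConfigStore store = new InMemoryConfigStore();
        store.watch("/admin/policies/tenant/ns1", p -> System.out.println("changed: " + p));
        store.put("/admin/policies/tenant/ns1", new byte[] {1});
        store.put("/admin/policies/tenant/ns2", new byte[] {2});
        System.out.println(store.listChildren("/admin/policies/tenant")); // [ns1, ns2]
    }
}
```

The DAO-style alternative would hide paths entirely behind calls such as 
`addNamespace`, at the cost of a wider interface for each backend to implement.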
----
2019-08-22 18:08:00 UTC - Karthik Ramasamy: Some users are using close to 5MB 
size messages
----
2019-08-22 18:11:01 UTC - Raman Gupta: A related question to my previous one: 
what would the best metric be to track that would be the equivalent of Kafka's 
consumer lag?
----
2019-08-22 18:18:48 UTC - Matteo Merli: I need to retake a look at that code. 
The current approach is caching objects, already deserialized from JSON after 
reading from ZK, with the watches to trigger the cache invalidation.
----
2019-08-22 18:20:17 UTC - Devin G. Bost: ```
rg.apache.pulsar.client.impl.ClientCnx  : Error during handshake

javax.net.ssl.SSLException: SSLEngine closed already
        at org.apache.pulsar.shade.io.netty.handler.ssl.SslHandler.wrap(...)(Unknown Source) ~[pulsar-client-2.4.0.jar!/:2.4.0]

2019-08-22 12:14:34.068  WARN 7 --- [r-client-io-1-1] org.apache.pulsar.client.impl.ClientCnx  : [dec01.overstock.com/10.15.33.233:8080] Got exception DecoderException : javax.net.ssl.SSLHandshakeException: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
```

Have any of you guys seen this before?
----
2019-08-22 18:20:55 UTC - Karthik Ramasamy: @Raman Gupta There is a backlog 
metric for every subscription
----
2019-08-22 18:30:38 UTC - Addison Higham: I will look as well and try and do a 
write up, I am about to embark on either this or global zk, so motivated to at 
least do the research to see if it is worth pursuing
----
2019-08-22 18:31:00 UTC - Matteo Merli: :+1:
----
2019-08-22 18:38:19 UTC - Devin G. Bost: I'm looking at:

```
PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar+ssl://broker.example.com:6651/")
    .enableTls(true)
    .tlsTrustCertsFilePath("/path/to/ca.cert.pem")
    .authentication("org.apache.pulsar.client.impl.auth.AuthenticationTls",
                    "tlsCertFile:/path/to/my-role.cert.pem,tlsKeyFile:/path/to/my-role.key-pk8.pem")
    .build();
```
in the docs (<https://pulsar.apache.org/docs/en/security-tls-authentication/>), 
and I'm noticing that `.enableTls(..)` is actually deprecated.
----
2019-08-22 18:38:24 UTC - Devin G. Bost: What's the reason for the deprecation?
----
2019-08-22 18:38:58 UTC - Devin G. Bost: We're thinking that the handshake 
exception has something to do with the TLS configuration.
----
2019-08-22 18:39:58 UTC - Devin G. Bost: @David Kjerrumgaard Have you seen this 
before?
----
2019-08-22 18:47:34 UTC - Sijie Guo: `pulsar+ssl` indicates it is a TLS secured 
service. so we don’t actually need to specify `enableTls(true)`.
+1 : Devin G. Bost, Ali Ahmed
----
2019-08-22 18:50:26 UTC - Matteo Merli: Most of the time it’s a client with TLS 
connecting to a non encrypted endpoint or vice-versa
+1 : Devin G. Bost
----
2019-08-22 18:50:57 UTC - Matteo Merli: (or non valid certificates configured 
in brokers)
----
2019-08-22 19:11:08 UTC - Pete Tanski: @Pete Tanski has joined the channel
----
2019-08-22 20:30:21 UTC - Raman Gupta: Thanks. I did note when I fired up the 
sandbox the backlog shown in the dashboard was wildly incorrect.
----
2019-08-22 20:54:08 UTC - Igor Zubchenok:
After updating from build `Pulsar 2.2.0-streamlio-5` to `Pulsar 2.4.0`, I noticed 
~3-4x slower* performance than we had with 2.2.0.
We investigated the changes a bit and found that for every topic (we have 
around 50-100K alive topics) there is a new delayed delivery feature that uses 
some system resources, and this caused some performance degradation.
After I added `delayedDeliveryEnabled=false` it got a bit better, but it is 
still ~2x slower.
What else could be tuned to get performance as good as it was in 2.2.0, or better?
P.S. * _slower_ — I mean we have slower time of delivery of a message from 
publisher to consumer.
----
2019-08-22 21:42:21 UTC - Luke Lu: You mean the latency between consumer time 
and producer time? Is throughput (messages per second) affected? Do you have 
any numbers?
----
2019-08-22 23:45:11 UTC - Matteo Merli: @Igor Zubchenok

> there is a new delayed delivery feature that uses some system resources 
and this caused some performance degradation.

Do you have a heap dump that shows the diff between the 2? The expectation is 
that there should be no difference if the messages are not marked for delays.

> I mean we have slower time of delivery of a message from publisher to 
consumer.

Can you quantify it in absolute numbers (eg: 10 to 20ms? avg or 99pct?)
----
2019-08-22 23:57:02 UTC - Addison Higham: @Matteo Merli 
<https://docs.google.com/document/d/18HPgFN8LOsxSBIScrldKWTmJvFkUvRemMoit1KQjcik/edit#>
 first really rough pass at some research and my best approximation of what I 
think might work
----
2019-08-22 23:57:44 UTC - Addison Higham: right now, I am thinking we are going 
to move forward with global ZK, but we might be able to lend a hand if this is 
something that gets some traction
2019-08-22 23:58:20 UTC - Addison Higham: am headed out, but feel free to add 
comments there
----
2019-08-22 23:58:51 UTC - Matteo Merli: yes, we had this task planned for this 
year, both to make the conf store pluggable as well as the general metadata 
store
----
2019-08-22 23:59:14 UTC - Matteo Merli: (don’t have access to your doc yet)
----
2019-08-23 00:56:25 UTC - Igor Zubchenok: I need to prepare an answer. I'll be 
back with something.
----
2019-08-23 03:12:00 UTC - Igor Zubchenok: I didn't find anything in Pulsar 
broker metrics, but found that the producer has unstable send latency. (this 
chart is for pulsar producer stats, I publish a small message every 100 ms to a 
topic in the production cluster and collect stats every second)
----
2019-08-23 03:14:33 UTC - Igor Zubchenok: @Matteo Merli I didn't find anything 
in Pulsar broker metrics, but found that the producer has unstable send latency.
----
2019-08-23 03:15:29 UTC - Igor Zubchenok: > Do you have a heap dump that 
shows the diff between the 2
No, I don't have the old version running anymore.
----
2019-08-23 03:46:59 UTC - Igor Zubchenok: How can I find the cause of the latency?
----
2019-08-23 04:34:03 UTC - tuteng: hi jerry

any progress on this PR?
----
2019-08-23 06:40:44 UTC - Walter: @Walter has joined the channel
----
2019-08-23 07:17:13 UTC - Vladimir Ontikov: @Vladimir Ontikov has joined the 
channel
----
2019-08-23 07:44:27 UTC - Richard Sherman: It isn't so much wildly incorrect as 
just out of date, since the dashboard by default collects stats once a minute.
----
2019-08-23 08:38:30 UTC - Wenjian Jiang: @Wenjian Jiang has joined the channel
----
