2020-01-25 10:36:48 UTC - Rahul: @Rahul has joined the channel
----
2020-01-25 16:36:31 UTC - Gaetan SNL: Hello, I'm considering Apache Pulsar to schedule messages at a specific time using the "delayed message" functionality. I can't find any documentation about limitations. Is it OK, for example, to have 1 million messages waiting to be delivered? Or is it OK to schedule a message 2 years out, for example? If I understand how it's implemented, it should be fine, no? Thank you
----
2020-01-25 18:06:13 UTC - Nouvelle: I'm trying to determine which function runtime option is active on my cluster, but the `get-runtime-config` option of the `brokers` command is not available despite the documentation: <https://pulsar.apache.org/docs/en/2.4.1/pulsar-admin/#get-runtime-config> Is there another way to determine which function runtime option is active on my cluster (besides inspecting the conf file; I'm looking for a way to verify)?
----
2020-01-25 18:31:54 UTC - Paul Danckaert: @Paul Danckaert has joined the channel
----
2020-01-25 19:58:26 UTC - Roman Popenov: Will do!
----
2020-01-25 20:51:28 UTC - Md. Farhan Memon: @Md. Farhan Memon has joined the channel
----
2020-01-25 21:35:31 UTC - Roman Popenov: <https://github.com/apache/pulsar/issues/6141>
----
2020-01-25 21:35:35 UTC - Roman Popenov: Done
----
2020-01-25 23:45:56 UTC - Sijie Guo: The current implementation is not good for scheduling very many delayed messages or very long delays.
There is a proposal to implement a time-wheel based approach. Once it is done, we should be able to support the use cases you mentioned.
----
2020-01-25 23:47:08 UTC - Sijie Guo: what do you mean "not available"?
----
2020-01-25 23:47:21 UTC - Sijie Guo: thanks +1 : Roman Popenov
----
2020-01-25 23:57:47 UTC - Eugen: When consuming historical data (using `SubscriptionInitialPosition.Earliest`) from a partitioned topic, I assume ( <https://pulsar.apache.org/docs/en/concepts-messaging/#ordering-guarantee> ) that the order of items with different keys is not strictly guaranteed. Will items nevertheless be consumed "roughly" in order, e.g. using the `publish time` to merge the streams at the consumer? Or is the relative timing of items from different partitions a completely nondeterministic outcome that depends on things like network and I/O performance of the involved bookies (or even S3, HDFS, etc. in the case of tiered storage)?
----
2020-01-26 00:11:09 UTC - Sijie Guo: I don't think you can rely on publish time. Publish time is currently assigned on the client side, not the broker side.
----
2020-01-26 04:12:29 UTC - Eugen: In my case, relying on the publish time would in fact be preferable to broker-generated timestamps, because I have only a single producer (with a single clock) but there would be multiple brokers (per partition, with different clocks). But I'm not after strict ordering here - I just want to be able to consume historical data without items from different partitions drifting further and further apart over time. In other words, I want to be able to consume data "roughly" in order. I'd be happy if items from different partitions were never out of sync by more than 1 second - but as the publish time is more fine-grained (at least milliseconds), it would theoretically be possible to keep them much more in sync than that. As I mentioned earlier, this is for consumption of historical data only.
For real-time data consumption, I would _not_ expect any pacing / throttling of partitions to keep them "roughly" in sync.
----
2020-01-26 04:35:29 UTC - Addison Higham: @Eugen this article touches on what you can expect from Pulsar (both in terms of tailing and historical reads) and also gives some ways you can maybe get more control: <https://jack-vanlightly.com/blog/2019/9/4/a-look-at-multi-topic-subscriptions-with-apache-pulsar>
----
2020-01-26 04:43:47 UTC - Eugen: @Addison Higham Although the title reads "multi-topic subscriptions", grepping for "partition" in the article, it seems Jack is addressing my question as well, and Pulsar is much better for this than Kafka. Will read - thanks a lot!
----
2020-01-26 04:51:03 UTC - Addison Higham: :thumbsup: yeah, in Pulsar, multi-topic and partitioned subscriptions are the same thing (since partitions in Pulsar are just multiple topics)
----
2020-01-26 06:36:54 UTC - Eugen: Great article, saved me hours and days of my time! His "A Best-Effort Strategy Based on Publisher Timestamps" section at the end is what I was considering as well. But I don't think I can get this to work in Pulsar with partitions (in contrast to multiple topics), as I cannot subscribe to partitions individually and peek and pace. I'd like to avoid using multiple topics in this case, where the data comes from a single producer anyway and is one thing, logically. And as partitions are implemented as topics in Pulsar, there may even be some (undocumented?) way to subscribe to those internal topics... But this is getting a bit more involved than I'd like it to be.
(I think stream processing engines allow for simple stream merging based on timestamps along those lines)
----
2020-01-26 06:58:13 UTC - Addison Higham: @Eugen you can subscribe to individual partitions. Internally, a partitioned topic is just represented by topics with a numbered suffix, so if you have a topic `public/default/my-topic` with 5 partitions, you can subscribe to each one individually as `public/default/my-topic-0`, `public/default/my-topic-1`, etc. +1 : Eugen
----
2020-01-26 06:58:31 UTC - Addison Higham: err, might be `partition-<n>`
----
2020-01-26 07:53:18 UTC - Eugen: Good to know! So it seems technically possible to merge partitions on the consuming end using e.g. the publish time. If I end up going this route, I may in fact add this as a consumer/reader option to Pulsar...
----
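[Editor's note] The per-partition merge Eugen describes can be sketched in plain Python. This is a hedged illustration, not Pulsar API code: the two lists stand in for per-partition message streams (in practice each would come from a `Reader` on one partition topic), relying only on the fact that each partition is internally ordered, so a k-way merge on `publish_time` yields a best-effort global order.

```python
import heapq

# Simulated per-partition streams as (publish_time_ms, payload) tuples.
# Pulsar guarantees ordering *within* a partition, which is all the
# merge below relies on; it does not require cross-partition ordering.
partition_0 = [(1000, "a"), (1300, "d"), (1600, "f")]
partition_1 = [(1100, "b"), (1200, "c"), (1500, "e")]

# k-way merge: always emit the pending message with the smallest
# publish_time across all partitions.
merged = list(heapq.merge(partition_0, partition_1, key=lambda m: m[0]))

print([payload for _, payload in merged])  # → ['a', 'b', 'c', 'd', 'e', 'f']
```

With live readers you would peek one message per partition and pace, only advancing the partition whose head has the lowest timestamp - the "best-effort strategy based on publisher timestamps" from the linked article. Note Sijie's caveat still applies: publish time is assigned by the client, so this ordering is only as trustworthy as the producer clocks.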
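[Editor's note] On the delayed-message limits raised at the start of the thread: a plausible reading of Sijie's comment is that the broker tracks delayed messages in an in-memory priority queue keyed on delivery time, so every pending delayed message occupies tracker memory until it is due - which is why millions of messages or multi-year delays are costly, and why a time-wheel is proposed. A minimal sketch of that tracker shape (hypothetical class and method names, not Pulsar's actual internals):

```python
import heapq

class DelayedDeliveryTracker:
    """Toy model of a broker-side delayed-delivery tracker: a min-heap
    keyed on delivery timestamp. Memory grows with every pending
    delayed message, illustrating the scaling concern in the thread."""

    def __init__(self):
        self._heap = []  # entries are (deliver_at_ms, message_id)

    def add(self, deliver_at_ms, message_id):
        heapq.heappush(self._heap, (deliver_at_ms, message_id))

    def pop_due(self, now_ms):
        """Return the ids of all messages whose delivery time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now_ms:
            due.append(heapq.heappop(self._heap)[1])
        return due

tracker = DelayedDeliveryTracker()
tracker.add(2000, "m2")
tracker.add(1000, "m1")
tracker.add(9999, "m3")
print(tracker.pop_due(1500))  # → ['m1']
print(tracker.pop_due(5000))  # → ['m2']
```

A time-wheel replaces the single heap with hashed time buckets, trading a little delivery-time precision for O(1) inserts and bounded per-tick work, which is what would make the "1 million messages / 2 years out" cases practical.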
