2020-01-25 10:36:48 UTC - Rahul: @Rahul has joined the channel
----
2020-01-25 16:36:31 UTC - Gaetan SNL: Hello, I'm considering Apache Pulsar to schedule messages at a specific time using the "delayed message" functionality. I can't find any documentation about limitations. Is it OK, for example, to have 1 million messages waiting to be delivered? Or is it OK to schedule a message 2 years out, for example? If I understand how it's implemented, it should be fine, no? Thank you
----
2020-01-25 18:06:13 UTC - Nouvelle: I'm trying to determine which function runtime option is active on my cluster, but the `get-runtime-config` option of the `brokers` command is not available despite the documentation: <https://pulsar.apache.org/docs/en/2.4.1/pulsar-admin/#get-runtime-config> Is there another way to determine which function runtime option is active on my cluster (besides inspecting the conf file; I'm looking for a way to verify)?
----
2020-01-25 18:31:54 UTC - Paul Danckaert: @Paul Danckaert has joined the channel
----
2020-01-25 19:58:26 UTC - Roman Popenov: Will do!
----
2020-01-25 20:51:28 UTC - Md. Farhan Memon: @Md. Farhan Memon has joined the channel
----
2020-01-25 21:35:31 UTC - Roman Popenov: <https://github.com/apache/pulsar/issues/6141>
----
2020-01-25 21:35:35 UTC - Roman Popenov: Done
----
2020-01-25 23:45:56 UTC - Sijie Guo: The current implementation is not good for scheduling very many delayed messages or very long delays.
There is a proposal to implement a time-wheel based approach. Once it is done, we should be able to support the use cases you mentioned.
----
2020-01-25 23:47:08 UTC - Sijie Guo: what do you mean "not available"?
----
2020-01-25 23:47:21 UTC - Sijie Guo: thanks +1 : Roman Popenov
----
2020-01-25 23:57:47 UTC - Eugen: When consuming historical data (using `SubscriptionInitialPosition.Earliest`) from a partitioned topic, I assume ( <https://pulsar.apache.org/docs/en/concepts-messaging/#ordering-guarantee> ) that the order of items with different keys is not strictly guaranteed. Will items nevertheless be consumed "roughly" in order, e.g. using the `publish time` to merge the streams at the consumer? Or is the relative timing of items from different partitions a completely nondeterministic outcome that depends on things like network and I/O performance of the involved bookies (or even S3, HDFS, etc. in the case of tiered storage)?
----
2020-01-26 00:11:09 UTC - Sijie Guo: I don't think you can rely on publish time. Publish time is currently assigned on the client side, not the broker side.
----
2020-01-26 04:12:29 UTC - Eugen: In my case, relying on the publish time would in fact be preferable to broker-generated timestamps, because I have only a single producer (with a single clock) but there would be multiple brokers (per partition, with different clocks). But I'm not after strict ordering here - I just want to be able to consume historical data without items from different partitions drifting further and further apart over time. In other words, I want to be able to consume data "roughly" in order. I'd be happy if items from different partitions were never out of sync by more than 1 second - but as the publish time is more fine-grained (at least milliseconds), it would theoretically be possible to keep them much more in sync than that. As I mentioned earlier, this is for consumption of historical data only.
For real-time data consumption, I would _not_ expect any pacing / throttling of partitions to keep them "roughly" in sync.
----
2020-01-26 04:35:29 UTC - Addison Higham: @Eugen this article touches on what you can expect from Pulsar (both in terms of tailing and historical reads) and also gives some ways you can maybe get more control: <https://jack-vanlightly.com/blog/2019/9/4/a-look-at-multi-topic-subscriptions-with-apache-pulsar>
----
2020-01-26 04:43:47 UTC - Eugen: @Addison Higham Although the title reads "multi-topic subscriptions", grepping for "partition" in the article, it seems Jack is addressing my question as well, and Pulsar is much better for this than Kafka. Will read - thanks a lot!
----
2020-01-26 04:51:03 UTC - Addison Higham: :thumbsup: yeah, in Pulsar, multi-topic and partitioned subscriptions are the same thing (since partitions in Pulsar are just multiple topics)
----
2020-01-26 06:36:54 UTC - Eugen: Great article, saved me hours and days of my time! His "A Best-Effort Strategy Based on Publisher Timestamps" section at the end is what I was considering as well. But I don't think I can get this to work in Pulsar with partitions (in contrast to multiple topics), as I cannot subscribe to partitions individually and peek and pace. I'd like to avoid using multiple topics in this case, where the data comes from a single producer anyway and is one thing, logically. And as partitions are implemented as topics in Pulsar, there may even be some (undocumented?) way to subscribe to those internal topics... But this is getting a bit more involved than I'd like it to be.
(I think stream processing engines allow for simple stream merging based on timestamps along those lines)
----
2020-01-26 06:58:13 UTC - Addison Higham: @Eugen you can subscribe to individual partitions. Internally, a partitioned topic is just represented by topics with a numbered suffix, so if you have a topic `public/default/my-topic` with 5 partitions, you can subscribe to each one individually as `public/default/my-topic-0`, `public/default/my-topic-1`, etc. +1 : Eugen
----
2020-01-26 06:58:31 UTC - Addison Higham: err, might be `partition-<n>`
----
2020-01-26 07:53:18 UTC - Eugen: Good to know! So it seems technically possible to merge partitions on the consuming end using e.g. the publish time. If I end up going this route, I may in fact add this as a consumer/reader option to Pulsar...
----
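[Editor's note] The per-partition merge Eugen describes can be sketched in plain Python. This is a hedged illustration, not Pulsar API code: the two lists stand in for per-partition message streams (in practice each would come from a `Reader` on one partition topic), relying only on the fact that each partition is internally ordered, so a k-way merge on `publish_time` yields a best-effort global order.

```python
import heapq

# Simulated per-partition streams as (publish_time_ms, payload) tuples.
# Pulsar guarantees ordering *within* a partition, which is all the
# merge below relies on; it does not require cross-partition ordering.
partition_0 = [(1000, "a"), (1300, "d"), (1600, "f")]
partition_1 = [(1100, "b"), (1200, "c"), (1500, "e")]

# k-way merge: always emit the pending message with the smallest
# publish_time across all partitions.
merged = list(heapq.merge(partition_0, partition_1, key=lambda m: m[0]))

print([payload for _, payload in merged])  # → ['a', 'b', 'c', 'd', 'e', 'f']
```

With live readers you would peek one message per partition and pace, only advancing the partition whose head has the lowest timestamp - the "best-effort strategy based on publisher timestamps" from the linked article. Note Sijie's caveat still applies: publish time is assigned by the client, so this ordering is only as trustworthy as the producer clocks.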
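[Editor's note] On the delayed-message limits raised at the start of the thread: a plausible reading of Sijie's comment is that the broker tracks delayed messages in an in-memory priority queue keyed on delivery time, so every pending delayed message occupies tracker memory until it is due - which is why millions of messages or multi-year delays are costly, and why a time-wheel is proposed. A minimal sketch of that tracker shape (hypothetical class and method names, not Pulsar's actual internals):

```python
import heapq

class DelayedDeliveryTracker:
    """Toy model of a broker-side delayed-delivery tracker: a min-heap
    keyed on delivery timestamp. Memory grows with every pending
    delayed message, illustrating the scaling concern in the thread."""

    def __init__(self):
        self._heap = []  # entries are (deliver_at_ms, message_id)

    def add(self, deliver_at_ms, message_id):
        heapq.heappush(self._heap, (deliver_at_ms, message_id))

    def pop_due(self, now_ms):
        """Return the ids of all messages whose delivery time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now_ms:
            due.append(heapq.heappop(self._heap)[1])
        return due

tracker = DelayedDeliveryTracker()
tracker.add(2000, "m2")
tracker.add(1000, "m1")
tracker.add(9999, "m3")
print(tracker.pop_due(1500))  # → ['m1']
print(tracker.pop_due(5000))  # → ['m2']
```

A time-wheel replaces the single heap with hashed time buckets, trading a little delivery-time precision for O(1) inserts and bounded per-tick work, which is what would make the "1 million messages / 2 years out" cases practical.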
