2020-01-05 10:16:31 UTC - Eugen: Question about multiple subscriptions. [The 
docs](<https://pulsar.apache.org/docs/en/concepts-messaging/#multi-topic-subscriptions>)
 read:
> *No ordering guarantees*
> When a consumer subscribes to multiple topics, all ordering guarantees 
normally provided by Pulsar on single topics do not hold. If your use case for 
Pulsar involves any strict ordering requirements, we would strongly recommend 
against using this feature.
`all ordering guarantees normally provided by Pulsar on single topics do not 
hold` sounds like even messages of a single topic can be received out of order, 
which is something I cannot really believe. So am I reading this right? And if 
so, do these out-of-order deliveries happen in failover scenarios only?
----
2020-01-05 12:14:42 UTC - juraj: two topics can run on two different brokers 
(processes), so naturally there's no built-in ordering guarantee, as the 
performance and complexity cost to organize this would be prohibitive.
to get a guaranteed per-topic ordering, just use one consumer per one topic.
these separate consumers still share the same pulsar client on the client side, 
so not a huge deal.
(you can also add ordering information into your data, and sort on the client.)
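
A minimal sketch of the "sort on the client" idea, assuming every message carries a producer-assigned sequence number and that each topic's own stream already arrives in order. The `Event` type, its `seq` field, and `merge_by_seq` are hypothetical names for illustration and are not part of the Pulsar client API.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    seq: int       # hypothetical ordering field written by the producer
    topic: str
    payload: str


def merge_by_seq(*streams: Iterable[Event]) -> Iterator[Event]:
    """Merge streams that are each already sorted by seq into one sorted stream."""
    return heapq.merge(*streams, key=lambda e: e.seq)


# Two per-topic streams, each ordered on its own.
topic_a = [Event(1, "a", "login"), Event(4, "a", "logout")]
topic_b = [Event(2, "b", "click"), Event(3, "b", "scroll")]

merged = list(merge_by_seq(topic_a, topic_b))
print([(e.seq, e.topic) for e in merged])
```

This works on finite batches; for live streams the client would also need to buffer for a while before emitting, since a later-arriving message on one topic may carry an earlier sequence number.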
----
2020-01-05 19:27:40 UTC - Sean Carroll: is there anywhere to read more about 
this and why that is the case? I can understand why there would be no 
ordering guarantees across topics, but I would still expect an ordering 
guarantee for each individual topic
+1 : Eugen
----
2020-01-05 20:46:47 UTC - Devin Fee: I have a few questions regarding how 
Pulsar readers work:
1. is it possible to update the topics a reader is subscribed to, or would I 
create new readers on the client for each new topic i’d like to subscribe to? 
(e.g. for something like 20k topics on a single server that aren’t regex 
friendly)
2. i was reading about using pulsar with 1 topic per user – and that works fine 
enough if all of your messages can be handled by the same subscriber… but as 
soon as you want different services to handle different messages – this model 
seems to break. Unless… there’s a way to subscribe to topic-partitions?
3. (aka 2.b) is the general suggested architecture to have one topic per 
activity-type and use user-ids as keys? but then without scanning through all 
events, how would we get the activity feed for a particular user?
Thanks!
----
2020-01-05 22:22:35 UTC - Eugen: @juraj So to paraphrase: Per-topic ordering 
guarantees are not affected by multi topic subscriptions, but cross-topic 
ordering guarantees cannot be made
----
2020-01-05 22:23:16 UTC - Eugen: This is what I would have expected, but the 
wording of the docs suggests that "all bets are off", which did not make a lot 
of sense to me.
----
2020-01-05 22:44:34 UTC - juraj: yeah that's what i'd expect based on how i 
understand pulsar working inside... the docs have a lot of room for improvement
----
2020-01-06 00:04:29 UTC - markg: @Matteo Merli  - Thanks for running through 
those points from that medium post.
----
2020-01-06 00:47:05 UTC - David Kjerrumgaard: @Devin Fee For #1, I am going to 
assume you mean consumers, which have subscriptions, vs. readers, which read 
data from a fixed position based on how they are configured. There isn't a way 
to have a consumer dynamically subscribe to new topics other than the regex 
subscription. If you can't describe the new set of topics using a regex, how 
would you identify the topics you are interested in?

For #2, a couple of solutions come to mind. The first is key-shared 
subscriptions, which ensure that messages with the same key are consumed by 
the same consumer. If that doesn't work, it would be very easy to have a simple 
Pulsar function that consumes from the "main" topic and routes the messages to 
different topics based on the message content and/or properties.

For #3, I guess it depends on what you are trying to achieve. If you want to 
use the same logic for all events of the same activity-type, then one topic 
with one consumer that processes the activity-type data is the best approach. 
You can also have a second key-shared subscription that separates the activity 
by user-id.
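
The routing idea for #2 can be sketched in plain Python, assuming each message exposes a `properties` dict with an `activity_type` entry. The property name, the `activity-<type>` topic naming scheme, and the `publish` callback are all hypothetical; a real Pulsar function would publish through the function context rather than a callback like this.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def route_topic(message: dict) -> str:
    """Choose an output topic from the message properties (hypothetical scheme)."""
    activity = message.get("properties", {}).get("activity_type", "unknown")
    return f"activity-{activity}"


def route_all(messages: Iterable[dict], publish: Callable[[str, dict], None]) -> None:
    """Consume from the 'main' stream and republish each message to its topic."""
    for message in messages:
        publish(route_topic(message), message)


# Stand-in for a producer: collect what would be published, per topic.
outbox: Dict[str, List[dict]] = defaultdict(list)

main_topic = [
    {"key": "user-1", "properties": {"activity_type": "comment"}},
    {"key": "user-2", "properties": {"activity_type": "like"}},
    {"key": "user-1", "properties": {"activity_type": "like"}},
    {"key": "user-3", "properties": {}},
]
route_all(main_topic, lambda topic, msg: outbox[topic].append(msg))
print({topic: len(msgs) for topic, msgs in outbox.items()})
```

Each downstream service can then subscribe only to the activity topics it handles, while the user id stays available as the message key.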
----
2020-01-06 02:50:47 UTC - Devin Fee: thanks for replying! a couple thoughts…

i _could_ describe it as a regex, but that could be a huge regex: 
`<user-id-1>|<user-id-2>|<user-id-3>|<...etc.>` and 
then my consumer wants to update those topics it’s interested in, rather than 
providing a change-set of incremental (un-)subs. my hunch is that this is not a 
good idea.
----
2020-01-06 03:27:44 UTC - Eugen: I've created 
<https://github.com/apache/pulsar/pull/5995> to improve the docs
----
2020-01-06 03:36:42 UTC - David Kjerrumgaard: @Devin Fee Why not use a regex 
like `user-id-*` ?
----
2020-01-06 03:42:01 UTC - David Kjerrumgaard: The regex subscription would only 
pick up topics that are created AFTER the consumer is started (assuming they 
match the regex).  Are you looking for a way to dynamically change the topics 
that the consumer is consuming from, i.e. add and remove topics? E.g. start 
consuming from `user-id-1`, `user-id-5`, and `user-id-7`... and later add 
`user-id-9` & `user-id-10`... then later stop listening to `user-id-5`?
----
2020-01-06 03:43:47 UTC - Devin Fee: ^^ exactly
----
2020-01-06 03:45:59 UTC - David Kjerrumgaard: First question then is how would 
you determine / generate this list?  If you can automate it then there might be 
a way to have the code running the consumer dynamically scan a DB table, file 
etc for this list
----
2020-01-06 03:46:13 UTC - Devin Fee: you could even think of that as a basic 
chat app. a user might want to join a particular channel (e.g. slack channel), 
and register a subscription.
----
2020-01-06 03:48:05 UTC - Devin Fee: yeah, so what topics a user wants to 
follow… is up to the user. let’s assume it’s stored as many-to-many mapping in 
a sql-db. e.g. `[channels] >--< [channels_users] >---< [users]`
----
2020-01-06 03:48:51 UTC - David Kjerrumgaard: I think that is doable with a 
combination of some coding logic wrapped around the consumer that gets alerted 
to these changes and can start new subscriptions for additions and stop 
subscriptions for deletions.....
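
A minimal sketch of the "additions and deletions" bookkeeping, assuming the wrapper can obtain the full desired topic list (for example from the SQL mapping mentioned above). `plan_changes` is a hypothetical helper, and actually starting or closing consumers is left out.

```python
from typing import Iterable, Set, Tuple


def plan_changes(current: Set[str], desired: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (topics to start consuming, topics to stop consuming)."""
    desired_set = set(desired)
    to_add = desired_set - current      # wanted but not yet subscribed
    to_remove = current - desired_set   # subscribed but no longer wanted
    return to_add, to_remove


# The example from earlier in the thread: add 9 and 10, drop 5.
current = {"user-id-1", "user-id-5", "user-id-7"}
desired = ["user-id-1", "user-id-7", "user-id-9", "user-id-10"]

to_add, to_remove = plan_changes(current, desired)
print(sorted(to_add), sorted(to_remove))
```

Working from a diff like this keeps unchanged subscriptions untouched, so only the affected topics see a consumer start or stop.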
----
2020-01-06 03:49:11 UTC - Devin Fee: yeah, i actually thought that might be too 
complex.
----
2020-01-06 03:49:53 UTC - David Kjerrumgaard: It is a bit complex, which is why 
it isn't implemented directly inside Pulsar...  :smiley:
----
2020-01-06 03:50:21 UTC - Devin Fee: basically, i can get an ordered list of 
messages for any channel from SQL (`select * from … order by created_at desc 
limit 100`), but it’s the real-time component that makes this interesting…
----
2020-01-06 03:53:48 UTC - Devin Fee: i.e. if i have 20 websocket servers, each 
serving 1000 clients (browsers, smartphones, whatever) and each one of those 
can monitor a particular channel in realtime (e.g. you might be watching this 
thread between you and me right now)…
----
2020-01-06 03:55:19 UTC - David Kjerrumgaard: Yea, it would be best if there 
was a way to have the web-clients send these notifications to a 
"consumer-config" topic that notifies the consumer of changes to the 
subscriptions; then you could react accordingly....
----
2020-01-06 03:55:21 UTC - Devin Fee: the big gap in my understanding is whether 
pulsar supports this notion of drift… a user might want to switch the channel 
they’re watching (i.e. -1 / +1 subscription event at the user level)
----
2020-01-06 03:56:56 UTC - Devin Fee: so is your suggestion effectively to 
provide routing as a microservice itself? i.e. receiving the firehose of event 
data, then re-publishing to downstream (subscribed) consumers?
----
2020-01-06 03:57:08 UTC - David Kjerrumgaard: So the goal is to be able to 
dynamically configure a consumer to listen to N of the 20K channels?
----
2020-01-06 03:57:18 UTC - Devin Fee: exactly
----
2020-01-06 03:58:36 UTC - David Kjerrumgaard: Yes, I am suggesting something to 
that effect.  It solves the real-time update nature of the problem. Your 
consumer would ALWAYS be listening to a "control topic" to receive these 
requests. Then you can adjust your consumers accordingly
----
2020-01-06 03:59:05 UTC - David Kjerrumgaard: check whether it is already 
subscribed, whether it matches an existing regex sub, etc.
----
2020-01-06 03:59:32 UTC - David Kjerrumgaard: if not, start a new consumer 
thread.
----
2020-01-06 03:59:45 UTC - Devin Fee: “consumer thread” being a new “topic”?
----
2020-01-06 04:01:05 UTC - David Kjerrumgaard: potentially, depending on the 
request.  You could have requests that match a regex, etc
----
2020-01-06 04:02:04 UTC - David Kjerrumgaard: the problem would be scaling one 
process past 50+ topics.
----
2020-01-06 04:02:20 UTC - Devin Fee: ok yeah, you mean “thread” as in the 
operating system unit
----
2020-01-06 04:02:26 UTC - David Kjerrumgaard: so getting more topics / thread 
would be a big win
----
2020-01-06 04:03:47 UTC - Devin Fee: it’s really about where the filtering 
happens though… right?
----
2020-01-06 04:04:02 UTC - David Kjerrumgaard: Yes, I am envisioning the 
consumer being a Java / Python app that has a thread pool associated with it. 
One thread is always reading from the "control topic" and when a new message 
comes in, it decodes the command and either starts a new consumer in a thread 
or halts one for a delete etc.
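
The control-topic loop described here could look roughly like the sketch below. It assumes commands arrive as JSON strings with hypothetical `op` and `topic` fields, and it replaces real consumers and threads with `start`/`stop` callbacks so the logic can be followed in isolation; none of these names come from the Pulsar API.

```python
import json
from typing import Callable, Dict, List


class SubscriptionManager:
    """Applies subscribe/unsubscribe commands read from a control topic."""

    def __init__(self, start: Callable[[str], object], stop: Callable[[object], None]) -> None:
        self._start = start                    # starts a consumer, returns a handle
        self._stop = stop                      # halts the consumer behind a handle
        self._active: Dict[str, object] = {}   # topic -> consumer handle

    def handle(self, raw: str) -> None:
        command = json.loads(raw)
        op, topic = command["op"], command["topic"]
        if op == "subscribe":
            if topic not in self._active:      # already subscribed: nothing to do
                self._active[topic] = self._start(topic)
        elif op == "unsubscribe":
            handle = self._active.pop(topic, None)
            if handle is not None:
                self._stop(handle)
        else:
            raise ValueError(f"unknown op: {op!r}")

    @property
    def topics(self) -> List[str]:
        return sorted(self._active)


# Record calls instead of creating real consumers.
started: List[str] = []
stopped: List[object] = []
manager = SubscriptionManager(start=lambda t: started.append(t) or t, stop=stopped.append)

for raw in [
    '{"op": "subscribe", "topic": "user-id-9"}',
    '{"op": "subscribe", "topic": "user-id-9"}',    # duplicate request is ignored
    '{"op": "subscribe", "topic": "user-id-10"}',
    '{"op": "unsubscribe", "topic": "user-id-9"}',
]:
    manager.handle(raw)

print(manager.topics)
```

In a real app the `start` callback would create a consumer (possibly on a pooled thread) and `stop` would close it; the idempotency check mirrors the "check whether it is already subscribed" step mentioned earlier.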
----
2020-01-06 04:04:31 UTC - Devin Fee: e.g. each `websocket subscription server` 
could receive a firehose, and filter that firehose itself… or that could be 
pre-filtered.
----
2020-01-06 04:05:38 UTC - David Kjerrumgaard: What message type is in the 
firehose?
----
2020-01-06 04:05:48 UTC - Devin Fee: i guess the point i’m trying to clear up 
is that when you say “topic” you’re talking about a concept outside the domain 
of pulsar right?
----
2020-01-06 04:06:22 UTC - David Kjerrumgaard: no, I meant a pulsar topic. just 
used for a different purpose.
----
2020-01-06 04:09:38 UTC - David Kjerrumgaard: If I am understanding the use 
case correctly (big if), then when a web-client wants to start listening to a 
subset of the 20K channels (20 websocket servers, each serving 1000 clients), 
it would send a command that encodes that into a Pulsar topic that the 
Consumer app is subscribed to.  One consumer app / web-client
----
2020-01-06 04:10:30 UTC - David Kjerrumgaard: the consumer app would interpret 
the commands to establish a subscription on the proper channels, 
collect the messages from them and send them back to the web-client
----
2020-01-06 04:10:59 UTC - David Kjerrumgaard: web-client -----> consumer app 
---> (multiple topics)
----
2020-01-06 04:17:23 UTC - Devin Fee: maybe a concrete example is `slack`. you 
and i are both in this `#general` channel in our web browser, (or electron 
apps, or mobile apps, etc.) and potentially have multiple concurrent sessions.

`slack` has (let’s say) 100 websocket-subscription servers running that we’re 
connecting to … to get real-time updates from these chat messages.

if they were going to use `pulsar`, then a dumb implementation would be that 
each websocket-subscription server would need to receive the firehose of all 
messages, filter messages relevant to its websocket-consumers, and then 
transform-and-forward them.

i.e. “it’s brokers all the way down, but a pulsar broker at the top”
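
A rough sketch of that "dumb implementation", assuming one websocket server tracks which of its sessions are watching which channel and sees every message from the firehose. `FanOutServer` and its methods are hypothetical names, and the actual websocket send is replaced by returning the deliveries.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class FanOutServer:
    """One websocket server: filters the firehose down to its own sessions."""

    def __init__(self) -> None:
        self._watchers: Dict[str, Set[str]] = defaultdict(set)  # channel -> sessions

    def watch(self, session: str, channel: str) -> None:
        self._watchers[channel].add(session)

    def unwatch(self, session: str, channel: str) -> None:
        self._watchers[channel].discard(session)

    def deliver(self, channel: str, text: str) -> List[Tuple[str, str]]:
        """Return (session, text) pairs to forward; empty if nobody here is watching."""
        return [(session, text) for session in sorted(self._watchers.get(channel, ()))]


server = FanOutServer()
server.watch("alice", "#general")
server.watch("bob", "#general")

first = server.deliver("#general", "hi")       # forwarded to both sessions
ignored = server.deliver("#random", "noise")   # dropped: no local watchers
server.unwatch("bob", "#general")              # bob switches channels (-1 event)
second = server.deliver("#general", "still here")
print(first, ignored, second)
```

Switching channels is cheap in this model because it only touches the in-memory map, but every server still pays for receiving messages it will drop.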
----
2020-01-06 04:22:16 UTC - Devin Fee: because they can’t revise the topics 
they’re subscribed to without destroying their current `subscription` and 
creating a new `subscription`, they really do have to subscribe to all topics 
from the get-go.
----
2020-01-06 04:23:19 UTC - Devin Fee: also, these subscriptions are ephemeral. 
if kubernetes kills off a server, or creates a new one, we don’t need to 
persist those pulsar-subscriptions indefinitely. (maybe this is where the idea 
of the reader interface comes in?)
----
2020-01-06 04:25:28 UTC - Devin Fee: this appears to be the same problem as the 
one in computer networking – “broadcast / multicast / unicast” 
<https://www.esds.co.in/blog/wp-content/uploads/2016/04/Difference-between-unicast-broadcast-and-multicast-diagram.png>
----
2020-01-06 04:29:13 UTC - Devin Fee: so pulsar seems to support `unicast` fine 
(one subscription), and `broadcast` fine (more-than-one subscription… and 
perhaps this mysterious “reader interface”), but `multicast` is a 
“domain-level” problem that gets pushed to devs
----
2020-01-06 04:31:15 UTC - David Kjerrumgaard: The reader interface just allows 
you to start consuming messages from a topic at a previous point in time, so 
you can review historical data, i.e. data that was delivered and acknowledged 
by all active consumers at the time it was created.
----
2020-01-06 04:32:06 UTC - Devin Fee: does the reader interface also update you 
with the latest messages in a topic?
----
2020-01-06 04:32:27 UTC - Devin Fee: i.e. is it like an *ephemeral* 
subscription, or is there something like that with pulsar?
----
2020-01-06 04:44:31 UTC - David Kjerrumgaard: No, it allows you to control where 
you start reading from and you can do a `while (reader.hasMessageAvailable()) { 
reader.readNext(); }`
----
2020-01-06 04:47:35 UTC - David Kjerrumgaard: The consumer on the other hand 
does `while (consumer.isConnected()) { Message m = consumer.receive(); }`
----
2020-01-06 04:48:09 UTC - David Kjerrumgaard: so it blocks until a new message 
arrives, but doesn't read previous messages on the topic. It starts from the 
most recent message
----
2020-01-06 04:49:17 UTC - David Kjerrumgaard: You _can_ use the `seek` method 
on a consumer to position yourself before the most recent message if you desire.
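
The difference in start position can be illustrated with a toy model, assuming a topic behaves like an append-only list. `TopicLog`, `Cursor`, and `drain` are hypothetical stand-ins whose method names only mirror the calls quoted above; this is not the Pulsar client, and `read_next` here does not block.

```python
from typing import List


class TopicLog:
    """Toy append-only log standing in for a single topic."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def publish(self, message: str) -> None:
        self.messages.append(message)


class Cursor:
    """Toy read position over a TopicLog."""

    def __init__(self, log: TopicLog, start: int) -> None:
        self._log = log
        self._pos = start

    def has_message_available(self) -> bool:
        return self._pos < len(self._log.messages)

    def read_next(self) -> str:
        message = self._log.messages[self._pos]
        self._pos += 1
        return message

    def seek(self, position: int) -> None:
        self._pos = position


def drain(cursor: Cursor) -> List[str]:
    """Read until no more messages are currently available."""
    out = []
    while cursor.has_message_available():
        out.append(cursor.read_next())
    return out


log = TopicLog()
for m in ("m1", "m2", "m3"):
    log.publish(m)

reader = Cursor(log, start=0)                      # chosen start: the beginning
subscriber = Cursor(log, start=len(log.messages))  # starts after the latest message

history = drain(reader)          # sees the existing messages
nothing_yet = drain(subscriber)  # sees nothing until something new arrives
log.publish("m4")
fresh = drain(subscriber)        # sees only the new message
subscriber.seek(1)               # rewind, analogous to the seek mentioned above
replayed = drain(subscriber)
print(history, nothing_yet, fresh, replayed)
```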
----
2020-01-06 04:51:32 UTC - Devin Fee: alright, thanks for your help.
----
2020-01-06 04:51:42 UTC - Devin Fee: i’ve got to spend some time thinking about 
these constraints
----
2020-01-06 06:34:02 UTC - Tilden: Hi all, does anyone know how to back up and 
restore Apache Pulsar's ZooKeeper and BookKeeper? Is there any documentation to reference?
----
