2018-03-22 13:02:26 UTC - Piotr: Is there any convenient trick to subscribe to a topic and, besides getting all the new messages, also start with the messages produced in the last X seconds? (a slight rewind on subscribe) ----
2018-03-22 13:16:22 UTC - jia zhai: @Piotr There was a PR merged recently into the master branch which lets a consumer start from the earliest message when it subscribes: <https://github.com/apache/incubator-pulsar/pull/1397> ----
2018-03-22 13:17:30 UTC - jia zhai: Once you have a Consumer, you can call seek to reset the message ID from which to consume. ----
2018-03-22 13:22:13 UTC - Piotr: Ah ok - so I would still need to know the messageId. I was thinking maybe there would be a way to start from latest minus X time units, since the topic has the creation date of the messages ----
2018-03-22 13:24:08 UTC - Piotr: My use case: I would like to make sure I don't miss out on any updates between fetching all data regarding a "project" and subscribing to the project changes. I was thinking that if I always start the subscription a few seconds back in time, that would make sure I don't miss anything. But maybe you have a better solution for this kind of thing? ----
2018-03-22 13:24:50 UTC - Piotr: Currently I do a checksum check to make sure I haven't missed anything, but that is quite expensive ----
2018-03-22 13:30:51 UTC - jia zhai: When you produce a message with `producer.send(message)`, you get a messageId back. ----
2018-03-22 14:06:35 UTC - Matteo Merli: @Piotr you can enable message expiration based on TTL ----
2018-03-22 14:07:07 UTC - Piotr: @Matteo Merli ah true - I could do that and then start reading from earliest. Smart ----
2018-03-22 14:08:19 UTC - Piotr: Regarding my use case - any suggestions on how to bridge that potential gap in any other way? Assuming that no updates will be missed if I use a TTL is to assume that no update will be in transit for longer than the TTL ----
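A minimal sketch of the two mechanisms discussed above - an initial subscription position and an explicit seek - assuming the builder-style Java client API from releases newer than this conversation; the topic and subscription names are placeholders:

```java
import org.apache.pulsar.client.api.*;

public class RewindOnSubscribe {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Start the subscription from the earliest retained message
        // (the behavior discussed around PR #1397). With a message TTL
        // set on the namespace, "earliest" is effectively "now minus TTL".
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/project-updates") // placeholder
                .subscriptionName("project-sub")                      // placeholder
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        // Alternatively, rewind an existing consumer to a known position:
        consumer.seek(MessageId.earliest);

        // (demo only: a real consumer would now enter its receive loop)
        consumer.close();
        client.close();
    }
}
```

The TTL Matteo suggests is set per namespace, e.g. `bin/pulsar-admin namespaces set-message-ttl public/default --messageTTL 120`, so that "earliest" effectively means "roughly the last two minutes". Later client releases also added a time-based `consumer.seek(timestamp)` overload, which matches the original "rewind X seconds" request directly.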
2018-03-22 15:12:26 UTC - Piotr: Another question: are subscriptions persisted when I disconnect the consumer? If not, is there any way to do that and to reconnect to it at a later time? Or do I need to do this myself using the reader interface? ----
2018-03-22 15:14:35 UTC - Matteo Merli: Yes, the subscription is "persistent": after it's created, it will be there (and retain data) until you either acknowledge the messages or unsubscribe ----
2018-03-22 15:19:56 UTC - Piotr: perfect ----
2018-03-22 18:38:11 UTC - Piotr: Is there any way to delete a subscription by name, without first starting a consumer on it? ----
2018-03-22 18:41:53 UTC - Matteo Merli: Yes, you can create/delete subscriptions through REST / Java / CLI ----
2018-03-22 18:42:38 UTC - Matteo Merli: e.g.: `bin/pulsar-admin persistent create-subscription $MY_TOPIC --subscription $MY_SUB` ----
2018-03-22 18:43:05 UTC - Matteo Merli: or `unsubscribe` ----
2018-03-22 18:46:32 UTC - Piotr: Ahh there it is, in the admin client: `admin.persistentTopics().deleteSubscription(destination, subscriptionName);` Thanks ----
2018-03-22 23:09:24 UTC - Rabi Panda: @Sijie Guo Thanks for that. I'll try Java 8 and get back to you. ----
2018-03-22 23:10:05 UTC - Rabi Panda: @Piotr I downloaded the Streamlio sandbox. Works fine. Thanks for pointing me in that direction +1 : Sijie Guo ----
2018-03-22 23:19:24 UTC - Piotr: @Rabi Panda good to hear. I am having some problems with my homebrew install, so I might try that as well at some point :slightly_smiling_face: ----
2018-03-22 23:20:53 UTC - Matteo Merli: Yes, probably the easiest way to start the standalone (at the latest version) is to use it with Docker: <http://pulsar.apache.org/docs/latest/getting-started/docker/> ----
2018-03-22 23:29:39 UTC - Piotr: In the Java client, I create a consumer and set a MessageListener with reachedEndOfTopic. Is there any way to listen for errors etc. that mean the consumer is no longer available and streaming? I cannot find any way to listen for exceptions in the client, consumer, etc. (I need to clean up resources like my reactive streams) ----
2018-03-22 23:30:07 UTC - Piotr: Feels like a stupid question but I really can't find it :slightly_smiling_face: ----
2018-03-22 23:31:25 UTC - Matteo Merli: No stupid question! So, `reachedEndOfTopic` you can ignore, unless you want to explicitly "terminate" a topic (e.g. mark it as permanently closed). It's kind of uncommon to need that feature, though. ----
2018-03-22 23:31:51 UTC - Matteo Merli: In general, a consumer instance (same as a producer) is good "forever" ----
2018-03-22 23:32:18 UTC - Matteo Merli: Errors are handled internally by the client library (reconnections, failovers, etc.) ----
2018-03-22 23:32:55 UTC - Matteo Merli: If you're disconnected from the brokers, you won't receive messages, but you won't get an exception ----
2018-03-22 23:33:32 UTC - Matteo Merli: The only exception you can get is when acknowledging: `consumer.acknowledge(message)` throws an exception if it cannot send the ack back to the broker ----
2018-03-22 23:33:59 UTC - Matteo Merli: The suggestion there is to use `consumer.acknowledgeAsync(message)` and forget about it ----
2018-03-22 23:34:16 UTC - Matteo Merli: (unacked messages are redelivered anyway) ----
2018-03-22 23:35:54 UTC - Piotr: Sweet, makes sense! :tada::+1: thanks ----
2018-03-22 23:36:19 UTC - Piotr: Love a smart API/client ----
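Putting Matteo's advice into a receive loop - a sketch using the fire-and-forget async acknowledgment; `process` and the topic/subscription names are placeholders:

```java
import org.apache.pulsar.client.api.*;

public class AckLoop {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/project-updates") // placeholder
                .subscriptionName("project-sub")                      // placeholder
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            process(msg);
            // Fire-and-forget: if the ack is lost, the message is
            // simply redelivered later, so there is no need to block
            // or handle the exception inline.
            consumer.acknowledgeAsync(msg);
        }
    }

    static void process(Message<byte[]> msg) { /* application logic */ }
}
```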
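And, for the create/delete-subscription exchange earlier in the day, the same operations through the Java admin client - a sketch assuming the newer admin API, where the `persistentTopics()` accessor Piotr found is exposed as `topics()`; the service URL and names are placeholders:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.MessageId;

public class SubscriptionAdmin {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        String topic = "persistent://public/default/project-updates"; // placeholder

        // Create a durable subscription positioned at the tail of the
        // topic, without ever attaching a consumer to it.
        admin.topics().createSubscription(topic, "project-sub", MessageId.latest);

        // ...and delete it by name, again with no consumer involved.
        admin.topics().deleteSubscription(topic, "project-sub");

        admin.close();
    }
}
```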
2018-03-23 08:06:23 UTC - Piotr: I have been struggling with an architectural problem for a while now. It feels like a problem a lot of people must have solved, but I can't seem to find an elegant solution. I think that in the case of Pulsar my proposed solution idea 1 could work - if the cluster can guarantee that no messages sent "at the same time" as the subscription is being initialized will be missed? Would appreciate it if you could take a look: <https://stackoverflow.com/questions/49439977/avoiding-client-server-data-synchronization-gap> ----
2018-03-23 08:23:49 UTC - Ivan Kelly: @Piotr the snapshot should be tagged with the message ID of the last message used in materializing the snapshot. Then you should resume reading updates from that message ID + 1. Depending on how you make the snapshot, you can use a subscription to take care of all of this. ----
2018-03-23 08:24:03 UTC - Ivan Kelly: Or you could use compaction, if it fits your use case ----
2018-03-23 08:29:32 UTC - Piotr: @Ivan Kelly that is one option I have considered, but it would require me to have infinite retention on the stream (or save it down to some other DB afterwards and combine it with the stream) in order to be able to replay it fully. Do you see any problem with saving a load of streams in the cluster forever? ----
2018-03-23 08:30:05 UTC - Ivan Kelly: Which? Compaction or the first thing I said? ----
2018-03-23 08:31:32 UTC - Piotr: Well, I am materializing the snapshot from my DB and not Pulsar, so in order to make your solution work I would have to keep the log forever and use that as the source of truth - right? ----
2018-03-23 08:36:01 UTC - Ivan Kelly: Ah, so events go both to Pulsar and to the database? ----
2018-03-23 08:36:30 UTC - Piotr: Yes - I am adding Pulsar as a system to push out changes to clients ----
2018-03-23 08:38:24 UTC - Ivan Kelly: hmm ----
2018-03-23 08:40:18 UTC - Ivan Kelly: Where are the events coming from? And how many of them are there? ----
2018-03-23 08:43:15 UTC - Piotr: OK, so my system is a ticketing system of sorts. Every time a ticket is added, used, etc., that change is streamed. So not a massive amount - maybe on the order of tens of thousands per topic ----
2018-03-23 08:43:45 UTC - Ivan Kelly: ok ----
2018-03-23 08:43:56 UTC - Piotr: Every time the web cluster does an operation on the (physical) event, or anything belonging to it, and saves it to the DB, an event is emitted ----
2018-03-23 08:44:28 UTC - Ivan Kelly: And the clients need to see all events, in the same order? ----
2018-03-23 08:45:10 UTC - Piotr: No, the ordering is not critical ----
2018-03-23 08:45:29 UTC - Piotr: But they need to get all the events, yes :slightly_smiling_face: ----
2018-03-23 08:47:23 UTC - Ivan Kelly: OK, so weaker than total-order atomic broadcast, but the client can't keep a record of everything it receives, so you may as well keep the order ----
2018-03-23 08:48:44 UTC - Ivan Kelly: OK, so how I would do this is: push everything to Pulsar first, and have a process pull events from Pulsar and push them into the database, tagging the updates with the message ID from Pulsar. ----
2018-03-23 08:49:40 UTC - Ivan Kelly: When a client connects, it pulls the snapshot from the database, works out from the tags the latest event it got from Pulsar, and then pulls from Pulsar ----
2018-03-23 08:50:23 UTC - Piotr: I see - and are the Pulsar IDs sequential? I tried to find docs about that and couldn't, but then I just debugged them and they seem to look like 1:15:2 etc. ----
2018-03-23 08:50:30 UTC - Ivan Kelly: You're basically externalizing the database WAL ----
2018-03-23 08:51:28 UTC - Ivan Kelly: Pulsar IDs are sequential, yes. You can read one as `<segment>:<entry-in-segment>:<partition>:<position-in-batch>` ----
2018-03-23 08:52:55 UTC - Piotr: Ok - so they are not really sequential? ----
2018-03-23 08:53:16 UTC - Piotr: I mean, how would you get the latest messageId from a set of those? ----
2018-03-23 08:54:30 UTC - Ivan Kelly: They are. There are gaps, but they are monotonically increasing ----
2018-03-23 08:56:13 UTC - Piotr: So you would order those first by segment, then by entry-in-segment, then by partition and then by position-in-batch? ----
2018-03-23 08:56:48 UTC - Piotr: You can never get a messageId with partition 2 and then the next one with partition 1? ----
2018-03-23 08:58:50 UTC - Ivan Kelly: Well, there's no ordering between partitions, so you shouldn't try to ----
2018-03-23 08:59:03 UTC - Ivan Kelly: But yes, first by segment, then entry, then batch ----
2018-03-23 09:03:25 UTC - Piotr: That could indeed work! With a potential side effect of making DB writes asynchronous, which I should be able to handle, hopefully ----
2018-03-23 09:08:53 UTC - Ivan Kelly: happy to help :slightly_smiling_face: ----
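A sketch of the snapshot-tagging design Ivan describes: each DB row touched by an update carries the byte-encoded MessageId of that update, the client takes the maximum (MessageId is Comparable, ordered by segment, then entry, then batch position, exactly as Ivan spells out), and a Reader replays everything after it. `loadTagsFromSnapshot` and `applyUpdate` are hypothetical stand-ins for the application's DB and apply logic:

```java
import java.util.Collections;
import java.util.List;
import org.apache.pulsar.client.api.*;

public class ResumeFromSnapshot {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Find the latest tag in the snapshot: MessageId implements
        // Comparable, so the maximum is the most recent update applied.
        List<byte[]> tags = loadTagsFromSnapshot(); // hypothetical DB call
        MessageId latest = MessageId.earliest;
        for (byte[] tag : tags) {
            MessageId id = MessageId.fromByteArray(tag);
            if (id.compareTo(latest) > 0) {
                latest = id;
            }
        }

        // A Reader started at that ID replays everything *after* it
        // (startMessageId is exclusive by default), closing the gap
        // between the snapshot and the live stream.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/project-updates") // placeholder
                .startMessageId(latest)
                .create();

        while (true) {
            Message<byte[]> msg = reader.readNext();
            applyUpdate(msg);
        }
    }

    static List<byte[]> loadTagsFromSnapshot() { return Collections.emptyList(); }
    static void applyUpdate(Message<byte[]> msg) { /* application logic */ }
}
```

On the write path, the tagging process gets the ID to store from the send itself - `MessageId id = producer.send(payload);` then `id.toByteArray()` for persistence - which is the messageId jia zhai mentioned earlier in the thread.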
