2019-05-01 13:19:14 UTC - Vlad Lazarenko: @Vlad Lazarenko has joined the channel
----
2019-05-01 13:47:56 UTC - Vlad Lazarenko: Hey guys. I have a few questions I could not figure out. When using deduplication with (custom) sequence IDs, it looks like the sequence ID is not exposed to clients. Is that right, or am I missing something?
----
2019-05-01 13:52:28 UTC - Vlad Lazarenko: My actual problem is a little more complicated, though. I am looking to make a consumer subscribe to messages using a sequence ID instead of a message ID. Basically, something similar to Kafka's stream offset. It doesn't look like there is a way to subscribe by sequence ID. Do you know what my best bet is for accomplishing this? I can only think of another "service" on the side that journals all MessageKey + SequenceId pairs and allows fast lookup of the message ID, so the client can query it and then proceed with a standard subscription using MessageKey. Thoughts?
----
2019-05-01 16:05:04 UTC - Matteo Merli: @Vlad Lazarenko The main difference is that while the MessageId is per-topic (assigned after messages from multiple producers are serialized and persisted), the sequence ID is only relative to a particular producer (identified by its producer name).
----
2019-05-01 16:06:16 UTC - Vlad Lazarenko: That sounds right. I have a specific case with a single producer and a non-partitioned topic with deduplication enabled.
----
2019-05-01 16:08:28 UTC - Vlad Lazarenko: The thing is that I integrate with another, not-very-reliable messaging system, and the intention is to use Pulsar for recovery of lost messages (which is rare but could happen). So what I am trying to avoid is, say, replaying a week's worth of messages when a message is lost at the end of the week. And all I know at that point is a sequence number (mapped 1-to-1 to a message key).
----
2019-05-01 16:12:03 UTC - Matteo Merli: Ok, so you want to specify a sequence ID when you publish and then have the consumer position itself on that message afterwards...
----
2019-05-01 16:12:59 UTC - Matteo Merli: As it is, that's not directly possible, since we don't "index" by sequence ID. We basically have 2 indices: message ID and publish timestamp.
----
2019-05-01 16:13:46 UTC - Matteo Merli: One other option would be to collect the MessageId after you publish and store it as well. That way you'll be able to associate a sequence ID with a message ID.
----
2019-05-01 16:16:31 UTC - Devin G. Bost: For anyone in the Utah area, I'm presenting on Pulsar on May 22nd at Overstock: <https://www.meetup.com/utah-data-engineering-meetup/events/261032242/>
clap: Matteo Merli
+1: Dan C, Jon Bock, David Kjerrumgaard, Vlad Lazarenko
----
2019-05-01 16:39:56 UTC - Vlad Lazarenko: Sounds reasonable. I was thinking along those lines as well. I will have to figure out the details around where and how to store it, how to work around failure cases, etc. Thanks!
----
2019-05-01 16:44:49 UTC - Joe Francis: My suggestion: if you know the time window of the loss, use the approximate timestamp and use the Reader API to filter.
----
2019-05-01 16:50:29 UTC - Vlad Lazarenko: @Joe Francis I'm using C++ and can't seem to find anything in the Reader API that takes a timestamp, or that allows skipping the payload or getting sequence IDs. I'm guessing the Java API is richer in this regard?
----
2019-05-01 16:59:29 UTC - Joe Francis: There's your opportunity: open a GitHub issue and also submit a PR :grinning: for the C++ client. Is the payload large?
----
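A minimal Java sketch of Matteo's suggestion above: capture the MessageId returned on publish, store it keyed by the sequence ID, and later start a Reader from that stored position. The service URL, topic, producer name, and the in-memory map standing in for a durable store are all assumptions, not part of the discussion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class SequenceIdIndexSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")                // assumed broker address
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/recovery-topic")  // hypothetical topic
                .producerName("upstream-bridge")                      // stable name, so dedup can track sequence IDs
                .create();

        // Side index: sequence ID -> serialized MessageId. In practice this would
        // live in a durable store (another topic, a DB, ...), not an in-memory map.
        Map<Long, byte[]> seqToMsgId = new ConcurrentHashMap<>();

        long seq = 42L; // sequence number taken from the upstream system
        MessageId msgId = producer.newMessage()
                .sequenceId(seq)
                .value("payload".getBytes(StandardCharsets.UTF_8))
                .send();
        seqToMsgId.put(seq, msgId.toByteArray());

        // Later, to replay from a known sequence number, look up the stored
        // MessageId and start a Reader there. By default the Reader begins
        // with the message immediately after the given position.
        MessageId start = MessageId.fromByteArray(seqToMsgId.get(seq));
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/recovery-topic")
                .startMessageId(start)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }

        reader.close();
        producer.close();
        client.close();
    }
}
```

With deduplication enabled and a stable producer name, the stored mapping lets a lost sequence number be resolved to an exact position in the topic, so recovery does not have to replay from the beginning.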
2019-05-01 17:12:52 UTC - Vlad Lazarenko: The system is generic, so I have to account for worst-case scenarios, even though in most cases replaying everything from the beginning of time is viable :laughing:
----
2019-05-01 17:52:49 UTC - Thor Sigurjonsson: The other day I found I wanted to scale a function to zero (or a sink/source by extension). `pulsar-admin` told me it had to be greater than zero. Is this something that should be supported? I can think of use cases where we don't want to do a deploy or update that changes anything we know works -- but we want to turn a flow off temporarily (it might be because of timing issues around deploys, or health issues of downstream components, etc.). In my case it was just a way to kill the instance with ID 0 when it was in a bad state, but I found I could not do that this way. In that case I could have used some other way to poke at a particular running instance of a function. That might also be useful.
----
2019-05-01 17:53:42 UTC - Matteo Merli: An alternative is to "stop" the function.
----
2019-05-01 17:55:13 UTC - Thor Sigurjonsson: Yes, that is a good point -- and `pulsar-admin functions stop` does support `--instance-id` as well.
----
2019-05-01 17:55:50 UTC - Thor Sigurjonsson: I guess I'm not seeing that exposed on sources/sinks, where it could be useful as well.
----
2019-05-01 17:56:16 UTC - Thor Sigurjonsson: I hadn't looked at the `stop` command on functions.
----
2019-05-01 18:01:24 UTC - Thor Sigurjonsson: Does the `stop` command decommission an instance or just stop the flow to it?
----
2019-05-01 18:02:02 UTC - Matteo Merli: The process/thread/container is stopped, though the metadata is maintained.
----
2019-05-01 18:32:55 UTC - Byron: @Matteo Merli Just peeked at the (new?) schema support in the Go client. I noticed the ProtoSchema type embeds the AvroCodec... is this right? <https://godoc.org/github.com/apache/pulsar/pulsar-client-go/pulsar#ProtoSchema>
----
2019-05-01 18:34:47 UTC - Matteo Merli: Yes, even in Java we (internally) standardize the schema definition to Avro, in order to have a consistent definition of the schema. Even for JSON we use Avro internally.
----
2019-05-01 18:34:48 UTC - Byron: ^sorry, AvroCodec
----
2019-05-01 18:35:59 UTC - Byron: I see. So an Avro schema is defined containing a field that contains the protobuf schema?
----
2019-05-01 18:41:31 UTC - Thor Sigurjonsson: @Chris Bartholomew Do you take any steps to make BookKeeper more resilient there? Like favoring having more nodes rather than fewer, etc.?
----
2019-05-01 18:41:54 UTC - Matteo Merli: Correct
----
2019-05-01 18:42:41 UTC - Byron: Or is the protobuf schema just modeled as an Avro schema? I see in the implementation that encode and decode still depend on the internal proto registry, i.e. the generated Go types need to be imported. I guess I assumed the protobuf descriptor would have been embedded so that the server could do validation on the serialized bytes.
----
2019-05-01 18:44:26 UTC - Byron: Or not even the server necessarily, but the registry would hold the descriptor so a consumer could use it to decode. No problem that it works this way, just _typing_ out loud.
----
2019-05-01 18:44:58 UTC - Byron: I presume, as an SDK, this is just for managing the encoding/decoding on the client side and doesn't necessarily overlap with the registry itself?
----
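To make Matteo's point above concrete (schema definitions are normalized to Avro internally, even for JSON and Protobuf), here is a small Java sketch. The `SensorReading` POJO is made up; the only claim is that the schema definition registered for a JSON schema is expressed as an Avro record.

```java
import java.nio.charset.StandardCharsets;

import org.apache.pulsar.client.api.Schema;

public class JsonSchemaIsAvroUnderneath {

    // Hypothetical POJO used to derive the schema.
    public static class SensorReading {
        public String sensorId;
        public double value;
        public long timestampMillis;
    }

    public static void main(String[] args) {
        // Schema.JSON(...) serializes payloads as JSON, but the schema
        // definition it carries in its SchemaInfo is expressed in Avro form.
        Schema<SensorReading> jsonSchema = Schema.JSON(SensorReading.class);

        // Prints an Avro record definition (type "record", fields, etc.),
        // even though the wire format of the payload is JSON.
        System.out.println(new String(jsonSchema.getSchemaInfo().getSchema(),
                StandardCharsets.UTF_8));
    }
}
```

As Matteo confirms above, the Java Protobuf schema is handled the same way: the descriptor is translated into an Avro-format definition for registration, while actual encoding/decoding still goes through the generated Protobuf classes, which matches what Byron observed in the Go client.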
2019-05-01 19:02:42 UTC - Patrick Lange: @Patrick Lange has joined the channel
----
2019-05-02 00:09:01 UTC - Patrick Lange: @Matteo Merli I am running into similar issues. The new 2.3.1 Python client doesn't install correctly from pip on macOS 10.14.4 under Python 3.7.4 (Anaconda). If I run the default command, mmh3 fails to install. When I install it with `CXX=<path-to-g++-8> pip install pulsar-client==2.3.1`, it segfaults on import. I can import `mmh3` on its own and use it.
----
2019-05-02 07:43:03 UTC - Sébastien de Melo: Ok :+1:
----
2019-05-02 09:05:43 UTC - Yuvaraj Loganathan: @Shivji Kumar Jha ^^
----
