2019-05-01 13:19:14 UTC - Vlad Lazarenko: @Vlad Lazarenko has joined the channel
----
2019-05-01 13:47:56 UTC - Vlad Lazarenko: Hey guys. I have a few questions I could not figure out. When using deduplication with (custom) sequence IDs, it looks like the sequence ID is not exposed to clients. Is that right, or am I missing something?
----
2019-05-01 13:52:28 UTC - Vlad Lazarenko: My actual problem is a little more complicated, though. I am looking to make a consumer subscribe to messages using a sequence ID instead of a message ID. Basically, something similar to Kafka's stream offset. It doesn't look like there is a way to subscribe by sequence ID. Do you know what my best bet is for accomplishing this? I can only think of another "service" on the side that journals all MessageKey + SequenceId pairs and allows fast lookup of the message ID, so the client can query it and then proceed with a standard subscription using MessageKey. Thoughts?
----
2019-05-01 16:05:04 UTC - Matteo Merli: @Vlad Lazarenko The main difference is that while the MessageId is per-topic (assigned after messages from multiple producers are serialized and persisted), the sequence ID is only relative to a particular producer (identified by its producer name).
----
2019-05-01 16:06:16 UTC - Vlad Lazarenko: That sounds right. I have a specific case with a single producer and a non-partitioned topic with deduplication enabled.
----
2019-05-01 16:08:28 UTC - Vlad Lazarenko: The thing is that I integrate with another, not-very-reliable messaging system, and the intention is to use Pulsar for recovery of lost messages (which is rare but could happen). So what I am trying to avoid is, say, replaying a week's worth of messages when a message is lost at the end of the week. And all I know at that point is a sequence number (mapped 1-to-1 to a message key).
----
2019-05-01 16:12:03 UTC - Matteo Merli: Ok, so you want to specify a sequence ID when you publish and then have the consumer position itself on that message afterwards...
----
2019-05-01 16:12:59 UTC - Matteo Merli: As it is, that's not directly possible, since we don't "index" by sequence ID. We basically have 2 indices: message ID and publish timestamp.
----
2019-05-01 16:13:46 UTC - Matteo Merli: One other option would be to collect the MessageId after you publish and store it as well. That way you'll be able to associate a sequence ID with a message ID.
----
2019-05-01 16:16:31 UTC - Devin G. Bost: For anyone in the Utah area, I'm presenting on Pulsar on May 22nd at Overstock: <https://www.meetup.com/utah-data-engineering-meetup/events/261032242/>
clap: Matteo Merli
+1: Dan C, Jon Bock, David Kjerrumgaard, Vlad Lazarenko
----
2019-05-01 16:39:56 UTC - Vlad Lazarenko: Sounds reasonable. I was thinking along those lines as well. I will have to figure out the details around where and how to store it, how to work around failure cases, etc. Thanks!
----
2019-05-01 16:44:49 UTC - Joe Francis: My suggestion: if you know the time window of the loss, use the approximate timestamp and use the Reader API to filter.
----
2019-05-01 16:50:29 UTC - Vlad Lazarenko: @Joe Francis I'm using C++ and can't seem to find anything in the Reader API that takes a timestamp, or that allows skipping the payload or getting sequence IDs. I'm guessing the Java API is richer in this regard?
----
2019-05-01 16:59:29 UTC - Joe Francis: There's your opportunity: open a GitHub issue and also submit a PR :grinning: for the C++ client. Is the payload large?
----
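A minimal Java sketch of Matteo's suggestion above: capture the MessageId returned on publish, store it keyed by the sequence ID, and later start a Reader from that stored position. The service URL, topic, producer name, and the in-memory map standing in for a durable store are all assumptions, not part of the discussion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class SequenceIdIndexSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")                // assumed broker address
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/recovery-topic")  // hypothetical topic
                .producerName("upstream-bridge")                      // stable name, so dedup can track sequence IDs
                .create();

        // Side index: sequence ID -> serialized MessageId. In practice this would
        // live in a durable store (another topic, a DB, ...), not an in-memory map.
        Map<Long, byte[]> seqToMsgId = new ConcurrentHashMap<>();

        long seq = 42L; // sequence number taken from the upstream system
        MessageId msgId = producer.newMessage()
                .sequenceId(seq)
                .value("payload".getBytes(StandardCharsets.UTF_8))
                .send();
        seqToMsgId.put(seq, msgId.toByteArray());

        // Later, to replay from a known sequence number, look up the stored
        // MessageId and start a Reader there. By default the Reader begins
        // with the message immediately after the given position.
        MessageId start = MessageId.fromByteArray(seqToMsgId.get(seq));
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/recovery-topic")
                .startMessageId(start)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }

        reader.close();
        producer.close();
        client.close();
    }
}
```

With deduplication enabled and a stable producer name, the stored mapping lets a lost sequence number be resolved to an exact position in the topic, so recovery does not have to replay from the beginning.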
2019-05-01 17:12:52 UTC - Vlad Lazarenko: The system is generic, so I have to account for worst-case scenarios, even though in most cases replaying everything from the beginning of time is viable :laughing:
----
2019-05-01 17:52:49 UTC - Thor Sigurjonsson: The other day I found I wanted to scale a function to zero (or a sink/source by extension). `pulsar-admin` told me it had to be greater than zero. Is this something that should be supported? I can think of use cases where we don't want to do a deploy or update that changes anything we know works -- but we want to turn a flow off temporarily (it might be because of timing issues around deploys, or health issues of downstream components, etc.). In my case it was just a way to kill the instance with ID 0 when it was in a bad state, but I found I could not do that this way. In that case I could have used some other way to poke at a particular running instance of a function. That might also be useful.
----
2019-05-01 17:53:42 UTC - Matteo Merli: An alternative is to "stop" the function.
----
2019-05-01 17:55:13 UTC - Thor Sigurjonsson: Yes, that is a good point -- and `pulsar-admin functions stop` does support `--instance-id` as well.
----
2019-05-01 17:55:50 UTC - Thor Sigurjonsson: I guess I'm not seeing that exposed on sources/sinks, where it could be useful as well.
----
2019-05-01 17:56:16 UTC - Thor Sigurjonsson: I hadn't looked at the `stop` command on functions.
----
2019-05-01 18:01:24 UTC - Thor Sigurjonsson: Does the `stop` command decommission an instance or just stop the flow to it?
----
2019-05-01 18:02:02 UTC - Matteo Merli: The process/thread/container is stopped, though the metadata is maintained.
----
2019-05-01 18:32:55 UTC - Byron: @Matteo Merli Just peeked at the (new?) schema support in the Go client. I noticed the ProtoSchema type embeds the AvroCodec... is this right? <https://godoc.org/github.com/apache/pulsar/pulsar-client-go/pulsar#ProtoSchema>
----
2019-05-01 18:34:47 UTC - Matteo Merli: Yes, even in Java we (internally) standardize the schema definition to Avro, in order to have a consistent definition of the schema. Even for JSON we use Avro internally.
----
2019-05-01 18:34:48 UTC - Byron: ^sorry, AvroCodec
----
2019-05-01 18:35:59 UTC - Byron: I see. So an Avro schema is defined containing a field that contains the protobuf schema?
----
2019-05-01 18:41:31 UTC - Thor Sigurjonsson: @Chris Bartholomew Do you take any steps to make BookKeeper more resilient there? Like favoring having more nodes rather than fewer, etc.?
----
2019-05-01 18:41:54 UTC - Matteo Merli: Correct
----
2019-05-01 18:42:41 UTC - Byron: Or is the protobuf schema just modeled as an Avro schema? I see in the implementation that encode and decode still depend on the internal proto registry, i.e. the generated Go types need to be imported. I guess I assumed the protobuf descriptor would have been embedded so that the server could do validation on the serialized bytes.
----
2019-05-01 18:44:26 UTC - Byron: Or not even the server necessarily, but the registry would hold the descriptor so a consumer could use it to decode. No problem that it works this way, just _typing_ out loud.
----
2019-05-01 18:44:58 UTC - Byron: I presume, as an SDK, this is just for managing the encoding/decoding on the client side and doesn't necessarily overlap with the registry itself?
----
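To make Matteo's point above concrete (schema definitions are normalized to Avro internally, even for JSON and Protobuf), here is a small Java sketch. The `SensorReading` POJO is made up; the only claim is that the schema definition registered for a JSON schema is expressed as an Avro record.

```java
import java.nio.charset.StandardCharsets;

import org.apache.pulsar.client.api.Schema;

public class JsonSchemaIsAvroUnderneath {

    // Hypothetical POJO used to derive the schema.
    public static class SensorReading {
        public String sensorId;
        public double value;
        public long timestampMillis;
    }

    public static void main(String[] args) {
        // Schema.JSON(...) serializes payloads as JSON, but the schema
        // definition it carries in its SchemaInfo is expressed in Avro form.
        Schema<SensorReading> jsonSchema = Schema.JSON(SensorReading.class);

        // Prints an Avro record definition (type "record", fields, etc.),
        // even though the wire format of the payload is JSON.
        System.out.println(new String(jsonSchema.getSchemaInfo().getSchema(),
                StandardCharsets.UTF_8));
    }
}
```

As Matteo confirms above, the Java Protobuf schema is handled the same way: the descriptor is translated into an Avro-format definition for registration, while actual encoding/decoding still goes through the generated Protobuf classes, which matches what Byron observed in the Go client.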
2019-05-01 19:02:42 UTC - Patrick Lange: @Patrick Lange has joined the channel
----
2019-05-02 00:09:01 UTC - Patrick Lange: @Matteo Merli I am running into similar issues. The new 2.3.1 Python client doesn't install correctly from pip on macOS 10.14.4 under Python 3.7.4 (Anaconda). If I run the default command, mmh3 fails to install. When I install it with `CXX=<path-to-g++-8> pip install pulsar-client==2.3.1`, it segfaults on import. I can import `mmh3` on its own and use it.
----
2019-05-02 07:43:03 UTC - Sébastien de Melo: Ok :+1:
----
2019-05-02 09:05:43 UTC - Yuvaraj Loganathan: @Shivji Kumar Jha ^^
----
