2019-09-12 09:25:49 UTC - Shishir Pandey: Anyone ? I'd really appreciate it if 
I could get some pointers.
----
2019-09-12 09:50:47 UTC - vikash: @Vladimir Shchur the above solution works for .NET Framework after changing app.config and adding a dependentAssembly. But in our case I have used Service Fabric, where I don't get an app.config, so how do I solve this issue in that case? Here is the Service Fabric issue: <https://github.com/Azure/service-fabric-issues/issues/779>
----
2019-09-12 09:53:49 UTC - Vladimir Shchur: Hi! Look at the response: "No, the SF app shouldn't have an app.config file. Individual services may each have one however or in the case of web services would have web.config." I.e., each service can have its own app.config or web.config
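For reference, a binding redirect in a service's app.config usually looks like the sketch below (the assembly name, public key token, and versions are placeholders; take the real ones from your build warning):
```
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- hypothetical assembly: redirect all older versions to the one you ship -->
        <assemblyIdentity name="System.Memory" publicKeyToken="cc7b13ffcd2ddd51" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.1.1" newVersion="4.0.1.1" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```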
----
2019-09-12 12:17:13 UTC - vikash: ok, thank you
----
2019-09-12 13:05:33 UTC - Fredrick P Eisele: I would like some verification on 
expiry and retention.
If the retention policy is set to -1 then the message is not deleted once it 
has been acknowledged.
If the expiry is set to some positive value, which is allowed to expire without 
the message being consumed, what happens to the message?
If the expiration is treated like an implied ack (what I want) then the message 
will be implicitly consumed and trivially retained.
<https://pulsar.apache.org/docs/en/concepts-messaging/#message-retention-and-expiry>
"With message expiry, shown at the bottom, some messages are *deleted*...".
Are they actually deleted or implicitly acknowledged?
----
2019-09-12 13:59:41 UTC - Shishir Pandey: Folks, does anyone have any suggestions for my question earlier?
----
2019-09-12 14:53:02 UTC - David Kjerrumgaard: Have you looked into the key_shared subscription? This would ensure that the same consumer receives messages with the same key. It would NOT guarantee overall message ordering. However, if you were to send the messages from the same producer then message ordering would be guaranteed.
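Roughly, with the Java client on 2.4+ (topic and subscription names here are made up):
```
import org.apache.pulsar.client.api.*;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

// Key_Shared: messages with the same key are always dispatched to the
// same consumer, so per-key tracking can live inside that consumer.
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/events")
        .subscriptionName("per-key-processing")
        .subscriptionType(SubscriptionType.Key_Shared)
        .subscribe();
```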
----
2019-09-12 15:00:20 UTC - Tarek Shaar: I was testing BookKeeper shutdown while messages are being produced and consumed. I have a topic with these settings: bookkeeperEnsemble: 2, bookkeeperWriteQuorum: 2, bookkeeperAckQuorum: 2. While doing message production and consumption, I purposefully shut down one of the BookKeeper nodes, but message production and consumption continued as normal. Is that right? Isn't the message supposed to be saved in two BookKeeper nodes before delivering it? I only have 2 BookKeeper nodes and I shut down one of them, so it means I only had one node available.
----
2019-09-12 15:00:25 UTC - Shishir Pandey: Thank you @David Kjerrumgaard. A key_shared subscription with multiple consumers would not guarantee ordering, and since I need to process all messages between [t_0, t_0 + defined limit] together, I am guessing you're suggesting that, since messages with the same key or ordering key are delivered to the same consumer, I could keep track of this in the consumer. Is my understanding correct? Unfortunately my producers are different, and my messages' defined limit for processing could be as large as 60 days [due to the nature of the domain].
----
2019-09-12 15:02:33 UTC - David Kjerrumgaard: That was my suggestion. Or you 
could use a single consumer depending on the message volume.
----
2019-09-12 15:03:52 UTC - Shishir Pandey: Got it! I'd have about 600-800 million messages per 7-day period; message sizes are relatively small, so I am guessing this should be OK for a single consumer.
----
2019-09-12 15:04:23 UTC - Shishir Pandey: Thank you @David Kjerrumgaard! Much 
appreciated.
----
2019-09-12 15:04:49 UTC - David Kjerrumgaard: Perhaps a stateful function would 
be able to retain the data, and when a new item arrives the function would 
first check the state to see if the item already exists or not. If not, store 
it. If it does exist, you now have both the "pieces" and can perform the logic 
on them both.
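A minimal sketch of that idea with the Java Functions API (the class, key derivation, and pair logic are made up; getState/putState are the real Context methods):
```
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class PairMatcher implements Function<String, Void> {
    @Override
    public Void process(String item, Context ctx) {
        String key = extractKey(item);           // hypothetical: derive the pairing key
        ByteBuffer stored = ctx.getState(key);   // null if this key hasn't been seen yet
        if (stored == null) {
            // First half of the pair: stash it and wait for the partner.
            ctx.putState(key, ByteBuffer.wrap(item.getBytes(StandardCharsets.UTF_8)));
        } else {
            byte[] bytes = new byte[stored.remaining()];
            stored.get(bytes);
            processPair(new String(bytes, StandardCharsets.UTF_8), item);  // hypothetical
        }
        return null;
    }

    private String extractKey(String item) { return item.split(",")[0]; }
    private void processPair(String a, String b) { /* business logic */ }
}
```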
----
2019-09-12 15:05:39 UTC - Shishir Pandey: Yes, that actually is a better idea I 
think
----
2019-09-12 15:05:43 UTC - David Kjerrumgaard: The tradeoff of the above approach is added latency on each message (to check the state) and increased storage in BK.
----
2019-09-12 15:06:16 UTC - Shishir Pandey: The publish latency is not as much of 
an issue for me since as I said the messages arrive relatively slowly anyway.
----
2019-09-12 15:07:42 UTC - Shishir Pandey: As for the increase of storage in BK: we would be purging every 60 days, and the message ingestion rate is nearly fixed over that range, so we should be able to stabilise storage after some time, and I can plan for that ahead.
----
2019-09-12 15:08:03 UTC - Shishir Pandey: The function proposal does appear to 
be considerably better, I will do some more research on that and test it out.
----
2019-09-12 15:09:03 UTC - Shishir Pandey: Once again, thank you!
----
2019-09-12 15:10:12 UTC - David Kjerrumgaard: Sure.  One last step on the 
stateful approach. When you are finished processing the "pair" you can delete 
the key from state.
----
2019-09-12 15:11:10 UTC - David Kjerrumgaard: @Tarek Shaar It depends on the ack and write quorum configs you have in place.
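Those are the namespace persistence policies. A hedged sketch for setting and checking them (the namespace name is made up):
```
bin/pulsar-admin namespaces set-persistence my-tenant/my-ns \
  --bookkeeper-ensemble 2 --bookkeeper-write-quorum 2 --bookkeeper-ack-quorum 2 \
  --ml-mark-delete-max-rate 0
bin/pulsar-admin namespaces get-persistence my-tenant/my-ns
```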
----
2019-09-12 15:11:30 UTC - Tarek Shaar: bookkeeperEnsemble: 2 
bookkeeperWriteQuorum: 2 bookkeeperAckQuorum: 2
----
2019-09-12 15:11:59 UTC - Tarek Shaar: Actually, production and consumption continued even though I shut down both BookKeeper nodes.
----
2019-09-12 15:22:27 UTC - David Kjerrumgaard: @Tarek Shaar In the above scenario you are most likely consuming messages that were just published (tailing reads), and as such they were able to be served out of the in-memory message cache. Are you publishing the messages asynchronously?
----
2019-09-12 15:23:03 UTC - Tarek Shaar: Yes, I am doing async publish.
----
2019-09-12 15:24:31 UTC - David Kjerrumgaard: "If the retention policy is set to -1 then the message is not deleted once it has been acknowledged." Yes, AFAIK.
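For the record, infinite retention is set per namespace, e.g. (namespace name made up):
```
bin/pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1
```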
----
2019-09-12 15:25:04 UTC - Tarek Shaar: That makes sense that they are served from memory, but how is the producer getting the ack? I thought the ack comes back only if the messages are saved to two BookKeeper nodes (two saves to the journal).
----
2019-09-12 15:26:03 UTC - David Kjerrumgaard: "If the expiry is set to some 
positive value, which is allowed to expire without the message being consumed, 
what happens to the message?"  It is just deleted from the topic.
----
2019-09-12 15:26:48 UTC - David Kjerrumgaard: "If the expiration is treated 
like an implied ack (what I want) then the message will be implicitly consumed 
and trivially retained."  No, they are treated as if they never existed. The 
messages are essentially "skipped"
----
2019-09-12 15:29:49 UTC - David Kjerrumgaard: The ack (or lack thereof) is communicated in the CompletableFuture that is returned from the async call. Published messages are first written to cache and then synced to disk before an ack is returned. However, in your case the flow is: message to cache (succeeds), message to disk (fails), and no ack is returned. The message is still in the cache, though, so it can be served.
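In Java terms, roughly (the service URL and topic are made up):
```
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();
Producer<byte[]> producer = client.newProducer()
        .topic("persistent://public/default/test")
        .create();

CompletableFuture<MessageId> future = producer.sendAsync("hello".getBytes());
future.whenComplete((msgId, ex) -> {
    if (ex != null) {
        // No ack: the write never reached the required quorum on BK.
        System.err.println("publish failed: " + ex);
    } else {
        System.out.println("acked as " + msgId);
    }
});
```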
----
2019-09-12 15:30:34 UTC - David Kjerrumgaard: This allows Pulsar to continue 
serving messages even in the event of a BK failure.
----
2019-09-12 15:53:35 UTC - Tarek Shaar: Thanks David, understood. Another observation I have is that while producing and consuming messages, if I shut down the broker that's serving my topic, production and consumption continue smoothly if I am sending one message every 10 milliseconds. But if I am sending messages continuously without waiting and I stop the broker serving my topic, production and consumption stop until my broker is back up, at which point they resume.
----
2019-09-12 16:07:20 UTC - Nick Marchessault: Is there a default ackTimeout set 
in pulsar 2.3.1 if that configuration is not explicitly set?
----
2019-09-12 16:08:00 UTC - Matteo Merli: No, the ack timeout is not set by 
default because it’s impossible to establish a safe value
----
2019-09-12 16:08:35 UTC - Matteo Merli: e.g. if you set 1 min by default, any application for which the processing takes >1 min will see a storm of redeliveries
----
2019-09-12 16:09:03 UTC - Matteo Merli: a better option is to instead rely on 
“negative acks” (since 2.4)
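A sketch with the 2.4+ Java API (reusing a client built as above; process() and the delay value are made up):
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;

Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://public/default/jobs")
        .subscriptionName("workers")
        // redeliver nacked messages after 1 minute, instead of any ackTimeout
        .negativeAckRedeliveryDelay(1, TimeUnit.MINUTES)
        .subscribe();

Message<byte[]> msg = consumer.receive();
try {
    process(msg);                        // hypothetical business logic
    consumer.acknowledge(msg);
} catch (Exception e) {
    consumer.negativeAcknowledge(msg);   // request redelivery only on failure
}
```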
----
2019-09-12 16:23:00 UTC - Matteo Merli: Of course, the order is guaranteed for a single producer, and Pulsar offers fully linearizable ordering semantics.

Above, I was referring to 2 independent producers publishing to a topic. In 
that case, the messages from the 2 producers are interleaved in the topic, 
though, still, messages from the same producer will be ordered.
----
2019-09-12 16:23:11 UTC - Fredrick P Eisele: Hmm, not what I was hoping for. I suppose I could make a consumer whose only purpose is to ack the messages, thus preventing them from expiring, but that seems wrong. Is there a nicer approach?
----
2019-09-12 17:10:48 UTC - David Kjerrumgaard: @Tarek Shaar Yes, by default a single broker serves a topic (for both producers and consumers); therefore, on failure the cached messages aren't replicated across brokers quickly enough.
----
2019-09-12 17:15:56 UTC - David Kjerrumgaard: Yes, you can just set the TTL to -1, which is effectively forever. Then the messages will not be deleted simply because they weren't acknowledged.
----
2019-09-12 17:25:43 UTC - Addison Higham: :thinking_face: can a consumer be 
configured to only read messages that have been synced to disk? or is this 
always the case?
----
2019-09-12 17:29:21 UTC - David Kjerrumgaard: @Addison Higham It would be best to halt the producer if you are notified that the data is not synced to disk. The consumers might not be online at that time and won't be able to react to the BK outage.
----
2019-09-12 17:35:13 UTC - Addison Higham: But that would still mean that some given batch of messages is delivered to the consumer, and then, assuming the broker also died, either the producer re-connects and re-sends "duplicates" (the consumer won't know) or it doesn't re-send the messages and a future consumer replaying the stream would get a different result. Seems like a pretty rare edge case... but it does seem like, for certain cases, a consumer should be able to have a subscription that won't pull messages until the message is fsynced by BK.
----
2019-09-12 18:32:35 UTC - Jerry Peng: @Tarek Shaar are you using persistent 
topics or non-persistent topics?
----
2019-09-12 18:33:14 UTC - Tarek Shaar: I am using persistent topics
----
2019-09-12 18:42:52 UTC - Tarek Shaar: @David Kjerrumgaard I have narrowed this down to 6000 messages per minute. If I am producing 6000 per minute and the broker shuts down, my producer just stops (and so does my consumer) until the broker is back up, at which point traffic resumes. Are you saying that is the expected behavior? If I produce fewer than 6000, production and consumption just carry on (barring a very small pause).
----
2019-09-12 18:45:17 UTC - Jerry Peng: @Tarek Shaar if all your bookies are down, messages should not be produced successfully, i.e. the producer should not receive an ack.
----
2019-09-12 18:46:03 UTC - Matteo Merli: with `bookkeeperEnsemble: 2` you’ll need >=2 bookies to operate
----
2019-09-12 18:47:06 UTC - Jerry Peng: You also shouldn’t be able to consume the messages you produced when all your bookies are down
----
2019-09-12 18:51:17 UTC - Tarek Shaar: @Jerry Peng when my consumer is down, it will miss all the messages, regardless of whether I shut down one BookKeeper node or both of them during production. Perhaps this is due to what @Matteo Merli pointed out about the bookkeeperEnsemble value, or maybe this is expected behavior.
----
2019-09-12 18:51:53 UTC - Matteo Merli: > when my consumer is down, it will miss all the messages

Messages are not lost; the subscription keeps track of the consumer position.
----
2019-09-12 18:56:10 UTC - Nicolas Ha: Hello :slightly_smiling_face: is there a 
way to backup / restore all messages? Ideally to a file or S3 bucket
----
2019-09-12 18:57:32 UTC - Ali Ahmed: @Nicolas Ha Not really. You can back up, say, the data folder for a standalone cluster; otherwise, for a production cluster, you want to set up replication.
----
2019-09-12 18:57:52 UTC - Fredrick P Eisele: That sounds right, thanks. TTL can be set as the default with ttlDurationDefaultInSeconds=-1?
----
2019-09-12 18:58:45 UTC - Ryan Samo: Is there a way to load the client certs 
via memory instead of the file system? Like via a call to HashiVault for 
example?
----
2019-09-12 18:59:11 UTC - Matteo Merli: currently, not for TLS certs… just for 
tokens
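For tokens, the credential can come straight from memory or a supplier; a hedged sketch (the Vault lookup is a placeholder):
```
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.PulsarClient;

// fetchTokenFromVault() is hypothetical -- any in-memory source works
String token = fetchTokenFromVault();

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar+ssl://broker.example.com:6651")
        .authentication(AuthenticationFactory.token(token))
        // or, re-evaluated on reconnect:
        // .authentication(AuthenticationFactory.token(() -> fetchTokenFromVault()))
        .build();
```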
----
2019-09-12 18:59:36 UTC - Nicolas Ha: Ah, that's bad news. I see now that BookKeeper apparently does not support it (<https://github.com/apache/bookkeeper/issues/1193>), so maybe that comes from there.
----
2019-09-12 19:00:02 UTC - Ryan Samo: Ok thanks, do you see an enhancement coming for TLS?
----
2019-09-12 19:05:57 UTC - David Kjerrumgaard: I would have to check the docs to confirm, but yes, there is a value that means infinity.
----
2019-09-12 19:06:51 UTC - Matteo Merli: It would certainly be possible to extend it to accept them from strings.
----
2019-09-12 19:08:52 UTC - Ryan Samo: Cool thanks
----
2019-09-12 19:35:03 UTC - Tarek Shaar: Sorry @Matteo Merli, I meant to say: if my consumer was down and, while producing messages, I shut down one or all of my BookKeeper nodes, then when I bring my consumer back up, I notice that it missed the messages that were produced while one or all BookKeeper nodes were down. Am I missing something?
----
2019-09-12 19:35:39 UTC - Karthik Ramasamy: @Nicolas Ha --- use the tiered 
storage capability in Pulsar
----
2019-09-12 19:39:23 UTC - Jon Bock: What use case for that did you have in 
mind?  As Karthik says, tiered storage in Pulsar addresses a number of the 
scenarios where people otherwise might manually export and restore messages.
----
2019-09-12 19:43:26 UTC - Matteo Merli: While the BookKeeper nodes were down, you would not have received a positive ack when publishing a message.
----
2019-09-12 19:54:22 UTC - Ryan Samo: Is there a way for the consumer client to 
detect backlog? Or know that there is a backlog?
----
2019-09-12 20:30:14 UTC - Nicolas Ha: I'll have a look, thanks!
----
2019-09-12 20:31:32 UTC - Nicolas Ha: Backup / restore, really, which allows moving data from one cloud provider to another, across environments, and also recovering from a disaster.
----
2019-09-12 20:44:23 UTC - Jon Bock: OK. Replication could possibly cover most of those scenarios, so it may be worth a look.
+1 : Nicolas Ha
----
2019-09-12 20:59:00 UTC - Tarek Shaar: While my consumer was down I shut down one of the two BookKeeper nodes (while still publishing). I then started my consumer, but it missed all the messages that were published during the time that one of the BookKeeper nodes was down.
----
2019-09-12 21:00:52 UTC - Tarek Shaar: @Ryan Samo you can probably use the Java 
Admin API within the consumer process or make a REST call.
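A hedged sketch with the Java admin client (the URL, topic, and subscription names are made up; field access matches the 2.x admin API):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

PulsarAdmin admin = PulsarAdmin.builder()
        .serviceHttpUrl("http://broker.example.com:8080")
        .build();

TopicStats stats = admin.topics().getStats("persistent://public/default/events");
// msgBacklog = messages not yet acknowledged on this subscription
long backlog = stats.subscriptions.get("my-subscription").msgBacklog;
System.out.println("backlog: " + backlog);
```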
----
2019-09-12 21:03:02 UTC - Ryan Samo: Ok thanks @Tarek Shaar. I was hoping to let a consumer see the backlog and receiverQueueSize stats locally so that it can dynamically adjust itself, without involving the admin API, which would allow major access to the cluster.
----
2019-09-12 21:07:15 UTC - Ryan Samo: Thinking of that, and of auto-scaling shared consumers based on it.
----
2019-09-12 21:09:00 UTC - Tarek Shaar: I am not sure the consumer API or the consumer builder API allows one to dynamically adjust its parameters. But that's an interesting one; please do let me know if you find an alternative to the Admin or REST calls.
----
2019-09-12 21:10:07 UTC - Ryan Samo: Ok thanks!
----
2019-09-12 21:43:53 UTC - Ali Ahmed: I am adding an option to specify the separator character for messages in pulsar-client; this should allow one to produce JSON messages without issues: <https://github.com/apache/pulsar/pull/5187>
----
2019-09-12 22:49:55 UTC - Fredrick P Eisele: We are talking about `messageTTL`? 
Because when I try to set it to `-1` I get "Invalid value for message TTL", "", 
"Reason: Invalid value for message TTL"
----
2019-09-12 22:52:32 UTC - Fredrick P Eisele: ["./bin/pulsar-admin", 
"namespaces", "set-message-ttl", "foo/bar", "--messageTTL", "-1"]
----
2019-09-12 22:56:23 UTC - Fredrick P Eisele: Using a really big integer, 
2147483647, works fine. Should I report a bug?
----
2019-09-12 23:00:01 UTC - Fredrick P Eisele: I think setting the TTL to 0 means forever. Is that true?
----
2019-09-12 23:12:55 UTC - Ted Hwang: @Ted Hwang has joined the channel
----
2019-09-12 23:28:16 UTC - David Kjerrumgaard: The default cluster setting is -1, which means forever, so you won't need to set it explicitly.
----
2019-09-13 00:06:13 UTC - tmcothran: @tmcothran has joined the channel
----
2019-09-13 03:49:20 UTC - Luke Lu: It appears that the reader (both Java and C++ clients) leaks subscriptions (reader-xxx subscriptions that don’t go away upon close) that have to be cleaned up manually with pulsar-admin?
----
2019-09-13 04:25:22 UTC - Matteo Merli: Yes, it’s a bug — there’s a fix in 
<https://github.com/apache/pulsar/pull/5022> though some work on unit tests is 
still pending
----
2019-09-13 04:31:22 UTC - Matteo Merli: Actually, the setting is already there in the storage abstraction… but it’s not configurable in broker.conf… we just set it to 2GB.
----
2019-09-13 04:31:32 UTC - Matteo Merli: that should be easy to add
----
2019-09-13 04:35:42 UTC - vikash: @Vladimir Shchur I have used the F# client for the producer and am sending payloads through NiFi (a data ingestion tool). On sending messages I am getting the below error:
----
2019-09-13 04:35:43 UTC - vikash: System.AggregateException: One or more errors occurred. ---> Pulsar.Client.Common.ProducerBusyException: Exception of type 'Pulsar.Client.Common.ProducerBusyException' was thrown
----
2019-09-13 04:37:29 UTC - vikash: I have sent 7478 messages.
----
2019-09-13 04:37:44 UTC - vikash: The exception occurs while sending messages.
----
2019-09-13 06:42:00 UTC - Vladimir Shchur: @Karthik Ramasamy If I use tiered storage and then something bad happens and I lose the whole cluster with its storage, only the S3 buckets are left. Will it be possible to start a new cluster again with the data saved in those S3 buckets?
----
2019-09-13 06:52:32 UTC - Vladimir Shchur: Hi, I'm not sure I've understood it well. How does NiFi relate to the .NET client producer? One more thing: could you please configure logging like this <https://github.com/fsharplang-ru/pulsar-client-dotnet/blob/develop/tests/IntegrationTests/Common.fs#L24-L30> and provide the full exception message.
----
2019-09-13 06:54:45 UTC - Vladimir Shchur: Regarding the ProducerBusyException - I don't fully understand what it means and how the client should handle it (it is sent from the broker). Can someone clear things up?
----
2019-09-13 06:56:41 UTC - Karthik Ramasamy: @Matteo Merli 
----
2019-09-13 07:42:29 UTC - vikash: I think I have a code issue.
----
2019-09-13 07:42:30 UTC - vikash: 
<https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ProducerBuilder.html>
----
2019-09-13 07:43:00 UTC - vikash: PulsarClientException.ProducerBusyException - 
if a producer with the same "producer name" is already connected to the topic
+1 : Vladimir Shchur
----
