Slack digest for #general - 2019-12-13

Apache Pulsar Slack Fri, 13 Dec 2019 01:11:36 -0800

2019-12-12 09:12:14 UTC - Fernando: I’m creating a python reader with
```reader = client.create_reader(
    'persistent://tenant/namespace/topic',
    start_message_id=pulsar.MessageId.earliest)```
and then read messages with
```msg = reader.read_next(1000)```
I’ve noticed that messages are deleted after a while. Is this normal? What do I 
need to configure for this to not happen?
----
2019-12-12 09:32:24 UTC - Fernando: Ok retention has to be set for this to work 
<https://github.com/apache/pulsar/issues/355#issuecomment-298784473>
----
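A minimal sketch of setting retention on a namespace, following up on the retention point above. It only builds (and does not send) a request against the Pulsar admin REST endpoint `POST /admin/v2/namespaces/{tenant}/{namespace}/retention`. The admin URL `http://localhost:8080`, the placeholder tenant and namespace names, and the helper name `build_retention_request` are assumptions for illustration; a value of -1 is taken to mean unlimited retention.

```python
import json
import urllib.request


def build_retention_request(admin_url, tenant, namespace,
                            time_minutes, size_mb):
    """Build a POST request that sets the retention policy of a namespace.

    The request is only constructed here; pass it to
    urllib.request.urlopen() to actually send it to the broker.
    """
    url = f"{admin_url}/admin/v2/namespaces/{tenant}/{namespace}/retention"
    body = json.dumps({
        "retentionTimeInMinutes": time_minutes,
        "retentionSizeInMB": size_mb,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local broker and placeholder tenant/namespace names.
req = build_retention_request(
    "http://localhost:8080", "tenant", "namespace",
    time_minutes=-1, size_mb=-1)  # -1 = keep acknowledged messages indefinitely
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

A reader does not create a durable subscription, so without a retention policy nothing obliges the broker to keep the messages around, which would match the deletion described above.
----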
2019-12-12 09:35:59 UTC - Ryan: We share the same concerns. If you create a 
GitHub issue or post on StackOverflow, can you please post a link to the URL in 
this thread too! Thanks!
----
2019-12-12 11:58:16 UTC - Ryan Samo: We use Grafana to visualize our Pulsar 
metrics. Is there a way to look at the backlog metric and deduce the difference 
between message TTL and retention? I know that when TTL expires it sends the 
message into the retention you have set, just wondering if you can actually see 
this via the metrics?
----
2019-12-12 12:01:41 UTC - Brian Doran: Thanks @Sijie Guo We've tried turning 
fsync off before to no avail but maybe we did it incorrectly .. what is the 
procedure to do this correctly?
----
2019-12-12 12:24:57 UTC - robertsiri: @robertsiri has joined the channel
----
2019-12-12 12:27:43 UTC - robertsiri: Hi, how can I use my own data type (e.g., 
"User") with Spark Streaming instead of "byte[]"? The example shows: 
JavaReceiverInputDStream<byte[]> lineDStream = 
jsc.receiverStream(pulsarReceiver);
----
2019-12-12 15:39:06 UTC - Brendan Price: sure thing!
----
2019-12-12 15:52:15 UTC - Brendan Price: @Sanjeev Kulkarni as requested: 
<https://github.com/apache/pulsar/issues/5846>
----
2019-12-12 16:40:36 UTC - Jose Estefania: @Jose Estefania has joined the channel
----
2019-12-12 16:45:49 UTC - David Kjerrumgaard: @Ryan Samo Great question. AFAIK, 
there isn't a way to differentiate between the 2.
+1 : Ryan Samo
----
2019-12-12 17:04:55 UTC - Sanjeev Kulkarni: thanks!
----
2019-12-12 17:36:02 UTC - Sandeep Kotagiri: Within a Kubernetes Pulsar 
deployment, I plan to use a Pulsar Proxy in front of my brokers. In this case, 
what value should I configure in the ZooKeeper metadata, as explained in 
<https://pulsar.apache.org/docs/en/2.4.2/deploy-bare-metal/#initializing-cluster-metadata>?
 Should the hostname part in web-service-url and broker-service-url point to 
the proxy hostname or the broker hostname?
----
2019-12-12 17:48:55 UTC - Logan B: As pulsar tracks the consumer cursors in a 
shared subscription - what happens during broker failure? Where would the new 
broker begin offering messages?


I couldn't find details of this in any of the docs.
----
2019-12-12 18:04:28 UTC - David Kjerrumgaard: @Logan B The consumer cursors are 
stored in the bookkeeper layer. Therefore, if the broker serving the messages 
from a topic fails, the newly assigned broker can access the consumer cursors 
from there and resume exactly where the previous broker left off.
----
2019-12-12 18:07:58 UTC - David Kjerrumgaard: @Sandeep Kotagiri I think you 
want to follow the steps outlined here in the deploying to K8s docs. 
<https://pulsar.apache.org/docs/en/2.4.2/deploy-kubernetes/#initialize-cluster-metadata>
----
2019-12-12 18:08:49 UTC - Aditya badramraju: @Aditya badramraju has joined the 
channel
----
2019-12-12 18:19:03 UTC - Nick Ruhl: @Ryan Yes no problem. I plan on creating 
this within a few days
+1 : Sijie Guo
----
2019-12-12 18:22:51 UTC - Sandeep Kotagiri: @David Kjerrumgaard Thank you. I am 
using the HELM chart. In the documentation, per the steps outlined, we seed 
metadata with broker's web/pulsar service urls. And after that we also deploy 
Proxy. So I will go with using broker's service urls for these values. (I am 
going to use Proxy as the front end. I was not sure if this had to be the proxy 
or the broker. And hence my question)
----
2019-12-12 18:24:45 UTC - David Kjerrumgaard: @Sandeep Kotagiri The proxy uses 
the information stored in ZK to determine where to route the requests. 
Therefore, those values MUST be the broker URLs. The flow is client ---> 
proxy ---> ZK (lookup broker addr) ---> broker (forwarded by the proxy)
----
2019-12-12 18:24:46 UTC - David Kjerrumgaard: HTH
----
2019-12-12 18:25:16 UTC - Sandeep Kotagiri: @David Kjerrumgaard thank you
----
2019-12-12 19:29:38 UTC - Ryan Samo: Thanks @David Kjerrumgaard. Do you think 
this would be a worthwhile enhancement to the Prometheus metrics? I think it 
would be great to visualize this behavior.
----
2019-12-12 19:30:30 UTC - David Kjerrumgaard: It would be for sure. I am just 
not sure that we even track that information.
----
2019-12-12 19:40:34 UTC - Ryan Samo: Ok thanks!
----
2019-12-12 19:43:56 UTC - Logan B: Yes, but what positions are stored?

For example, if I have a shared subscription with two consumers, and 10 
messages in the topic (id 1-10).

Consumer A gets message 1 and does not ack
Consumer B gets messages 2, 3, 4 and acks messages 2 & 3 but NOT 4

Broker dies, and consumers fail and go do something else.

New broker comes online, message ack timeouts expire, and a new consumer C 
starts reading from the subscription - where is the cursor and what messages 
will be sent to consumer C?

Does C see messages 1, 2, 3, 4, 5 ...? This implies redelivery of 2 & 3.
Does C see messages 1, 4, 5, ...? How would it know 2 & 3 were processed?
Does C see messages 5, 6, 7, 8? How & when would 1 & 4 get processed?
----
2019-12-12 20:04:26 UTC - David Kjerrumgaard: The new consumer would get 
messages 1 and 4 since they weren't acked, along with 5 and upwards
----
2019-12-12 20:05:32 UTC - David Kjerrumgaard: the only way the broker "knows" a 
message was "processed" is via an ack from the consumer. No ack means not 
processed, and the message is redelivered automatically
----
2019-12-12 22:12:05 UTC - Joe Francis: @Logan B In general, consumers always 
begin with the latest available unacked message. Case 1 applies (or should): 
everything will be redelivered. But recently 
managedLedgerMaxUnackedRangesToPersist has been added to Pulsar, which will try 
to persist "ack holes", and this is enabled by default, so the ack holes[(1)] 
will be persisted, and you will see, as David said, 1 and 4. To me, this covers 
for poor application and use case design. Those who require random deletes 
should use a database, not a queue. And if the application can't handle 
idempotency there are bigger issues. All our applications (in the hundreds) run 
with managedLedgerMaxUnackedRangesToPersist set to zero.
----
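The two outcomes discussed in this thread can be modelled with a small sketch. This is a toy model under stated assumptions, not Pulsar's implementation: the function name `messages_after_recovery` and the boolean `persist_ack_holes` (standing in for a non-zero `managedLedgerMaxUnackedRangesToPersist`) are hypothetical. It treats the cursor as a contiguous acked prefix plus, optionally, the individually acked messages beyond it.

```python
def messages_after_recovery(all_ids, acked, persist_ack_holes):
    """Return the ids a new consumer would receive after a broker restart.

    all_ids: ordered message ids in the topic
    acked: set of ids acknowledged before the failure
    persist_ack_holes: whether individual acks past the contiguous
        acked prefix survive the restart (toy stand-in for
        managedLedgerMaxUnackedRangesToPersist > 0)
    """
    remaining = list(all_ids)
    # The contiguous acked prefix is always remembered.
    while remaining and remaining[0] in acked:
        remaining.pop(0)
    if persist_ack_holes:
        # Individual acks beyond the prefix are remembered as well.
        remaining = [m for m in remaining if m not in acked]
    return remaining


ids = list(range(1, 11))  # messages 1..10, as in the question above
acked = {2, 3}            # 1 and 4 were delivered but never acked

print(messages_after_recovery(ids, acked, persist_ack_holes=True))
# [1, 4, 5, 6, 7, 8, 9, 10]
print(messages_after_recovery(ids, acked, persist_ack_holes=False))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

In this model the first call corresponds to "1 and 4, along with 5 and upwards", and the second to the case where everything from the first unacked message is redelivered, which is why consumers need to tolerate duplicates when the setting is zero.
----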
2019-12-13 00:31:31 UTC - Logan B: @Joe Francis , perfect, that was exactly 
what I was wondering and the docs for this help clarify. Thank you!
----
2019-12-13 05:27:44 UTC - Sandeep Kotagiri: Hello, for 2.4.1 Pulsar, I am 
observing a strange issue. I have turned on TLS and Authentication. 
Authentication provider is 
org.apache.pulsar.broker.authentication.AuthenticationProviderTls. With this 
setting, when using pulsar-admin CLI tool to create tenants, I am getting a 500 
Internal Error. I see a NullPointerException in Broker logs. I do not have any 
problem with other functions of pulsar-admin tool like retrieving clusters, 
creating namespaces etc. However, any operations on Tenants are failing. Is 
this a known issue?
----
