2018-11-12 09:11:06 UTC - Sijie Guo: @tuan nguyen anh can you please create a 
github issue with your hardware settings and testing steps? so we can look into 
it and help you understand this.
----
2018-11-12 09:23:26 UTC - Kesav Kolla: Is there any way to create a persistent 
topic before subscribers come up, and start publishing messages into the topic?  
Right now there is no way to create a topic via the admin CLI, and if a 
publisher puts messages on a topic those messages are lost.  There has to 
be at least one subscriber before you can start working with a persistent topic.
----
2018-11-12 09:25:44 UTC - Ali Ahmed: @Kesav Kolla pulsar will create topics on 
the fly you don’t need to create them in advance
----
2018-11-12 09:27:41 UTC - Kesav Kolla: @Ali Ahmed I observe that topic creation 
only happens when I start a subscriber.  Otherwise all messages are lost
----
2018-11-12 09:28:26 UTC - Ali Ahmed: is this with persistent topics ?
----
2018-11-12 09:29:25 UTC - Kesav Kolla: yes
----
2018-11-12 09:30:23 UTC - Ali Ahmed: please create an issue on GitHub with the 
steps and we will investigate; by design that shouldn’t happen
----
2018-11-12 09:31:29 UTC - jia zhai: @Kesav Kolla If no subscription is created 
before messages are produced, all the messages are treated as ACKed.
----
2018-11-12 09:32:00 UTC - jia zhai: That may be the reason you find the 
messages are lost
----
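The behavior jia zhai describes can be sketched with a toy model (an illustration of the semantics only, not Pulsar's implementation): with no subscription on the topic and no retention policy, nothing holds a cursor on a published message, so it is effectively acknowledged and discardable.

```python
# Toy illustration of the semantics described above -- NOT Pulsar's code.
# Without a subscription (and no retention policy), a published message has
# no cursor tracking it, so the broker treats it as acknowledged/deletable.

class ToyTopic:
    def __init__(self):
        self.subscriptions = {}  # subscription name -> pending messages

    def publish(self, msg):
        if not self.subscriptions:
            return False  # nothing tracks the message: effectively lost
        for pending in self.subscriptions.values():
            pending.append(msg)
        return True

    def subscribe(self, name):
        # a new subscription starts empty (the default "latest" position)
        self.subscriptions.setdefault(name, [])


topic = ToyTopic()
print(topic.publish("m1"))         # False: no subscription yet, message dropped
topic.subscribe("sub")
print(topic.publish("m2"))         # True: now "sub" tracks it
print(topic.subscriptions["sub"])  # ['m2']
```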
2018-11-12 09:36:07 UTC - Kesav Kolla: @jia zhai OMG, all messages are ACKed.... 
Is there any setting I can configure to not do that?  I have a simple use case: 
there will be one publisher who puts messages into the topic.  Subscriptions come 
and go depending on my scheduling pattern.
----
2018-11-12 09:38:07 UTC - Kesav Kolla: Right now I'm struggling with two things: 
first, keeping messages in the topic; second, even though I call acknowledge from my 
python code, lots of messages still sit in the backlog.  I'd appreciate some help
----
2018-11-12 09:41:44 UTC - Kesav Kolla: I'm trying to implement simple queue 
semantics using Pulsar.  I thought it would support queue semantics
----
2018-11-12 09:42:29 UTC - Ali Ahmed: @Kesav Kolla have a look at this
<https://pulsar.apache.org/docs/en/cookbooks-message-queue/>
----
2018-11-12 09:46:03 UTC - Kesav Kolla: This is the exact document I read 
before I wrote my code.  I set receiver_queue_size=0 in my subscribers and 
also used a shared subscription
----
2018-11-12 09:46:46 UTC - Kesav Kolla: This document doesn't mention anything 
about how to make the topic retain messages when there are no subscribers
----
2018-11-12 09:47:38 UTC - Sijie Guo: @Kesav Kolla:

1) you can configure retention to keep the data when there is no subscription 
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#set-retention-policy>
2) when you create the consumer, specify SubscriptionInitialPosition.Earliest.
----
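Point 1 above can also be done through Pulsar's admin REST API. A minimal sketch follows; the endpoint path and JSON field names are assumptions based on the admin docs for the namespace retention policy, so verify them against your Pulsar version before relying on them.

```python
# Sketch of setting namespace retention via the admin REST API.
# Endpoint path and field names are assumptions from the admin docs --
# check them against your Pulsar version before use.
import json

def retention_payload(time_minutes, size_mb):
    # -1 in either field means "unlimited" for that dimension
    return {"retentionTimeInMinutes": time_minutes,
            "retentionSizeInMB": size_mb}

# e.g. keep data for 1 day or up to 512 MB, whichever limit is reached first:
body = json.dumps(retention_payload(24 * 60, 512))
print(body)
# POST this body to
#   http://<broker>:8080/admin/v2/namespaces/<tenant>/<namespace>/retention
# (the REST equivalent of `pulsar-admin namespaces set-retention`)
```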
2018-11-12 09:48:48 UTC - Kesav Kolla: @Sijie Guo What does retention mean?  
Does it retain messages even after they're ACKed?
----
2018-11-12 09:50:13 UTC - Kesav Kolla: @Sijie Guo If I set Earliest, will it 
re-process already-ACKed messages?
----
2018-11-12 09:51:47 UTC - Sijie Guo: “retention” => 
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#retention-policies>
----
2018-11-12 09:53:14 UTC - Sijie Guo: “If I set Earliest, will it re-process 
already-ACKed messages?”

It is not re-processing already-ACKed messages. It tells Pulsar that when the 
subscription is first created, it should start from the earliest position.
----
2018-11-12 09:54:10 UTC - Sijie Guo: once the subscription is created, the 
parameter won’t take effect
----
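The initial-position rule Sijie describes can be sketched as a toy model (not Pulsar's implementation): the initial position is only consulted when the subscription is first created; consumers joining an existing subscription inherit its cursor and the parameter is ignored.

```python
# Toy model of subscription initial position -- not Pulsar's code.
# The initial position only matters at subscription creation time.

class ToySubscription:
    def __init__(self, topic_log, initial="latest"):
        # cursor = index of the next message to deliver
        self.cursor = 0 if initial == "earliest" else len(topic_log)
        self.log = topic_log

    def attach_consumer(self, initial="latest"):
        # `initial` is ignored here: the subscription already exists
        pass

    def next_message(self):
        if self.cursor < len(self.log):
            msg = self.log[self.cursor]
            self.cursor += 1
            return msg
        return None


log = ["m1", "m2", "m3"]
sub = ToySubscription(log, initial="earliest")  # created by first consumer
sub.attach_consumer(initial="latest")           # later consumers: no effect
print(sub.next_message())  # m1 -- starts from the earliest message
```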
2018-11-12 09:54:49 UTC - Kesav Kolla: If I start 10 subscriptions then will 
there be any duplicates in each subscription?  Assuming my 
receiver_queue_size=0.
----
2018-11-12 09:55:20 UTC - Kesav Kolla: Also, there is no argument to specify 
SubscriptionInitialPosition in the Python client's `subscribe`.  How do I set the 
SubscriptionInitialPosition?
----
2018-11-12 09:57:43 UTC - Sijie Guo: > If I start 10 subscriptions then will 
there be any duplicates in each subscription?

if you started 10 subscriptions, each subscription will receive a full copy of 
the messages.

I guess you mean 10 consumers here?
----
2018-11-12 09:58:17 UTC - Sijie Guo: I guess it is not supported in python yet.
----
2018-11-12 10:03:07 UTC - Kesav Kolla: yes 10 consumers I mean
----
2018-11-12 10:03:24 UTC - Kesav Kolla: so I'll end up re-processing all the 
messages whenever a consumer starts
----
2018-11-12 10:05:10 UTC - Sijie Guo: no
----
2018-11-12 10:06:30 UTC - Sijie Guo: the setting only applies when there is no 
subscription. but once the subscription is created, that initial position won’t 
take effect.
if you have 10 consumers, only the first connected consumer will create the 
subscription
----
2018-11-12 10:07:51 UTC - Kesav Kolla: I mean 
SubscriptionInitialPosition.Earliest: when all 10 consumers have Earliest, all 10 
will read the messages
----
2018-11-12 10:10:52 UTC - Sijie Guo: sorry I don’t understand what you mean 
here “read the messages”
----
2018-11-12 10:16:27 UTC - Kesav Kolla: all 10 consumers will start reading 
messages at earliest.  All consumers will get messages which are already ACKed
----
2018-11-12 10:19:23 UTC - Sijie Guo: > all 10 consumers will start reading 
messages at earliest.

yes.

> All consumers will get messages which are already ACKed

it will get the messages starting from the earliest messages.
----
2018-11-12 10:19:51 UTC - Kesav Kolla: Which means duplicate processing of 
messages.
----
2018-11-12 10:20:20 UTC - Kesav Kolla: In fact any time a new consumer starts it 
will re-process messages from the beginning
----
2018-11-12 10:20:38 UTC - Kesav Kolla: Ideally, when there is no consumer, 
messages should not be auto-ACKed
----
2018-11-12 10:20:51 UTC - Kesav Kolla: or at least a setting should control that 
behavior
----
2018-11-12 10:20:54 UTC - Sijie Guo: no
----
2018-11-12 10:21:01 UTC - Sijie Guo: as there is no duplicates
----
2018-11-12 10:21:19 UTC - Kesav Kolla: It's not duplicate messages.... It's 
re-processing of messages
----
2018-11-12 10:21:30 UTC - Kesav Kolla: re-processing of message by each consumer
----
2018-11-12 10:21:36 UTC - Sijie Guo: no
----
2018-11-12 10:23:57 UTC - Kesav Kolla: Let's say there is a topic with the 
retention policy set to 1 day:

The producer puts 100 messages.  (All of them are ACKed)

Assumptions on the consumers (shared subscription, queue size is 0):

Consumer 1 starts (it will process message 1 and so on)
Consumer 2 starts (where will it start receiving messages?)
Consumer 3 starts (where will it start receiving messages?)
----
2018-11-12 10:25:44 UTC - Sijie Guo: consumers 1, 2, 3 are in one subscription; 
the 3 consumers (together) will receive messages starting from message 1.

if the subscription mode is failover, only consumer 1 will receive the messages.

if the subscription mode is shared, consumer 2 will receive message 2, and 
consumer 3 will receive message 3
----
2018-11-12 10:25:53 UTC - Sijie Guo: consumers 2 and 3 will not receive message 1
----
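Sijie's example can be sketched as a toy dispatcher (not Pulsar's code): in shared mode a single subscription spreads messages across its connected consumers, so each message is delivered to exactly one consumer rather than duplicated.

```python
# Toy sketch of shared-subscription dispatch -- not Pulsar's implementation.
# One subscription, several consumers: messages go round-robin, so each
# message is delivered to exactly one consumer.
from itertools import cycle

def dispatch_shared(messages, consumers):
    received = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(msg)
    return received

out = dispatch_shared(["m1", "m2", "m3"], ["c1", "c2", "c3"])
print(out)  # {'c1': ['m1'], 'c2': ['m2'], 'c3': ['m3']}
```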
2018-11-12 10:29:46 UTC - Sijie Guo: That is, once the first consumer has 
“created” the subscription, how the messages are consumed depends on the 
subscription mode. 
<http://pulsar.apache.org/docs/en/concepts-messaging/#subscription-modes>
----
2018-11-12 10:30:08 UTC - Sijie Guo: (sorry I have to step out now, it is too 
late for me)
----
2018-11-12 11:12:11 UTC - Kesav Kolla: When I try to set the receive queue size to 
0 I'm getting an error:
```
2018-11-12 02:30:05.822 INFO  ConnectionPool:63 | Created connection for 
pulsar://queues-production.weave.local:6650
2018-11-12 02:30:05.823 INFO  ClientConnection:287 | [10.47.0.23:45586 -> 
10.47.0.6:6650] Connected to broker through proxy. Logical broker: 
pulsar://queues-production.weave.local:6650
2018-11-12 02:30:05.824 INFO  ConsumerImpl:168 | 
[persistent://hotelsoft/rateshop/jobs, spydr.queue, 0] Created consumer on 
broker [10.47.0.23:45586 -> 10.47.0.6:6650]
2018-11-12 02:30:05.826 INFO  HandlerBase:53 | 
[persistent://hotelsoft/rateshop/jobs, ] Getting connection from pool
2018-11-12 02:30:05.826 INFO  ProducerImpl:155 | 
[persistent://hotelsoft/rateshop/jobs, ] Created producer on broker 
[10.47.0.23:45586 -> 10.47.0.6:6650]
2018-11-12 02:30:05.826 WARN  ConsumerImpl:541 | 
[persistent://hotelsoft/rateshop/jobs, spydr.queue, 0] Can't use this 
function if the queue size is 0
2018-11-12 02:30:05.827 INFO  ProducerImpl:467 | 
[persistent://hotelsoft/rateshop/jobs, standalone-3-6995] Closed producer
2018-11-12 02:30:05.827 INFO  ConsumerImpl:761 | 
[persistent://hotelsoft/rateshop/jobs, spydr.queue, 0] Closed consumer 0
```

There is a WARN saying "Can't use this function if the queue size is 0".

I'm creating the consumer with the following code:

```
client.subscribe('persistent://hotelsoft/rateshop/jobs', 'spydr.queue',
                 consumer_type=ConsumerType.Shared,
                 unacked_messages_timeout_ms=5 * 60 * 1000,
                 receiver_queue_size=0)
```
----
2018-11-12 11:25:25 UTC - Kesav Kolla: I call the `receive()` function 
expecting timeout_millis to be None.  The ConsumerImpl:541 error 
occurs when receive is called with a timeout.  Has anyone faced this issue?  
I'm using Pulsar 2.2.0
----
2018-11-12 11:27:22 UTC - Kesav Kolla: I also attached a Python debugger to 
validate that timeout_millis is None
----
2018-11-12 11:39:37 UTC - Ivan Kelly: why are you changing receiver_queue_size?
----
2018-11-12 11:39:51 UTC - Ivan Kelly: it should be at least 1
----
2018-11-12 11:53:14 UTC - Kesav Kolla: @Ivan Kelly to make the shared 
subscription semantics work like a traditional queue
----
2018-11-12 11:54:13 UTC - Ivan Kelly: #receive isn't a request/response type 
call
----
2018-11-12 11:54:56 UTC - Ivan Kelly: when you subscribe, you open a stream to 
the broker, and the broker sends messages to the client's internal queue. 
receive pulls off that queue, so there must be at least one space on it
----
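Ivan's description of the receive path can be sketched as a toy model (not the real client): the broker pushes into a bounded client-side prefetch queue of `receiver_queue_size` slots, and `receive()` pops from it.

```python
# Toy sketch of the client-side receive path described above -- the broker
# pushes into a bounded per-consumer queue and receive() pops from it.
# (Python's queue.Queue treats maxsize=0 as unbounded, so this toy only
# models sizes >= 1.)
import queue

class ToyConsumer:
    def __init__(self, receiver_queue_size):
        self.q = queue.Queue(maxsize=receiver_queue_size)

    def broker_push(self, msg):
        try:
            self.q.put_nowait(msg)   # broker fills the prefetch queue
            return True
        except queue.Full:
            return False             # consumer not ready: broker holds the msg

    def receive(self):
        return self.q.get_nowait()   # application pulls from the local queue


c = ToyConsumer(receiver_queue_size=1)
print(c.broker_push("m1"))  # True: one slot available
print(c.broker_push("m2"))  # False: queue full until receive() is called
print(c.receive())          # m1
```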
2018-11-12 11:56:28 UTC - Kesav Kolla: When I set receiver_queue_size to `1` 
I'm seeing a few messages not getting ACKed.  Over time the unacked messages 
creep up, and after some time messages are not delivered to consumers at all
----
2018-11-12 11:57:16 UTC - Ivan Kelly: that sounds like a problem with acking, 
not with the queue size
----
2018-11-12 11:58:05 UTC - Kesav Kolla: Also from the API documentation of 
python here is the snippet:

```
receiver_queue_size: Sets the size of the consumer receive queue. The consumer 
receive queue controls how many messages can be accumulated by the consumer 
before the application calls receive(). Using a higher value could potentially 
increase the consumer throughput at the expense of higher memory utilization. 
Setting the consumer queue size to zero decreases the throughput of the 
consumer by disabling pre-fetching of messages. This approach improves the 
message distribution on shared subscription by pushing messages only to those 
consumers that are ready to process them.
```
----
2018-11-12 11:59:24 UTC - Ivan Kelly: ok, one sec
----
2018-11-12 11:59:35 UTC - Ivan Kelly: my assumptions must be wrong
----
2018-11-12 11:59:52 UTC - Kesav Kolla: I have a piece of code like this:

```
while True:
    msg: _Message = self.jobsconsumer.receive()
    self.jobsconsumer.acknowledge(msg)
```
I'm doing the ACK right after receive.  Still I see so many messages which are 
unACKed
----
2018-11-12 12:01:36 UTC - Kesav Kolla: I'm checking the Python API implementation 
(python/src/consumer.cc, lines 30-56):

```
Message Consumer_receive(Consumer& consumer) {
    Message msg;
    Result res;

    while (true) {
        Py_BEGIN_ALLOW_THREADS
        // Use 100ms timeout to periodically check whether the
        // interpreter was interrupted
        res = consumer.receive(msg, 100);
        Py_END_ALLOW_THREADS
```
Here by default it's calling receive with `100` as the timeout
----
2018-11-12 12:02:37 UTC - Ivan Kelly: hmm, so
----
2018-11-12 12:02:47 UTC - Ivan Kelly: 2018-11-12 02:30:05.826 WARN  
ConsumerImpl:541 | [<persistent://hotelsoft/rateshop/jobs>, spydr.queue, 0] 
Can't use this function if the queue size is 0
----
2018-11-12 12:02:57 UTC - Ivan Kelly: Comes from :
----
2018-11-12 12:02:58 UTC - Ivan Kelly: ```
Result ConsumerImpl::fetchSingleMessageFromBroker(Message& msg) {
    if (config_.getReceiverQueueSize() != 0) {
        LOG_ERROR(getName() << " Can't use receiveForZeroQueueSize if the queue size is not 0");
        return ResultInvalidConfiguration;
    }
```
----
2018-11-12 12:03:10 UTC - Ivan Kelly: oh, actually no
----
2018-11-12 12:04:59 UTC - Ivan Kelly: ok, bug
----
2018-11-12 12:05:23 UTC - Ivan Kelly: you can't currently use queue size 0 with 
the Python client
----
2018-11-12 12:06:27 UTC - Ivan Kelly: 
<https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/lib/ConsumerImpl.cc#L505>
----
2018-11-12 12:06:55 UTC - Ivan Kelly: it's only supported when you call receive 
with no timeout, but Python always calls with a timeout, even when the 
calling code doesn't pass one
----
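The mismatch can be modeled in a few lines (an illustration of the bug's mechanics, not the actual client code): the zero-queue C++ receive path rejects any call that carries a timeout, and the Python binding always passes one internally.

```python
# Toy model of the bug's mechanics -- not the actual client code.
# The C++ zero-queue receive path only works without a timeout, but the
# Python binding internally always calls the timeout variant (100 ms).

class InvalidConfiguration(Exception):
    pass

def cpp_receive(queue_size, timeout_ms=None):
    if queue_size == 0 and timeout_ms is not None:
        # corresponds to the "Can't use this function if the queue size
        # is 0" warning in the logs above
        raise InvalidConfiguration
    return "msg"

def python_binding_receive(queue_size):
    # mirrors Consumer_receive in python/src/consumer.cc: it polls with a
    # 100 ms timeout so the interpreter can check for interrupts
    return cpp_receive(queue_size, timeout_ms=100)

print(python_binding_receive(1))   # works with queue size >= 1
try:
    python_binding_receive(0)      # always fails with queue size 0
except InvalidConfiguration:
    print("InvalidConfiguration")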
2018-11-12 12:07:13 UTC - Kesav Kolla: yes that is true
----
2018-11-12 12:07:18 UTC - Ivan Kelly: 
<https://github.com/apache/pulsar/blob/master/pulsar-client-cpp/python/src/consumer.cc#L38>
----
2018-11-12 12:07:28 UTC - Kesav Kolla: yup this is the bug
----
2018-11-12 12:08:56 UTC - Kesav Kolla: Also, why do some messages go unACKed even 
if I call `self.jobsconsumer.acknowledge(msg)`?
----
2018-11-12 12:09:33 UTC - Kesav Kolla: Can the Python client return the ACK 
status to the caller so that we can debug why the ACK is not happening?
----
2018-11-12 12:10:24 UTC - Ivan Kelly: that i don't know
----
2018-11-12 12:11:01 UTC - Ivan Kelly: there is no ack status, ack is fire and 
forget
----
2018-11-12 12:11:51 UTC - Ivan Kelly: are unacked messages getting redelivered?
----
2018-11-12 12:12:51 UTC - Kesav Kolla: nope.  I think the broker is marking the 
consumer as undeliverable and eventually stops sending messages
----
2018-11-12 12:14:51 UTC - Ivan Kelly: you've set redeliver to 5 minutes. have 
you waited 5 minutes?
----
2018-11-12 12:14:59 UTC - Ivan Kelly: I assume you're testing against 
standalone?
----
2018-11-12 12:15:19 UTC - Kesav Kolla: yes
----
2018-11-12 12:15:27 UTC - Kesav Kolla: it's standalone
----
2018-11-12 12:15:41 UTC - Kesav Kolla: my consumers were running for weeks
----
2018-11-12 12:19:13 UTC - Ivan Kelly: oh, ok
----
2018-11-12 12:19:20 UTC - Ivan Kelly: what sort of data rate?
----
2018-11-12 12:20:37 UTC - Kesav Kolla: I don't know the data rate.  But I do 
publish around 1000 messages onto topic over a day
----
2018-11-12 12:21:04 UTC - Ivan Kelly: ah, quite low then
----
2018-11-12 12:21:23 UTC - Kesav Kolla: oh yes very low
----
2018-11-12 12:22:04 UTC - Kesav Kolla: my guess was it had something to do with 
the receive queue size, so I thought of reducing it to `0`.  Unfortunately the 
Python client doesn't support that
----
2018-11-12 12:22:20 UTC - Ivan Kelly: sounds like a legit bug
----
2018-11-12 12:22:52 UTC - Ivan Kelly: or at least something that warrants 
investigation. is the data confidential?
----
2018-11-12 12:23:45 UTC - Kesav Kolla: not confidential
----
2018-11-12 12:23:53 UTC - Kesav Kolla: but how to debug?
----
2018-11-12 12:24:57 UTC - Ivan Kelly: I'm not 100% sure of the steps, but it 
seems that there should be a lot of traces
----
2018-11-12 12:25:28 UTC - Kesav Kolla: where?  In pulsar standalone logs or my 
consumer logs?
----
2018-11-12 12:25:41 UTC - Ivan Kelly: in the actual event data
----
2018-11-12 12:26:05 UTC - Ivan Kelly: could you package up data/standalone and 
email it?
----
2018-11-12 12:26:14 UTC - Ivan Kelly: [email protected]
----
2018-11-12 12:27:18 UTC - Kesav Kolla: Curious: how can the actual data in the 
message cause the ACK to fail?  My message is a protobuf binary message
----
2018-11-12 12:27:58 UTC - Kesav Kolla: I can generate messages and write them 
into a file, but I don't know how you will use that data
----
2018-11-12 12:27:58 UTC - Ivan Kelly: Not the message data. I'd like 
to see what the subscription state actually looks like
----
2018-11-12 12:28:24 UTC - Kesav Kolla: you mean topics stats and topics 
stats-internal ?
----
2018-11-12 12:28:40 UTC - Ivan Kelly: one sec
----
2018-11-12 12:30:57 UTC - Ivan Kelly: ya, give me the stats and stats-internal 
for the topic
----
2018-11-12 12:32:31 UTC - Kesav Kolla: Thanks for offering help.  I'll email to 
you
----
2018-11-12 12:39:54 UTC - Ivan Kelly: what's the id of the last message you've 
received from the topic?
----
2018-11-12 12:40:27 UTC - Kesav Kolla: I don't know
----
2018-11-12 12:40:48 UTC - Kesav Kolla: I killed all my consumers just an hour 
ago and re-started consumers
----
2018-11-12 12:40:58 UTC - Kesav Kolla: also cleared backlog
----
2018-11-12 12:41:17 UTC - Ivan Kelly: ok
----
2018-11-12 12:41:26 UTC - Ivan Kelly: can you send me the logs from the broker?
----
2018-11-12 12:55:58 UTC - Kesav Kolla: Does Pulsar SQL work against a PROTOBUF 
schema?  I've uploaded a PROTOBUF schema to my topic.  When I try to execute 
SQL I get an error saying the topic doesn't have a valid schema
----
2018-11-12 12:57:13 UTC - Kesav Kolla: The way I've created the schema file is :
```
{
        "type": "PROTOBUF",
        "schema": ".....base64 encoded Proto file contents......"
}
```
----
2018-11-12 12:57:22 UTC - Julien Plissonneau Duquène: @Julien Plissonneau 
Duquène has joined the channel
----
2018-11-12 13:00:26 UTC - Julien Plissonneau Duquène: hello there
any work already started/in progress/done for deb/rpm packaging?
----
2018-11-12 13:05:23 UTC - Ivan Kelly: @Kesav Kolla have you always been using 
.acknowledge(messageId)?
----
2018-11-12 13:07:11 UTC - Ivan Kelly: the logs have loads of lines like 
`[0] Received cumulative ack on shared subscription, ignoring`
----
2018-11-12 13:18:40 UTC - Kesav Kolla: I changed to ack cumulative a couple of 
days ago
----
2018-11-12 13:18:55 UTC - Kesav Kolla: To test if it makes any difference
----
2018-11-12 13:19:19 UTC - Kesav Kolla: Maybe a week ago
----
2018-11-12 13:20:13 UTC - Julien Plissonneau Duquène: there are deb and rpm 
packages available for the C++ client, but I was wondering if any work has 
started to package the broker
----
2018-11-12 13:39:10 UTC - Ivan Kelly: @Kesav Kolla ok
----
2018-11-12 13:46:08 UTC - Ivan Kelly: @Kesav Kolla are the consumers running in 
long running processes?
----
2018-11-12 13:55:29 UTC - Kesav Kolla: Yes
----
2018-11-12 13:56:48 UTC - Ivan Kelly: i don't see anything suspicious in the 
logs
----
2018-11-12 13:57:32 UTC - Ivan Kelly: and the unacked messages should be 
redelivered
----
2018-11-12 14:01:12 UTC - Kesav Kolla: You must have seen a backlog of almost 19k
----
2018-11-12 14:01:40 UTC - Kesav Kolla: All messages were in backlog
----
2018-11-12 14:04:14 UTC - Ivan Kelly: ya, there's an issue there, but nothing 
suspicious in the logs to point to the cause
----
2018-11-12 14:04:40 UTC - Ivan Kelly: and it sounds like you've wiped the data 
now, no?
----
2018-11-12 14:07:32 UTC - Kesav Kolla: Yes now I wiped data
----
2018-11-12 14:08:37 UTC - Ivan Kelly: did you restart the broker to see if the 
problem persisted?
----
2018-11-12 14:10:56 UTC - Kesav Kolla: yes, I've tried that too.  
Restarted the broker and yet no change.  Once messages are in the backlog I 
couldn't get them going again
----
2018-11-12 14:12:52 UTC - Ivan Kelly: that's a good thing though, if you can 
trigger the issue again. You can package up the data and we can take a look 
directly
----
2018-11-12 14:13:24 UTC - Ivan Kelly: also, enable DEBUG logging in your 
broker, so we can see more details
----
2018-11-12 14:14:33 UTC - Ivan Kelly: add a logger:
```
- name: org.apache.pulsar
  level: DEBUG
  additivity: false
  AppenderRef:
    - ref: Console
```
----
2018-11-12 14:14:40 UTC - Ivan Kelly: to conf/log4j2.yaml
----
2018-11-12 14:17:49 UTC - Kesav Kolla: I'll enable DEBUG logging on broker
----
2018-11-12 14:25:22 UTC - Kesav Kolla: oh, I enabled DEBUG for org.apache.pulsar 
and now I see the logs flowing non-stop.  With those messages 
I feel my log file is going to become huge in no time
----
2018-11-12 14:32:40 UTC - Kesav Kolla: @Ivan Kelly I've generated some messages 
to test it out.  Shall I send the logs and stats your way?
----
2018-11-12 14:33:29 UTC - Ivan Kelly: you may need to narrow the log category
----
2018-11-12 14:33:42 UTC - Ivan Kelly: do you have the problem reproducing?
----
2018-11-12 14:34:52 UTC - Kesav Kolla: I don't know yet whether the problem is 
there or not.  Right now consumers are working on the messages.  I do see 
"unackedMessages" : 30 on some subscriptions.  I don't know why there would 
be so many unackedMessages
----
2018-11-12 14:35:38 UTC - Kesav Kolla: I've set receiver_queue_size to 1
----
2018-11-12 14:36:03 UTC - Ivan Kelly: ok, send me the stats
----
2018-11-12 14:45:27 UTC - Ivan Kelly: what rate are you writing at?
----
2018-11-12 14:48:00 UTC - Kesav Kolla: producer?
----
2018-11-12 14:48:50 UTC - Kesav Kolla: the way my workflow works: I put messages 
in the topic, then 50 consumers in shared subscription mode take one message at a 
time and work on it.  If the work on a message fails, they put the message back 
into the topic again
----
2018-11-12 14:49:34 UTC - Kesav Kolla: when I put a message back onto the topic I 
increment a RETRY header, and I only do that retry in my application logic 
5 times
----
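The retry pattern described above can be sketched broker-independently (the limit of 5 and the RETRY_COUNT header name come from the conversation; the processing and requeue functions are stand-ins):

```python
# Sketch of the application-level retry pattern described above.
# `process` and `requeue` are hypothetical stand-ins for "work on the
# message" and "publish it back to the topic".

RETRY_LIMIT = 5

def handle(payload, properties, process, requeue):
    retrycnt = int(properties.get("RETRY_COUNT", 0))
    try:
        process(payload)
    except Exception:
        if retrycnt < RETRY_LIMIT:
            # put the message back with an incremented retry header
            requeue(payload, {"RETRY_COUNT": str(retrycnt + 1)})

attempts = []

def always_fail(payload):
    raise RuntimeError("work failed")

def requeue(payload, props):
    attempts.append(props["RETRY_COUNT"])
    handle(payload, props, always_fail, requeue)

handle(b"job", {}, always_fail, requeue)
print(attempts)  # ['1', '2', '3', '4', '5'] -- gives up after 5 retries
```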
2018-11-12 14:50:13 UTC - Kesav Kolla: the rate at which the producer writes is 
around 500 msgs every 5 min
----
2018-11-12 14:50:46 UTC - Kesav Kolla: a consumer takes around 5-10 seconds 
to process a message.  If it fails on the message it puts it back into the topic
----
2018-11-12 14:51:44 UTC - Ivan Kelly: it's all on the jobs topic, no?
----
2018-11-12 14:52:04 UTC - Kesav Kolla: yes
----
2018-11-12 14:52:06 UTC - Ivan Kelly: could you post the code snippet doing the 
acks?
----
2018-11-12 14:54:39 UTC - Ivan Kelly: I'm still seeing a lot of: `[0] Received 
cumulative ack on shared subscription, ignoring`
----
2018-11-12 14:54:45 UTC - Kesav Kolla: ```
while True:
    msg: _Message = self.jobsconsumer.receive()
    self.jobsconsumer.acknowledge_cumulative(msg)
    try:
        ...  # Logic to work on message
    except:
        if retrycnt < self.RETRY_COUNT:
            self.logger.info('Resubmitting job')
            self.jobsproducer.send(job.SerializeToString(),
                                   properties={'RETRY_COUNT': f'RETRY-{retrycnt + 1}'})
```
----
2018-11-12 14:55:17 UTC - Ivan Kelly: use acknowledge() rather than 
acknowledge_cumulative()
----
2018-11-12 14:55:33 UTC - Kesav Kolla: I'm doing it now
----
2018-11-12 14:55:40 UTC - Kesav Kolla: do you want DEBUG log?
----
2018-11-12 14:56:13 UTC - Ivan Kelly: let it run for a while and just send the 
stats-internal to start
----
2018-11-12 14:56:23 UTC - Kesav Kolla: ok fine
----
2018-11-12 16:36:22 UTC - Matteo Merli: @Kesav Kolla The SQL connector only 
works with JSON and Avro at the moment.

We plan to add support for Protobuf soon (est. in 2.4 release). The “tricky” 
part is that we don’t have the protobuf generated code in the SQL connector, so 
we need to write a generic deserializer for the protobuf binary data.
----
2018-11-12 16:43:21 UTC - Beast in Black: @Matteo Merli reposting a comment I 
made on Friday:
as a followup to all those annoying questions I asked yesterday, I have a 
couple more questions which popped up when I was making the change to have my 
consumer type be shared. These questions are due to what I understand is the 
round-robin nature of the `shared` consumer type and the documentation for the 
same available at 
`<https://pulsar.apache.org/docs/latest/getting-started/ConceptsAndArchitecture/#Exclusive-u84f1>`

*To recap:* In my use case, each app instance (C++ app using the pulsar CPP 
client) will subscribe to a global topic (persistent as well as non-persistent) 
using a unique consumer subscription name i.e. each app instance will use a 
unique subscription name which is different from all the rest of the app 
instances. This unique name is persisted on each app instance so when the app 
comes back after a crash it will continue to use the same subscription name as 
before.

Thus, on each app instance there will be exactly one consumer and one 
subscriber for the topic, and this consumer's type will be set to `shared` to 
allow the app to restart after a crash and to re-subscribe to the topic using 
the same consumer name it used as before without experiencing pulsar client 
errors related to trying to resubscribe to an existing (i.e unreaped) broker 
consumer connection when using the same subscription name.

*Here come the questions:*

1. If new messages were published to that topic while the app instance was 
offline due to the crash/restart, when it connects back using the same 
subscription name to the same shared consumer (to which no other subscriptions 
will ever connect), will it receive/consume the messages which it missed while 
it was down?
2. After coming back up from the crash, will it receive/consume all new 
messages published after it came back up? Essentially, the question asks 
whether the round-robin nature of shared consumers will cause message receipt 
issues when using the same subscription name for the same shared consumer after 
an app crash/restart.
3. On each app instance, since there is only a single consumer/subscriber for 
the given topic, can message ordering still be guaranteed despite the 
round-robin nature of a shared consumer?
+1 : Matteo Merli, Ali Ahmed
----
2018-11-12 16:52:43 UTC - Julien Plissonneau Duquène: @Matteo Merli do you know 
about any ongoing work to make deb/rpm packages for pulsar-broker?
----
2018-11-12 16:53:54 UTC - Matteo Merli: > 1. If new messages were published 
to that topic while the app instance was offline due to the crash/restart, when 
it connects back using the same subscription name to the same shared consumer 
(to which no other subscriptions will ever connect), will it receive/consume 
the messages which it missed while it was down?

Yes, the subscription is durable. When consumers are slow or not connected, 
Pulsar will retain all the non-acknowledged messages

> 2. After coming back up from the crash, will it receive/consume all new 
messages published after it came back up? Essentially, the question asks 
whether the round-robin nature of shared consumers will cause message receipt 
issues when using the same subscription name for the same shared consumer after 
an app crash/restart.

It would come up and start receiving from older to newer messages (with 
occasional out-of-order delivery if there are multiple consumers connected)

> 3. On each app instance, since there is only a single consumer/subscriber 
for the given topic, can message ordering still be guaranteed despite the 
round-robin nature of a shared consumer?

If there’s only 1 consumer connected, messages will come in order. If, for some 
reason, there are 2+ consumers connected for some time, there might be a blip 
of out-of-order delivery there.
+1 : Beast in Black
----
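Matteo's answer to question 1 can be sketched as a toy model (not Pulsar's implementation): a durable subscription keeps messages while its consumer is offline and delivers them, oldest first, when the consumer reconnects.

```python
# Toy sketch of a durable subscription -- not Pulsar's code.
# Messages published while the consumer is offline stay pending on the
# subscription and are delivered on reconnect.

class DurableSubscription:
    def __init__(self):
        self.pending = []
        self.connected = False

    def publish(self, msg):
        self.pending.append(msg)  # retained regardless of connection state

    def reconnect(self):
        self.connected = True
        delivered, self.pending = self.pending, []
        return delivered  # delivered older-to-newer, in publish order


sub = DurableSubscription()
sub.publish("m1")       # consumer is offline (e.g. the app crashed)
sub.publish("m2")
print(sub.reconnect())  # ['m1', 'm2'] -- nothing was missed
```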
2018-11-12 16:54:52 UTC - Matteo Merli: Hi @Julien Plissonneau Duquène. I don’t 
think there’s anyone working on that. It would be great to have!
----
2018-11-12 17:05:05 UTC - Julien Plissonneau Duquène: ok thanks. there are some 
challenges in doing that, but I will suggest to my team that we contribute some 
packaging work to the project
----
2018-11-12 17:09:21 UTC - Beast in Black: @Matteo Merli awesome, thanks! I 
might have some questions related to some producer errors I'm seeing in my app 
when it is restarted. But I will do my own due diligence on that before I 
pester you with questions :slightly_smiling_face:
----
2018-11-12 18:24:19 UTC - Matteo Merli: I think the easier option is to build 
these inside a Docker container. So it’s easy for everyone to have the correct 
environment for RPMs and Debs
----
2018-11-12 22:19:01 UTC - koushik chitta: @koushik chitta has joined the channel
----
2018-11-12 22:34:11 UTC - koushik chitta: We are heavy Kafka users in 
production. I just started looking into Pulsar, which can solve a couple of 
scenarios that Kafka cannot handle for us as of now.
Does anyone here run Pulsar Server on Windows OS without any issues?
----
2018-11-12 22:38:02 UTC - Matteo Merli: Koushik, I’m not aware of any 
significant deployment on Windows, though I wouldn’t expect any major problems 
there
----
2018-11-12 22:38:39 UTC - Ali Ahmed: @koushik chitta as long as you have java 8 
it should be fine
----
2018-11-12 22:38:49 UTC - Matteo Merli: (apart from converting bash scripts 
into corresponding batch or similar scripts for starting the JVM with proper 
args)
----
2018-11-12 22:40:04 UTC - Matteo Merli: (actually there was someone some time 
back who mentioned they were running on Windows and promised to contribute the 
scripts.. though that didn’t happen :slightly_smiling_face:)
----
2018-11-12 22:41:23 UTC - Ali Ahmed: @koushik chitta you can install bash and 
unix shell utilities on Windows, so you can run the scripts that way too.
----
2018-11-12 23:06:36 UTC - koushik chitta: Thanks @Ali Ahmed and @Matteo Merli. 
Unfortunately installing shell utilities is not an option for us. I would like 
to write corresponding bat scripts, and I would surely contribute back if I 
pursue this.
----
