2019-01-31 09:39:13 UTC - jia zhai: @Vincent Ngan One topic can have several subscriptions. Once all subscriptions have acked the messages up to the end of a segment, that segment is marked as deleted and BookKeeper will delete it. The physical storage for the segment is reclaimed when BookKeeper runs its next garbage collection. ----
2019-01-31 14:13:01 UTC - Vincent Ngan: So, if I have created a subscription that has never acked any message, will that cause all the messages of the topic to be retained in storage? ----
2019-01-31 14:29:08 UTC - Matteo Merli: There’s a default “max backlog quota” configured (and it can be overridden). <http://pulsar.apache.org/docs/en/cookbooks-retention-expiry.html#backlog-quotas> ----
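A minimal Java sketch of the acknowledgment flow jia zhai describes above: a segment can only be deleted once every subscription on the topic has acked past it, and cumulative acks are the usual way to do that. The broker URL, topic, and subscription name are hypothetical.
```java
import org.apache.pulsar.client.api.*;

public class AckExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic") // hypothetical topic
                .subscriptionName("analytics-sub")             // hypothetical subscription
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        // Cumulative ack marks this message and everything before it as consumed.
        // Only when *every* subscription on the topic has acked past a segment
        // can that segment be deleted and its storage reclaimed by BookKeeper GC.
        consumer.acknowledgeCumulative(msg);

        consumer.close();
        client.close();
    }
}
```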
2019-01-31 16:00:57 UTC - Jonathan Budd: @Matteo Merli, I'm from Grant's organization; the first time we saw this was actually on a cluster where we _didn't_ have any of our own Pulsar functions deployed - at least, as far as I'm aware. You mentioned, though, that the issue is related to "when recovering the topic used for functions metadata and assignments". Would this topic for metadata/assignments ever even be used if we weren't using Pulsar functions? ----
2019-01-31 16:03:32 UTC - Jonathan Budd: The main reason I ask is that we've been playing with Pulsar functions a little bit and have seen this bug a couple of times in clusters where we have functions deployed - so it seems like a reasonable correlation to draw, but again, we didn't have any functions deployed the first time we saw this. Just wanted to see how certain you are that this is related to the functions we've deployed, or whether there's anything else going on under the hood. ----
2019-01-31 18:01:14 UTC - Matteo Merli: @Jonathan Budd In both stack traces that were pasted here, the error was on recovering the cursor for the topic <persistent://public/functions/persistent/coordinate>, which is used internally by the functions coordinator ----
2019-01-31 18:01:54 UTC - Matteo Merli: That topic would be created if you enable the functions worker (eg. started as part of the brokers), even though no function was actually deployed ----
2019-01-31 18:02:02 UTC - Grant Wu: Aha, yes, we do that ----
2019-01-31 18:02:32 UTC - Matteo Merli: The issue is that since the functions worker fails to start, and it’s being started as part of the broker, the broker fails to start as well ----
2019-01-31 18:15:08 UTC - Jonathan Budd: Gotcha, that definitely sounds like the issue then. Thanks for the quick response time on all of this :+1: ----
2019-01-31 18:16:59 UTC - Matteo Merli: No problem. The above PR fixes the recovery for the unexpected condition. I’m still going through the steps to reproduce the issue. I have some clues, based on the fact that this particular topic is meant to have no messages written to it; it’s just used to “elect” a leader worker ----
2019-01-31 20:02:44 UTC - Ambud Sharma: @Matteo Merli can the default "max backlog quota" be configured in broker.conf? ----
2019-01-31 21:02:30 UTC - Matteo Merli: Yes, that would be through:
```
# Default per-topic backlog quota limit
backlogQuotaDefaultLimitGB=10

# Default backlog quota retention policy. Default is producer_request_hold
# 'producer_request_hold' Policy which holds producer's send request until the resource becomes available (or holding times out)
# 'producer_exception' Policy which throws javax.jms.ResourceAllocationException to the producer
# 'consumer_backlog_eviction' Policy which evicts the oldest message from the slowest consumer's backlog
backlogQuotaDefaultRetentionPolicy=producer_request_hold
```
----
2019-01-31 21:03:22 UTC - Matteo Merli: Same as retention:
```
# Default message retention time
defaultRetentionTimeInMinutes=0

# Default retention size
defaultRetentionSizeInMB=0
```
----
2019-01-31 21:16:31 UTC - Ambud Sharma: does this mean that if we configure retention to 0 we don't need to worry about backlogQuota? ----
2019-01-31 21:17:19 UTC - Matteo Merli: The 2 are a bit orthogonal. Retention means “retain data even though it was already acked, or if there is no subscription” ----
2019-01-31 21:17:43 UTC - Matteo Merli: backlogQuota refers to the “unacked” messages for a subscription ----
2019-01-31 21:18:48 UTC - Ambud Sharma: I am trying to work out the best settings to make Pulsar mimic Kafka's behavior, which doesn't create producer back pressure. Will `backlogQuotaDefaultRetentionPolicy=producer_request_hold` be sufficient? ----
2019-01-31 21:19:21 UTC - Matteo Merli: Set backlogQuotaDefaultLimitGB to a very high value ----
2019-01-31 21:20:06 UTC - Matteo Merli: and set
```
defaultRetentionTimeInMinutes=X_NUMBER
# Unlimited
defaultRetentionSizeInMB=-1
```
----
2019-01-31 21:21:32 UTC - Matteo Merli: There’s also an additional tunable: TTL. That lets you configure, basically, an auto-acknowledgment of the data after a certain amount of time ----
2019-01-31 21:22:28 UTC - Ambud Sharma: defaultRetentionSizeInMB wouldn't this cause out-of-disk issues? ----
2019-01-31 21:23:20 UTC - Matteo Merli: sure, disk space is not infinite by any means… do you configure a max data size in Kafka? ----
2019-01-31 21:23:26 UTC - Matteo Merli: or is it just time based? ----
2019-01-31 21:23:33 UTC - Ambud Sharma: it's purely time based ----
2019-01-31 21:23:58 UTC - Matteo Merli: Ok, so the “unlimited size” will match the same behavior ----
2019-01-31 21:24:30 UTC - Ambud Sharma: also need to set this, right? `backlogQuotaDefaultRetentionPolicy=producer_request_hold` ----
2019-01-31 21:25:07 UTC - Matteo Merli: if you set:
```
# 24 * 60
defaultRetentionTimeInMinutes=1440
defaultRetentionSizeInMB=1024
```
that would keep the data for at least 24 hours and at most 1 GB. After 1 GB the data will start to get trimmed ----
2019-01-31 21:25:27 UTC - Matteo Merli: Same as in: trim the data after 24h ----
2019-01-31 21:26:11 UTC - Ambud Sharma: got it; still confused about producer backpressure. Will the producer still get backpressure if there are no consumers? ----
2019-01-31 21:26:16 UTC - Matteo Merli: > also need to set this, right? `backlogQuotaDefaultRetentionPolicy=producer_request_hold`
Yes, that’s the default behavior. You can also set it to discard the old data, and then it will never block the producer ----
2019-01-31 21:29:18 UTC - Ambud Sharma: thank you ----
2019-01-31 21:32:16 UTC - Matteo Merli: > got it; still confused about producer backpressure. Will the producer still get backpressure if there are no consumers?
That is more tied to the “subscription”. If a subscription is created, it will retain data (until consumers acknowledge the data). If consumers are slow or disconnected, the amount of “backlog” on that subscription will increase and at some point will hit the quota. You may also have the case in which you have no subscriptions (eg: using a Reader instead of a Consumer). In that case there’s no backlog quota; only the time/size based retention is applied ----
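The same policies can also be set per namespace at runtime through the admin API. A sketch using the Java admin client, assuming the 2.2-era API (two-argument BacklogQuota constructor), an admin endpoint at http://localhost:8080, and a hypothetical namespace:
```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class QuotaExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        String namespace = "public/default"; // hypothetical namespace

        // Keep acked data for 24h with unlimited size (-1), matching the discussion above.
        admin.namespaces().setRetention(namespace,
                new RetentionPolicies(24 * 60, -1));

        // Large backlog quota with producer_request_hold, approximating Kafka-like behavior.
        admin.namespaces().setBacklogQuota(namespace,
                new BacklogQuota(100L * 1024 * 1024 * 1024, // 100 GB, expressed in bytes
                        BacklogQuota.RetentionPolicy.producer_request_hold));

        admin.close();
    }
}
```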
2019-01-31 22:21:05 UTC - Ambud Sharma: makes sense, thanks for clarifying ----
2019-01-31 23:35:17 UTC - Ambud Sharma: Is there a command/API to trigger a broker rebalance? ----
2019-01-31 23:36:24 UTC - Matteo Merli: You can “unload” a namespace (or a bundle, which is a shard within a namespace). That will force the reassignment ----
2019-01-31 23:37:05 UTC - Matteo Merli: <https://pulsar.apache.org/docs/en/administration-load-distribution/#unloading-topics-and-bundles> ----
2019-01-31 23:44:04 UTC - Ambud Sharma: thanks @Matteo Merli seeing this exception:
```
Caused by: java.lang.NullPointerException
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.unloadTopic(PersistentTopicsBase.java:1389) ~[org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalUnloadTopic(PersistentTopicsBase.java:482) ~[org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at org.apache.pulsar.broker.admin.v2.PersistentTopics.unloadTopic(PersistentTopics.java:210) ~[org.apache.pulsar-pulsar-broker-2.2.1.jar:2.2.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_172]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_172]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_172]
```
----
2019-01-31 23:46:14 UTC - Ambud Sharma: the topic is persistent and has 12 partitions; there are 4 available brokers. The topic is currently on 3 brokers. We induced a controlled broker stop that caused a rebalance of the partitions, moving from 4 brokers to 3. ----
2019-01-31 23:47:01 UTC - Matteo Merli: Ok, do you have by any chance the full stack trace of the NPE? ----
2019-01-31 23:47:43 UTC - Matteo Merli: it should be logged just before what you pasted:
```
} catch (Exception e) {
    log.error("[{}] Failed to unload topic {}, {}", clientAppId(), topicName, e.getCause().getMessage(), e);
    throw new RestException(e.getCause());
}
```
----
2019-01-31 23:48:57 UTC - Ambud Sharma: I have DMed you the trace ----
2019-01-31 23:50:12 UTC - Matteo Merli: +1 ----
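For reference, the unload described above can also be triggered from the Java admin client. A sketch assuming an admin endpoint at http://localhost:8080 and hypothetical namespace/bundle names:
```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class UnloadExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Unload every bundle in a namespace; its topics are reassigned on the next lookup.
        admin.namespaces().unload("public/default"); // hypothetical namespace

        // Or unload a single bundle (shard) within the namespace.
        admin.namespaces().unloadNamespaceBundle("public/default", "0x00000000_0x40000000");

        admin.close();
    }
}
```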
2019-02-01 01:17:17 UTC - Eric Lopez: @Eric Lopez has joined the channel ----
2019-02-01 02:05:18 UTC - Vincent Ngan: On the other hand, regarding the message retention issue: what is the simplest way to make sure all messages are retained, regardless of whether they have been acked or not, until an explicit action to remove them? ----
2019-02-01 02:06:37 UTC - Matteo Merli: I don’t think it’s possible to remove them with an “explicit action”, other than controlling that with acknowledgments ----
2019-02-01 02:07:05 UTC - Matteo Merli: on top of acknowledgments, you have retention (size, time) ----
2019-02-01 02:07:09 UTC - Joe Francis: You can do the same as you would do for Kafka - use retention ----
2019-02-01 02:09:14 UTC - Joe Francis: It's hard to understand what the use case is. How is your use of ack different from explicit removal? ----
2019-02-01 02:13:12 UTC - Grant Wu: You can set retention based on size and time, such that all messages will be retained regardless of whether they have been acked or not ----
2019-02-01 02:13:36 UTC - Grant Wu: Uh, I think I should let merlimat and joef take this one ----
2019-02-01 02:14:08 UTC - Matteo Merli: :slightly_smiling_face: that was 100% correct though ----
2019-02-01 02:14:30 UTC - Grant Wu: I mean, I know - but I think it’s not helpful to have a third person jumping in here ----
2019-02-01 02:16:11 UTC - Vincent Ngan: I have a use case where I want to keep all messages permanently, without worrying about some of them being removed accidentally because they were acknowledged, until I think I have finished using them; then I would remove them by deleting the whole topic. ----
2019-02-01 02:18:25 UTC - Joe Francis: Sure - I'm just trying to figure out the difference between acknowledgement and "I think I have finished using them" .. what happens between the ack and you deciding you are finished with the topic? ----
2019-02-01 02:21:11 UTC - Vincent Ngan: I should have simply said “until I have finished using them”. ----
2019-02-01 02:21:22 UTC - Matteo Merli: @Vincent Ngan you have 2 options:
 * Use “infinite” retention (size=-1, time=-1) and delete the topic when done
 * Use acks (especially cumulative acks) to drop messages when you’re done with them ----
2019-02-01 02:22:15 UTC - Joe Francis: Yeah, but that is what acks are meant for. You only have to ack when you are super duper confident that you have processed the message. Or is this some other recon you do after you finish with everything? ----
2019-02-01 02:22:51 UTC - Grant Wu: I think there are use cases where you may want to process a message more than once ----
2019-02-01 02:23:57 UTC - Vincent Ngan: I think @Matteo Merli’s suggestion 1 - use “infinite” retention - is what I want. ----
2019-02-01 02:24:35 UTC - Matteo Merli: @Grant Wu sure, you can do that either with retention, or by keeping multiple subscriptions ----
2019-02-01 02:25:09 UTC - Grant Wu: We use #1 as well ----
2019-02-01 02:26:32 UTC - Grant Wu: One of our uses is for user notifications; we ack messages on delivery, but we retain them… well, our notifications are still very primitive, but I can imagine it being useful to be able to look at them historically; for example, if you refresh your browser, users probably expect their notifications not to just be wiped every time ----
2019-02-01 02:27:25 UTC - Grant Wu: I think our org treats some of our Pulsar topics more as a seekable stream of messages than as a FIFO queue ----
2019-02-01 02:29:26 UTC - Grant Wu: (but others are treated as a FIFO queue - for example, we have a topic where notifications that we want to send to a group of users are produced. Then a Pulsar function distributes them into per-user topics) ----
2019-02-01 02:29:45 UTC - Matteo Merli: :+1: ----
2019-02-01 02:37:20 UTC - Vincent Ngan: My use case is this: I have a large set of messages produced by a producer, and I have a number of different sets of logic to analyse these messages, using different consumers/subscriptions that will acknowledge the messages differently. These sets of logic will be executed from time to time, so I need to retain all the messages for repeated executions. Also, I may introduce new sets of logic later on to analyse the messages again.
The thing I am worried about is that if some messages have been acked by all the current consumers and then deleted from storage due to some setting of the retention policy, later processes with different consumers/subscriptions will fail to see the deleted messages. ----
2019-02-01 02:38:37 UTC - Joe Francis: It really depends on what you want to do. (1) If you want to just process once and then do a verification pass, 2 subs is best (you don't have to manage retention). (2) If you want replay on demand (repeated execution), then retention and a sub is better, but you have to manage when to chop. (3) If you want to run a lossy downstream pipe with replay on demand, then retention + the Reader API, but you have to manage when to chop ----
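Option (3) above relies on the Reader API, which reads a topic without creating a subscription. A minimal Java sketch, with a hypothetical broker URL and topic name:
```java
import org.apache.pulsar.client.api.*;

public class ReplayExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        // A Reader has no subscription, so no backlog quota applies;
        // it can re-read anything still covered by the retention policy.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic") // hypothetical topic
                .startMessageId(MessageId.earliest)            // replay from the beginning
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // ... analyse the message; no acknowledgment is needed or possible
        }

        reader.close();
        client.close();
    }
}
```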
2019-02-01 02:39:35 UTC - Grant Wu: <https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/> I think this is definitely the place to start ----
2019-02-01 02:45:56 UTC - Vincent Ngan: I am just asking what configuration settings I need to set in order to make sure I always have the full set of messages to analyse, regardless of how they have been acknowledged. Of course I have to housekeep these messages eventually. But that will happen only when we have finished the whole exercise of analysing the messages. Then we can simply drop the topic. ----
2019-02-01 02:46:47 UTC - Grant Wu: My link has the details ^ ----
2019-02-01 02:46:59 UTC - Grant Wu: <https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#examples> ----
2019-02-01 02:54:59 UTC - Joe Francis: That would be to set defaultRetentionTimeInMinutes=XXXX (high enough) and backlogQuotaDefaultLimitGB=YYYY (large enough to accommodate all your data). Don't use expiry. ----
2019-02-01 02:55:40 UTC - Joe Francis: Whenever you want to replay, reset your sub to the beginning and replay. ----
2019-02-01 02:56:47 UTC - Vincent Ngan: Thanks! Or setting them to -1, as mentioned by @Matteo Merli ----
2019-02-01 02:57:32 UTC - Joe Francis: I think so ----
2019-02-01 03:07:28 UTC - Vincent Ngan: I am sorry that I keep on asking stupid questions: can I rename a namespace? ----
2019-02-01 03:07:48 UTC - Joe Francis: No. ----
2019-02-01 03:08:17 UTC - Vincent Ngan: Can I move a topic from one namespace to another? ----
2019-02-01 03:08:35 UTC - Joe Francis: It's a system identifier. That goes against the definition ----
2019-02-01 03:11:23 UTC - Joe Francis: No. These are not supported. Currently these are all identifiers - the full ID is <tenant>/<namespace>/<topic> and it is immutable ----
2019-02-01 04:41:57 UTC - Simon: I'm trying to understand how to set replication clusters via the REST API. I see the endpoint /namespaces/{tenant}/{namespace}/replication ----
2019-02-01 04:42:27 UTC - Simon: so if I want to set 'mycluster' on a namespace, how do I do it? ----
2019-02-01 04:43:00 UTC - Simon: the doco doesn't really show how to add the cluster name to that endpoint ----
2019-02-01 04:43:28 UTC - Matteo Merli: It should be a JSON list of the cluster names ----
2019-02-01 04:43:37 UTC - Matteo Merli: (In the body) ----
2019-02-01 04:43:38 UTC - Simon: ah simple. cheers ----
2019-02-01 04:44:34 UTC - Matteo Merli: (Going by memory, might actually be wrong ;) ) ----
2019-02-01 05:25:17 UTC - Simon: had to use [ 'myCluster' ] ----
2019-02-01 05:25:20 UTC - Simon: thanks ----
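The same operation is also exposed through the Java admin client, which wraps that REST endpoint. A sketch assuming the 2.2-era signature that takes a list of cluster IDs, with hypothetical namespace and cluster names:
```java
import java.util.Collections;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceUrl("http://localhost:8080") // assumed admin endpoint
                .build();

        // Equivalent of POST /admin/v2/namespaces/{tenant}/{namespace}/replication
        // with a JSON list of cluster names in the body, e.g. ["mycluster"].
        admin.namespaces().setNamespaceReplicationClusters(
                "my-tenant/my-namespace",                // hypothetical namespace
                Collections.singletonList("mycluster")); // cluster names

        admin.close();
    }
}
```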
2019-02-01 07:53:22 UTC - David Tinker: I have a Pulsar cluster of 3 machines, each one running a Pulsar broker, ZooKeeper and BookKeeper. I have the following in my broker.conf:
```
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
```
So I should be able to take any one of the 3 machines down for a while without any disruption in service, right? And when I bring it back up, will it get copies of all the messages it missed? I just want to make sure I am understanding things correctly before I do this to our live cluster. I don’t want to have a very bad weekend! ----
2019-02-01 08:50:06 UTC - jia zhai: @David Tinker Before taking one machine off, be sure to turn BookKeeper auto-recovery off. ----
2019-02-01 08:59:02 UTC - jia zhai: If it is not turned off, the bookies will start auto-recovery once a machine goes offline, because BookKeeper expects to have 3 copies of the data but now only has 2. And since it will not succeed in finding a third available machine on which to place the recovered copy, the auto-recovery will fail. ----
2019-02-01 08:59:20 UTC - David Tinker: So I need to run "bin/bookkeeper shell autorecovery -disable" on each machine in the cluster? ----
2019-02-01 08:59:50 UTC - jia zhai: yes ----
2019-02-01 09:00:16 UTC - David Tinker: What happens if auto-recovery fails? Example: if we lose one of our machines unexpectedly? ----
2019-02-01 09:01:22 UTC - jia zhai: <http://bookkeeper.apache.org/docs/latest/admin/autorecovery/> ----
2019-02-01 09:01:35 UTC - jia zhai: Here is more info related to autorecovery. ----
2019-02-01 09:02:08 UTC - jia zhai:
```
Disable AutoRecovery

You can disable AutoRecovery at any time, for example during maintenance. Disabling AutoRecovery ensures that bookies' data isn't unnecessarily re-replicated when the bookie is only taken down for a short period of time, for example when the bookie is being updated or the configuration is being changed.
```
----
2019-02-01 09:07:06 UTC - David Tinker: Tx. One other thing: we have an A record in DNS for the cluster listing the 3 machines, and clients use this name to connect. Will they automatically connect to one of the up machines when one machine goes down? ----
2019-02-01 09:10:26 UTC - jia zhai: Yes, with DNS, it will automatically connect to other machines. ----
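Besides a DNS A record, the Pulsar client can also be given a comma-separated list of hosts in the service URL (supported since Pulsar 2.1, to the best of my knowledge) and will pick a reachable broker on its own. A sketch with hypothetical host names:
```java
import org.apache.pulsar.client.api.PulsarClient;

public class FailoverClientExample {
    public static void main(String[] args) throws Exception {
        // Instead of (or in addition to) a DNS A record, list all broker
        // addresses directly; the client connects to one that is reachable.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker1:6650,broker2:6650,broker3:6650") // hypothetical hosts
                .build();

        // ... create producers/consumers as usual

        client.close();
    }
}
```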
