2019-01-30 16:01:29 UTC - naga: Any billing app to plugin on pulsar?
----
2019-01-30 16:12:59 UTC - David Kjerrumgaard: @naga Are you referring to something
that meters the usage of Pulsar and creates a bill for an end user?
----
2019-01-30 16:16:08 UTC - naga: Yes @David Kjerrumgaard
----
2019-01-30 16:48:45 UTC - David Kjerrumgaard: I am unaware of any such
application at the moment, but we are always open to feature requests and/or
contributions. :smiley:
----
2019-01-30 17:04:29 UTC - Joe Francis: @naga Pulsar provides usage metrics, but
you will have to aggregate them. How do you plan to bill? Pulsar has knobs
for provisioning. You can set quotas for storage and dispatch rates.
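For illustration, the knobs mentioned here map to per-namespace pulsar-admin commands roughly as follows (the tenant/namespace name and limits are placeholders, not from this thread):
```shell
# Placeholder namespace and limits, for illustration only.
# Cap the storage backlog at 1 GB and make producers wait when it is full:
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-ns \
    --limit 1G --policy producer_request_hold

# Throttle dispatch to consumers for topics in the namespace:
bin/pulsar-admin namespaces set-dispatch-rate my-tenant/my-ns \
    --msg-dispatch-rate 1000 --byte-dispatch-rate 10485760
```
Both commands require a running cluster, so treat this as a configuration sketch rather than a runnable script.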
----
2019-01-30 17:26:22 UTC - naga: @David Kjerrumgaard and @Joe Francis thank you
:grinning:
----
2019-01-30 18:27:04 UTC - Jon Bock: @naga Are you looking to offer Pulsar as a
managed service, or looking to build a tool to help assess cost of running
Pulsar (e.g. for internal chargeback or other)?
----
2019-01-30 19:56:45 UTC - Arun M. Krishnakumar: @Arun M. Krishnakumar has
joined the channel
----
2019-01-30 21:11:20 UTC - Narciso Guillen de la Mora: @Narciso Guillen de la
Mora has joined the channel
----
2019-01-30 21:46:25 UTC - Ivan Serdyuk: @Ivan Serdyuk has joined the channel
----
2019-01-30 23:28:02 UTC - naga: @Jon Bock yes you are right. Trying to run as
managed service
----
2019-01-30 23:59:27 UTC - Grant Wu: Hey, @Matteo Merli this happened again
----
2019-01-30 23:59:38 UTC - Grant Wu: Not 100% sure but I’m starting to think it
might be reproducible; this time it occurred when I deployed and shut down my
company’s product in a loop
----
2019-01-30 23:59:44 UTC - Grant Wu: Any thoughts on what could possibly be
causing this?
----
2019-01-31 00:01:29 UTC - Matteo Merli: I never saw this :confused:
----
2019-01-31 00:01:43 UTC - Matteo Merli: any hint on how to reproduce?
----
2019-01-31 00:02:00 UTC - Grant Wu: uh….
----
2019-01-31 00:02:26 UTC - Grant Wu: Well, I can’t exactly send you our deploy
scripts and they wouldn’t be useful anyways
----
2019-01-31 00:03:12 UTC - Grant Wu: Let me see if I can reliably recreate this
internally first tomorrow
----
2019-01-31 00:03:16 UTC - Matteo Merli: Sure, can you describe when it happens?
Or any config that’s different from defaults, or any info that might help
----
2019-01-31 00:06:10 UTC - Grant Wu: I’m not an expert on how we deploy Pulsar
----
2019-01-31 00:07:16 UTC - Grant Wu: one thing that is strange is that we run
three brokers
----
2019-01-31 00:07:19 UTC - Grant Wu: And that only 2 of them are dying
----
2019-01-31 00:07:58 UTC - Matteo Merli: It appears the issue happens when recovering
the topic used for functions metadata and assignments
----
2019-01-31 00:08:09 UTC - Grant Wu: ah
----
2019-01-31 00:08:14 UTC - Grant Wu: okay, so it is related to Pulsar Functions…
hrm
----
2019-01-31 00:08:22 UTC - Matteo Merli: `public/functions/persistent/coordinate`
----
2019-01-31 00:09:32 UTC - Grant Wu: well our deploys do have automation around
creating and deleting Pulsar functions
----
2019-01-31 00:09:52 UTC - Grant Wu: ```
put_pfs()
{
    find . -maxdepth 2 -name 'function-config.yaml' | while read -r i; do
        d=$(dirname "$i")
        pushd "$d" || exit 1
        sed -i "s/PULSAR_TENANT/${PULSAR_TENANT}/g" function-config.yaml
        sed -i "s/K8S_NAMESPACE/${K8S_NAMESPACE}/g" function-config.yaml
        pf_name=$(grep "^name:" function-config.yaml | awk -F": \"" '{print $2}' | sed 's/.$//')
        # Unfortunately, Pulsar uses the same exit code for both a missing function and a connection error
        check_for_function=$($PULSAR_DIR/bin/pulsar-admin functions get --namespace "$K8S_NAMESPACE" --tenant "$PULSAR_TENANT" --name "$pf_name" 2>&1)
        if printf "%s" "$check_for_function" | grep -q "HTTP 404 Not Found"; then
            echo "Creating pulsar function $pf_name"
            cmd="create"
        elif printf "%s" "$check_for_function" | grep -q "java.net.ConnectException"; then
            echo "Failed to connect to Pulsar"
            exit 2
        else
            echo "Updating pulsar function $pf_name"
            cmd="update"
        fi
        if ! $PULSAR_DIR/bin/pulsar-admin functions $cmd --functionConfigFile "$(realpath function-config.yaml)"; then
            echo "Failed to put $pf_name"
            exit 3
        fi
        popd || exit 4
    done
}

delete_pfs()
{
    $PULSAR_DIR/bin/pulsar-admin functions list --namespace "$K8S_NAMESPACE" --tenant "$PULSAR_TENANT" | while read -r pf_name; do
        echo "Deleting pulsar function $pf_name"
        $PULSAR_DIR/bin/pulsar-admin functions delete --namespace "$K8S_NAMESPACE" --tenant "$PULSAR_TENANT" --name "$pf_name"
    done
}
```
----
2019-01-31 00:09:55 UTC - Grant Wu: this is what it looks like…
----
2019-01-31 00:10:25 UTC - Grant Wu: I did have a typo in this script that was
breaking the stopping part
----
2019-01-31 00:10:34 UTC - Matteo Merli: Actually, even though we don’t know how
it got there, we can add an additional check to prevent failing to recover that
topic
----
2019-01-31 00:10:50 UTC - Grant Wu: Is there any instrumentation we can add on
our end to help figure this out?
----
2019-01-31 00:13:52 UTC - Grant Wu: Just so you know
<https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1547154563359500> is from
our company as well
----
2019-01-31 00:14:06 UTC - Grant Wu: We were using 2.1.1 then, but we’ve
upgraded to 2.2.1 now
----
2019-01-31 00:29:41 UTC - Matteo Merli: The fix to avoid the bad failure is
straightforward. I’m trying to get a unit test working to cover it.
----
2019-01-31 00:55:23 UTC - Matteo Merli: @Grant Wu
<https://github.com/apache/pulsar/pull/3487>
----
2019-01-31 00:56:36 UTC - Ambud Sharma: are there other policies we can
configure besides `--policy producer_request_hold` for `set-backlog-quota`?
----
2019-01-31 00:57:19 UTC - Emma Pollum: @Ambud Sharma
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry.html#backlog-quotas>
----
2019-01-31 00:57:22 UTC - Matteo Merli:
<http://pulsar.apache.org/docs/en/admin-api-namespaces.html#set-backlog-quota-policies>
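Per the backlog-quota documentation linked above, `--policy` accepts three values; a brief recap with a placeholder namespace:
```shell
# --policy options for set-backlog-quota (namespace name is a placeholder):
#   producer_request_hold     - hold producer send requests once over quota
#   producer_exception        - fail producers with an exception
#   consumer_backlog_eviction - evict the oldest messages from the backlog
bin/pulsar-admin namespaces set-backlog-quota my-tenant/my-ns \
    --limit 2G --policy consumer_backlog_eviction
```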
----
2019-01-31 00:57:23 UTC - Matteo Merli: :slightly_smiling_face:
----
2019-01-31 00:58:29 UTC - Ambud Sharma: thanks @Matteo Merli should we add a
reference here:
<https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#pulsar-admin-2> ?
----
2019-01-31 01:00:23 UTC - Ambud Sharma: thanks @Emma Pollum
----
2019-01-31 03:08:03 UTC - bossbaby: my bookie and my broker connect to 2
zookeeper nodes. I tried turning off either zookeeper node, and the result was
that the broker stopped. So the question is: does zookeeper have to never die to
ensure the system works normally?
----
2019-01-31 04:21:35 UTC - David Kjerrumgaard: @bossbaby Zookeeper is just
another java process, so there is no way to guarantee that it never dies.
However, if you configure a quorum of 3 zookeeper nodes, you can survive 1 ZK
failure without any issue. If you are properly monitoring your environment, you
should have sufficient time to restart or replace the failed ZK node
+1 : bossbaby
----
2019-01-31 04:27:35 UTC - Ali Ahmed: @bossbaby there is no such thing as a
system that never dies; you have to increase the zookeeper cluster size to get
acceptable uptime for your application. Important production environments run ZK
clusters of 5-9 nodes to satisfy quorum requirements and to survive multi-rack /
multi-AZ failures.
+1 : bossbaby
----
2019-01-31 04:28:34 UTC - Matteo Merli: Do you have 2 ZK servers in your
cluster?
----
2019-01-31 04:29:41 UTC - Matteo Merli: ZK is based on quorum, so you’d have to
deploy an odd number of servers
----
2019-01-31 04:29:56 UTC - Matteo Merli: 1, 3 or 5
----
2019-01-31 04:30:28 UTC - Matteo Merli: Deploying 2 servers gives you worse
availability than having 1 single node
----
2019-01-31 04:30:50 UTC - Matteo Merli: Any one node failure and the system is
unavailable
----
2019-01-31 04:35:29 UTC - bossbaby: thanks you, i understand it
----
2019-01-31 04:37:39 UTC - bossbaby: so, if I have 3 ZK nodes I can stop 1, or
with 5 ZK nodes I can stop 2
----
2019-01-31 04:37:42 UTC - bossbaby: it right?
----
2019-01-31 04:39:28 UTC - Matteo Merli: Correct
----
2019-01-31 04:41:18 UTC - bossbaby: If I have 5 ZK nodes, can I stop 3?
----
2019-01-31 04:41:36 UTC - Matteo Merli: Only 2
----
2019-01-31 04:41:51 UTC - Matteo Merli: You need a majority of them available
----
2019-01-31 04:42:18 UTC - Matteo Merli: You can deploy 7 and then you can
afford to lose 3
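The arithmetic behind these numbers can be sketched in shell: a quorum of n nodes needs a majority (floor(n/2) + 1) alive, so it tolerates floor((n - 1) / 2) failures.
```shell
#!/bin/sh
# ZK quorum math: a cluster of n nodes needs a majority alive,
# so it tolerates floor((n - 1) / 2) node failures.
tolerable_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5 7; do
    echo "zk=$n tolerates $(tolerable_failures "$n") failure(s)"
done
```
Note that n=2 tolerates 0 failures, which is why 2 servers give worse availability than 1.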
----
2019-01-31 04:42:57 UTC - bossbaby: thanks you @Matteo Merli
----
2019-01-31 04:48:55 UTC - bossbaby: There was also a problem I encountered when
setting up geo-replication: a new cluster was added to the set of replicated
clusters. How can consumers in this new cluster read all messages previously
published to the global topic?
----
2019-01-31 04:49:50 UTC - Matteo Merli: The data is geo-replicated only from
the point where you enable it
----
2019-01-31 04:50:45 UTC - Matteo Merli: You can always roll back a replicator
cursor, in the same way you can reset a subscription to an earlier point
in time
----
2019-01-31 04:51:42 UTC - Matteo Merli: The name of the replicator
“subscription” would be `pulsar.repl.REGION`
----
2019-01-31 05:00:02 UTC - bossbaby: sorry @Matteo Merli, how do I roll back a
replicator cursor? I did not find documentation about it
----
2019-01-31 05:06:03 UTC - Matteo Merli: eg: if you have 2 pulsar clusters
(`west` and `east`), if you want to replicate all the old data from `west`
-> `east`, you would have to do:
In region `west`:
```
bin/pulsar-admin topics reset-cursor $MY_TOPIC -s pulsar.repl.east --time 24h
```
To roll it back by 24 hours
clock12 : Ali Ahmed
----
2019-01-31 05:10:47 UTC - bossbaby: Thanks you very much
+1 : Matteo Merli
----
2019-01-31 05:18:01 UTC - bossbaby: Why does a majority of them need to be available?
My guess is that if the number of live zookeeper nodes is less than the number of
dead ones, no leader can be elected, so the system will not work.
is that right?
----
2019-01-31 05:19:02 UTC - Matteo Merli: knowing whether a node is dead or just
unreachable (or slow) is actually an impossible problem to solve
:slightly_smiling_face:
+1 : bossbaby
----
2019-01-31 06:19:05 UTC - bossbaby: clusters
```
["region_1","region_2","region_3","region_4"]
```
An error occurs when I run
```
bin/pulsar-admin topics reset-cursor persistent://public/default/tx -s pulsar.repl.region_4 --time 24h
```
```
Reason: Subscription not found
```
----
2019-01-31 06:29:58 UTC - Matteo Merli: In which region are you executing this
command?
----
2019-01-31 06:30:19 UTC - Matteo Merli: Also, which region is this topic being
replicated into?
----
2019-01-31 06:31:27 UTC - Matteo Merli: `pulsar-admin namespaces get-clusters
public/default`
----
2019-01-31 06:34:52 UTC - bossbaby: topic being replicated in :
region_3
region_4
region_1
region_2
----
2019-01-31 06:35:12 UTC - bossbaby: i executing command in broker in region_1
----
2019-01-31 06:39:11 UTC - Vincent Ngan: I cannot find anywhere in the
documentation a description of how Pulsar protects against data loss. Supposing
I have a multi-node cluster with 3 bookies, how many bookie nodes can fail
without causing loss of data?
----
2019-01-31 06:40:52 UTC - bossbaby: @Vincent Ngan
<https://jack-vanlightly.com/blog/2018/10/21/how-to-not-lose-messages-on-an-apache-pulsar-cluster>
----
2019-01-31 06:45:27 UTC - Matteo Merli: it depends on the replication factor
(eg: how many guaranteed copies of the data you have)
----
2019-01-31 06:46:04 UTC - Vincent Ngan: Where is this configured?
----
2019-01-31 06:46:21 UTC - Matteo Merli: that’s configurable per namespace.
Per-broker defaults are set in `broker.conf`:
```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=2
# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=2
# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=2
```
----
2019-01-31 06:47:03 UTC - Matteo Merli: To be able to write, you need to have
at least `EnsembleSize` bookies available
----
2019-01-31 06:47:45 UTC - Matteo Merli: to not lose data, you can lose up to
`AckQuorum -1` bookies
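Putting the two statements together with the `broker.conf` defaults quoted above, the arithmetic can be sketched as:
```shell
#!/bin/sh
# BookKeeper durability math with the broker.conf defaults quoted above.
BOOKIES=3          # bookies in the cluster (from the question)
ENSEMBLE=2         # managedLedgerDefaultEnsembleSize
ACK_QUORUM=2       # managedLedgerDefaultAckQuorum

# Writes need ENSEMBLE bookies up; acknowledged data survives the loss
# of up to ACK_QUORUM - 1 bookies.
echo "can lose $(( BOOKIES - ENSEMBLE )) bookie(s) and keep writing"
echo "can lose $(( ACK_QUORUM - 1 )) bookie(s) without losing acked data"
```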
----
2019-01-31 06:48:51 UTC - Matteo Merli: can you show internal stats for topic
in region_1 ?
----
2019-01-31 06:49:09 UTC - Matteo Merli: `pulsar-admin topics internal-stats
persistent://public/default/tx`
----
2019-01-31 06:57:46 UTC - bossbaby: internal-stats was wrong
I ran stats instead, and the result:
```
{
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"averageMsgSize" : 0.0,
"storageSize" : 1616,
"publishers" : [ ],
"subscriptions" : {
"consumer-test-2" : {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateRedeliver" : 0.0,
"msgBacklog" : 31,
"blockedSubscriptionOnUnackedMsgs" : false,
"unackedMessages" : 0,
"type" : "Exclusive",
"msgRateExpired" : 0.0,
"consumers" : [ ]
},
"consumer-test-1" : {
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateRedeliver" : 0.0,
"msgBacklog" : 31,
"blockedSubscriptionOnUnackedMsgs" : false,
"unackedMessages" : 0,
"msgRateExpired" : 0.0,
"consumers" : [ ]
}
},
"replication" : {
"region_2" : {
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateExpired" : 0.0,
"replicationBacklog" : 0,
"connected" : true,
"replicationDelayInSeconds" : 0,
"inboundConnection" : "/192.168.1.133:54498",
"inboundConnectedSince" : "2019-01-31T12:19:44.47+07:00",
"outboundConnection" : "[id: 0x68d6d230, L:/192.168.1.133:54505 -
R:pros-mbp/192.168.1.133:6652]",
"outboundConnectedSince" : "2019-01-31T12:19:44.529+07:00"
},
"region_3" : {
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateExpired" : 0.0,
"replicationBacklog" : 0,
"connected" : true,
"replicationDelayInSeconds" : 0,
"outboundConnection" : "[id: 0x27d7afed, L:/192.168.1.133:54506 -
R:pros-mbp/192.168.1.133:6654]",
"outboundConnectedSince" : "2019-01-31T12:19:44.587+07:00"
},
"region_4" : {
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"msgRateExpired" : 0.0,
"replicationBacklog" : 0,
"connected" : true,
"replicationDelayInSeconds" : 0,
"inboundConnection" : "/192.168.1.133:54516",
"inboundConnectedSince" : "2019-01-31T12:20:09.017+07:00",
"outboundConnection" : "[id: 0x34765280, L:/192.168.1.133:54507 -
R:pros-mbp/192.168.1.133:6656]",
"outboundConnectedSince" : "2019-01-31T12:19:44.587+07:00"
}
},
"deduplicationStatus" : "Disabled"
}
```
----
2019-01-31 07:00:48 UTC - Matteo Merli: sorry, the command was `stats-internal`
----
2019-01-31 07:01:29 UTC - Vincent Ngan: So, a 3-node cluster with the above
settings can survive a failure of 1 node. Is that right?
----
2019-01-31 07:03:12 UTC - Matteo Merli: Correct, and in this case availability
(`3 - ensemble = 1`) and durability (`ack - 1 = 1`) are the same
----
2019-01-31 07:03:27 UTC - bossbaby: result:
```
bin/pulsar initialize-cluster-metadata \
    --cluster region_4 \
    --zookeeper localhost:2184 \
    --configuration-store localhost:2190 \
    --web-service-url http://localhost:8083 \
    --web-service-url-tls https://localhost:8446 \
    --broker-service-url pulsar://localhost:6656 \
    --broker-service-url-tls pulsar+ssl://localhost:6657
{
"entriesAddedCounter" : 1,
"numberOfEntries" : 31,
"totalSize" : 1616,
"currentLedgerEntries" : 1,
"currentLedgerSize" : 59,
"lastLedgerCreatedTimestamp" : "2019-01-31T12:19:44.312+07:00",
"waitingCursorsCount" : 7,
"pendingAddEntriesCount" : 0,
"lastConfirmedEntry" : "44:0",
"state" : "LedgerOpened",
  "ledgers" : [ … ],
"cursors" : {
"consumer-test-1" : {
"markDeletePosition" : "0:-1",
"readPosition" : "0:0",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : -30,
"cursorLedger" : 49,
"cursorLedgerLastEntry" : 2,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-01-31T12:19:44.316+07:00",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"consumer-test-2" : {
"markDeletePosition" : "0:-1",
"readPosition" : "0:0",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : -30,
"cursorLedger" : 45,
"cursorLedgerLastEntry" : 13,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-01-31T12:19:44.315+07:00",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"pulsar.repl.region_2" : {
"markDeletePosition" : "44:0",
"readPosition" : "44:1",
"waitingReadOp" : true,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 1,
"cursorLedger" : 46,
"cursorLedgerLastEntry" : 1,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-01-31T12:19:44.317+07:00",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"pulsar.repl.region_3" : {
"markDeletePosition" : "44:0",
"readPosition" : "44:1",
"waitingReadOp" : true,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 1,
"cursorLedger" : 48,
"cursorLedgerLastEntry" : 1,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-01-31T12:19:44.317+07:00",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
},
"pulsar.repl.region_4" : {
"markDeletePosition" : "44:0",
"readPosition" : "44:1",
"waitingReadOp" : true,
"pendingReadOps" : 0,
"messagesConsumedCounter" : 1,
"cursorLedger" : 47,
"cursorLedgerLastEntry" : 1,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2019-01-31T12:19:44.317+07:00",
"state" : "Open",
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"properties" : { }
}
}
}
```
----
2019-01-31 07:04:53 UTC - Matteo Merli: Uhm, then the reset-cursor command
should have worked
----
2019-01-31 07:04:55 UTC - Matteo Merli: :confused:
----
2019-01-31 07:05:51 UTC - Matteo Merli: Maybe there’s a check in there that
only allows “regular” subscriptions, though it would be better for this to work
too.
----
2019-01-31 07:06:02 UTC - Matteo Merli: Can you open an issue so that I don’t
forget?
----
2019-01-31 07:10:44 UTC - bossbaby: if I use the subscription consumer-test-2,
there is no error, but the data is not replicated
----
2019-01-31 07:11:04 UTC - bossbaby: ```
bin/pulsar-admin persistent reset-cursor persistent://public/default/tx -s consumer-test-2 --time 24h
```
----
2019-01-31 07:12:21 UTC - Matteo Merli: sure that’s just a local subscription
----
2019-01-31 07:13:35 UTC - bossbaby: i will open an issue and hope it will work
----
2019-01-31 07:16:02 UTC - Matteo Merli: in the meantime, any new message
published should be getting replicated
----
2019-01-31 07:16:27 UTC - bossbaby: thanks you @Matteo Merli
----
2019-01-31 07:33:25 UTC - Vincent Ngan: Regarding message retention, what does
`delete` mean in the following paragraph quoted from:
<https://pulsar.apache.org/docs/en/concepts-messaging/#message-retention-and-expiry>
> *Message retention and expiry*
> By default, Pulsar message brokers:
>
> • immediately `delete` all messages that have been acknowledged by a
consumer, and
> • persistently store all unacknowledged messages in a message backlog.
Does it mean physically deleted, or logically deleted with regard to the
subscription the consumer is using?
----
2019-01-31 07:35:16 UTC - Vincent Ngan: Can I start another consumer using a
different subscription to receive the messages previously acknowledged?
----
2019-01-31 07:40:38 UTC - bossbaby: you can
----
2019-01-31 07:42:26 UTC - Vincent Ngan: My own tests show that acknowledgement
of messages by a consumer is only effective for the subscription used by
that consumer. I can start another consumer with a different subscription to
read the messages again.
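This observation can be reproduced from the CLI by attaching a consumer under a brand-new subscription starting from the earliest message (the topic name is a placeholder, and the `--subscription-position` flag may not exist in older pulsar-client versions):
```shell
# Placeholder topic; a new subscription positioned at Earliest re-reads
# messages that other subscriptions have already acknowledged.
# (--subscription-position may require a newer pulsar-client.)
bin/pulsar-client consume persistent://my-tenant/my-ns/my-topic \
    --subscription-name fresh-sub \
    --subscription-position Earliest \
    --num-messages 0    # 0 = keep consuming
```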
----
2019-01-31 07:45:52 UTC - Vincent Ngan: This comes down to my question: when
are messages actually, physically deleted?
----
2019-01-31 07:48:26 UTC - Vincent Ngan: It looks to me that messages are
perpetually persisted!
----
2019-01-31 08:02:35 UTC - jia zhai: It should be a logical delete
----
2019-01-31 08:03:40 UTC - jia zhai: @Vincent Ngan yes, you are right.
----
2019-01-31 08:34:23 UTC - Vincent Ngan: If this is the case, how can we
housekeep the physical storage used by the bookies?
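Based on the retention cookbook linked earlier in the thread: messages are physically removed once every subscription has acknowledged them and any configured retention window has elapsed, so housekeeping comes down to subscriptions acknowledging (or being removed) plus namespace-level retention and TTL settings. A sketch with made-up values:
```shell
# Made-up values, for illustration: keep acknowledged data at most
# 3 days or 10 GB, after which the underlying ledgers can be deleted.
bin/pulsar-admin namespaces set-retention public/default \
    --time 3d --size 10G

# A message TTL automatically acknowledges messages older than 1 hour,
# so an abandoned subscription does not pin the backlog forever.
bin/pulsar-admin namespaces set-message-ttl public/default \
    --messageTTL 3600
```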
----