2019-07-25 09:27:20 UTC - divyasree: @Sijie Guo Can you guide me on how the global configuration store helps with geo-replication? ---- 2019-07-25 09:31:25 UTC - Sijie Guo: oh sorry I missed your question. let me comment in the original thread ---- 2019-07-25 09:34:31 UTC - Sijie Guo: so the global configuration store is the storage used for storing all the namespace policies. The namespace policies can be propagated to all clusters.
The current implementation of the global configuration store is based on zookeeper. With that being said, you need to do the following: 1) set up a zookeeper cluster for the global configuration store (it should be accessible by all the clusters, and ideally span multiple data centers). 2) update all your clusters (brokers) to point to the global configuration store. ``` # Configuration Store connection string configurationStoreServers= ``` 3) restart the brokers. ---- 2019-07-25 09:48:34 UTC - Alexandre DUVAL: hi, can you share the thread? didn't find it ---- 2019-07-25 09:49:32 UTC - Sijie Guo: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1564047271114200> ---- 2019-07-25 09:49:55 UTC - Alexandre DUVAL: thx ---- 2019-07-25 10:21:26 UTC - Richard Sherman: We currently have a cluster of 4 bookies with ``` { "bookkeeperEnsemble" : 2, "bookkeeperWriteQuorum" : 2, "bookkeeperAckQuorum" : 2, "managedLedgerMaxMarkDeleteRate" : 0.0 } ``` If 2 bookies die could we lose data? ---- 2019-07-25 10:23:52 UTC - Sijie Guo: @Richard Sherman you can continue writing, but the old data are not available. ---- 2019-07-25 10:24:21 UTC - Richard Sherman: That was my understanding, thanks. ---- 2019-07-25 11:51:07 UTC - vikash: @Sijie Guo How to view messages in Pulsar? As of now I am not able to view messages using the Pulsar dashboard. Is there any REST API to see all messages for a particular subscription? ---- 2019-07-25 11:53:00 UTC - Alexandre DUVAL: What is the reason the topics stats storageSize value is 0 even when topics stats-internal is not? ---- 2019-07-25 12:32:10 UTC - Richard Sherman: <https://pulsar.apache.org/admin-rest-api/#operation/peekNthMessage> ---- 2019-07-25 12:33:31 UTC - Richard Sherman: You can view one message at a time with this. There is an update to the dashboard that will allow viewing messages through it, although it is limited to JSON messages at the moment.
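Richard's quorum question above can be sanity-checked with a toy placement model (a sketch only, not BookKeeper's actual placement logic; bookie names are made up). With `bookkeeperEnsemble=2` and `bookkeeperWriteQuorum=2`, every entry lives on exactly 2 of the 4 bookies, so any 2 simultaneous failures wipe out the entries whose ensemble was exactly that pair:

```python
from itertools import combinations

# Toy model: with ensemble=2 and writeQuorum=2, each ledger's entries are
# written to exactly the 2 bookies of its ensemble (no striping to model).
bookies = {"bk1", "bk2", "bk3", "bk4"}

# Each 2-bookie pair is one possible placement of a ledger's entries.
ensembles = list(combinations(sorted(bookies), 2))

def data_survives(ensemble, failed):
    # An entry survives only if at least one of its replicas is still alive.
    return any(b not in failed for b in ensemble)

# If the 2 bookies holding a given ledger both fail, that data is gone.
failed = {"bk1", "bk2"}
lost = [e for e in ensembles if not data_survives(e, failed)]
print(lost)  # the placement ("bk1", "bk2") loses all its entries
```

This matches Sijie's reply: new writes succeed (a healthy 2-bookie ensemble still exists), but ledgers placed entirely on the dead pair become unreadable.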
---- 2019-07-25 12:37:24 UTC - vikash: ok thanks, it worked ---- 2019-07-25 14:59:26 UTC - Matteo Merli: Ledgers are rolled over after some time/size threshold is reached, therefore the ledgers are not deleted immediately, even if the consumer has already moved on. Rather, you have to wait until the ledger is rolled over; then the whole ledger is dropped ---- 2019-07-25 15:37:27 UTC - Jerry Peng: Another Kafka --> Pulsar conversion user story: <https://medium.com/streamnative/build-a-priority-based-push-notification-system-using-apache-pulsar-at-getui-40252f4beae9> +1 : Karthik Ramasamy, Ryan Samo, Yuvaraj Loganathan grinning : Yuvaraj Loganathan ---- 2019-07-25 16:24:49 UTC - Ben: Does anyone have any idea why a Java consumer, which implements `MessageListener`, would seem to not be reading all the messages in a topic? To be more exact, I produce a file full of 106k tweets to a topic via a Python producer. My Java consumer then reads from this topic but will randomly stop consuming between 90k and 106k messages (sometimes it will consume all 106k messages and sometimes it will not). This issue occurs when using the following settings on my 2.4.0 consumer and producer: persistent or non-persistent topic, async or sync acknowledgement, and shared subscription type. It should be noted that I am also using `topicsPattern` on my consumer. ---- 2019-07-25 16:25:45 UTC - Matteo Merli: Can you share the topic stats? ---- 2019-07-25 16:32:48 UTC - Ben: Here are the stats at the end of the demo, when the consumer stopped around 93k messages.
``` { "msgRateIn" : 0.0, "msgThroughputIn" : 0.0, "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "averageMsgSize" : 0.0, "storageSize" : 12911, "publishers" : [ ], "subscriptions" : { "tweet-parsing" : { "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "msgRateRedeliver" : 0.0, "msgBacklog" : 0, "blockedSubscriptionOnUnackedMsgs" : false, "msgDelayed" : 0, "unackedMessages" : 0, "type" : "Shared", "msgRateExpired" : 0.0, "consumers" : [ { "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "msgRateRedeliver" : 0.0, "consumerName" : "7dbeb", "availablePermits" : 948, "unackedMessages" : 0, "blockedConsumerOnUnackedMsgs" : false, "metadata" : { }, "connectedSince" : "2019-07-25T16:29:13.694Z", "clientVersion" : "2.4.0", "address" : "/192.168.176.7:45334" } ], "isReplicated" : false } }, "replication" : { }, "deduplicationStatus" : "Disabled" } ``` ---- 2019-07-25 17:19:50 UTC - Matteo Merli: Sorry, just saw this. Though it seems the consumer is consuming 775 msg/s from the stats ---- 2019-07-25 17:20:34 UTC - Matteo Merli: Also ` "msgBacklog" : 0,`, so it’s already caught up ---- 2019-07-25 17:42:22 UTC - Ben: Right, I noticed that too. Well, this helps narrow the search further down to my end. I appreciate the help! ---- 2019-07-25 18:00:33 UTC - Yuvaraj Loganathan: We have set retention to `pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1` and `brokerDeleteInactiveTopicsEnabled=true`. Will Pulsar still delete the data if the topic is inactive? We don't want the data to expire or be deleted. ---- 2019-07-25 18:04:43 UTC - Ben: I have a counter in the `received` method for `MessageListener`. It reports a different number of consumed messages vs what is reported in `stats-internal`. I'm assuming `messagesConsumedCounter` is tracking the number of consumed messages for my subscription.
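Yuvaraj's retention question above involves two separate settings: the namespace retention policy (which governs acknowledged messages) and the broker-level inactive-topic garbage collector. A sketch of both, with the namespace name taken from the message; the broker.conf line is the conservative choice if topics must never be garbage-collected, though exact GC behaviour may vary across Pulsar versions:

```
# Namespace policy: keep acknowledged messages forever (as quoted above)
bin/pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1

# broker.conf: disable inactive-topic deletion entirely if you never want
# a topic (and its data) removed just because it has no producers/consumers
brokerDeleteInactiveTopicsEnabled=false
```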
---- 2019-07-25 19:00:41 UTC - Bruno Panuto: @Bruno Panuto has joined the channel ---- 2019-07-25 20:12:45 UTC - Grant Wu: No ---- 2019-07-25 20:12:56 UTC - Grant Wu: <https://github.com/apache/pulsar/issues/1416> this is documented in, for some reason, the geo-replication docs ---- 2019-07-25 20:36:52 UTC - Igor Zubchenok: When you use Pulsar Functions, Pulsar internally caches every producer you use to publish other messages and it never cleans the cache, so this behaviour leads to OOM. Any thoughts? ---- 2019-07-25 20:45:54 UTC - Grant Wu: Can you post an error message? ---- 2019-07-25 21:09:55 UTC - Igor Zubchenok: I just studied the Pulsar code. ---- 2019-07-25 21:10:17 UTC - Grant Wu: I’m not following. Are you saying that you’re deriving this just by reading the code? ---- 2019-07-25 21:11:48 UTC - Igor Zubchenok: Right. I also did some tests. If you have a lot of topics to write to, you will eventually get OOM, because there will be too many producers cached. ---- 2019-07-25 21:27:27 UTC - Igor Zubchenok: It is very easy to see: just check the implementation of ContextImpl; there is a HashMap `publishProducers` which is never cleared. <https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/ContextImpl.java> ---- 2019-07-25 21:27:58 UTC - Grant Wu: Right, I’ve actually glanced through this before; I just felt that posting more information is generally better ---- 2019-07-25 21:36:15 UTC - Peace: @Peace has joined the channel ---- 2019-07-25 21:41:41 UTC - Igor Zubchenok: That's the reason we abandoned the idea of using functions for dispatching messages; we're doing it now with a last-access-time based producer cache on our side. ---- 2019-07-25 21:44:04 UTC - Ali Ahmed: can you create an issue? this probably needs to be profiled ---- 2019-07-25 21:54:20 UTC - Igor Zubchenok: sure <https://github.com/apache/pulsar/issues/4816> ---- 2019-07-25 21:56:01 UTC - Igor Zubchenok: did it come out clear enough?
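The workaround Igor describes - a last-access-time based producer cache instead of the unbounded `publishProducers` HashMap - can be sketched as follows. This is a minimal illustration of the idea, not Pulsar's API; `create_producer` and the producer objects are stand-ins:

```python
import time

class ProducerCache:
    """Evicts producers not used for `ttl_seconds`, bounding memory when
    publishing to many topics (sketch of the last-access-time workaround)."""

    def __init__(self, ttl_seconds, create_producer, now=time.monotonic):
        self.ttl = ttl_seconds
        self.create = create_producer   # factory: topic -> producer (stand-in)
        self.now = now                  # injectable clock, useful for testing
        self._cache = {}                # topic -> (producer, last_used)

    def get(self, topic):
        self._evict_stale()
        producer, _ = self._cache.get(topic, (None, None))
        if producer is None:
            producer = self.create(topic)
        self._cache[topic] = (producer, self.now())  # refresh last-used time
        return producer

    def _evict_stale(self):
        cutoff = self.now() - self.ttl
        stale = [t for t, (_, used) in self._cache.items() if used < cutoff]
        for topic in stale:
            producer, _ = self._cache.pop(topic)
            producer.close()            # release the underlying connection
```

The design choice here is time-based eviction rather than a size cap: an idle producer holds a broker connection and buffers, so releasing it after inactivity directly addresses the OOM Igor observed, while hot topics keep their producer alive indefinitely.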
---- 2019-07-25 21:56:26 UTC - Ali Ahmed: yes it’s fine ---- 2019-07-26 08:34:52 UTC - divyasree: ok.. i will test and post the status... ---- 2019-07-26 08:52:05 UTC - divyasree: @Sijie Guo I have a geo-replication enabled setup with two clusters running in different DCs. I have enabled authentication and authorization in both clusters.. ---- 2019-07-26 08:52:31 UTC - divyasree: I need to know how authorization using roles works in a geo-replicated setup.. ---- 2019-07-26 08:56:04 UTC - divyasree: Assuming a few things: authorization (creating roles - produce, consume - and granting permission to a namespace) is at the namespace level, which means there will be a separate token provided when creating each role. But for geo-replication it was mentioned in the link <https://pulsar.apache.org/docs/en/security-authorization/> that ``` When using geo-replication, every broker needs to be able to publish to all the other clusters' topics. ``` ---- 2019-07-26 08:57:06 UTC - divyasree: Can you explain how authorization works in geo-replication? And do we need to provide the token in the client.conf of every broker node? ---- 2019-07-26 08:58:19 UTC - divyasree: If so, if we have 100 namespaces, then we will have 100 tokens (assuming as much, correct me if I am wrong); how do we need to provide them in the client.conf? ----
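On the question above: a token authenticates a role, and permissions are granted per namespace to that role, so one role (and thus one token) can be granted on many namespaces - 100 namespaces does not imply 100 tokens. A sketch, with role and tenant names purely illustrative; note the broker's own credentials (used for things like replication) live in broker.conf, not client.conf:

```
# Grant one role produce/consume on each namespace it needs; the same role
# (and hence the same token) can be granted on many namespaces.
bin/pulsar-admin namespaces grant-permission my-tenant/my-ns \
  --role replicator-role \
  --actions produce,consume

# broker.conf on every broker: credentials the broker itself presents when
# acting as a client (e.g. publishing replicated messages to other clusters)
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationToken
brokerClientAuthenticationParameters=token:<replicator-token>
```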
