2019-07-25 09:27:20 UTC - divyasree: @Sijie Guo Can you guide me on how the global configuration store helps with geo-replication? ---- 2019-07-25 09:31:25 UTC - Sijie Guo: oh sorry I missed your question. let me comment in the original thread ---- 2019-07-25 09:34:31 UTC - Sijie Guo: so the global configuration store is the storage used for storing all the namespace policies. The namespace policies can be propagated to all clusters.
The current implementation of the global configuration store is based on zookeeper. With that being said, you need to do the following: 1) set up a zookeeper cluster for the global configuration store (it should be accessible by all the clusters, and ideally span multiple data centers). 2) update all your clusters (brokers) to point to the global configuration store. ``` # Configuration Store connection string configurationStoreServers= ``` 3) restart the brokers. ---- 2019-07-25 09:48:34 UTC - Alexandre DUVAL: hi, can you share the thread? didn't find it ---- 2019-07-25 09:49:32 UTC - Sijie Guo: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1564047271114200> ---- 2019-07-25 09:49:55 UTC - Alexandre DUVAL: thx ---- 2019-07-25 10:21:26 UTC - Richard Sherman: We currently have a cluster of 4 bookies with ``` { "bookkeeperEnsemble" : 2, "bookkeeperWriteQuorum" : 2, "bookkeeperAckQuorum" : 2, "managedLedgerMaxMarkDeleteRate" : 0.0 } ``` If 2 bookies die could we lose data? ---- 2019-07-25 10:23:52 UTC - Sijie Guo: @Richard Sherman you can continue writing, but the old data are not available. ---- 2019-07-25 10:24:21 UTC - Richard Sherman: That was my understanding, thanks. ---- 2019-07-25 11:51:07 UTC - vikash: @Sijie Guo How to view messages in Pulsar? As of now I am not able to view messages using the Pulsar dashboard. Is there any REST API to see all messages for a particular subscription? ---- 2019-07-25 11:53:00 UTC - Alexandre DUVAL: What is the reason the topics stats storageSize value is 0 even when topics stats-internal is not? ---- 2019-07-25 12:32:10 UTC - Richard Sherman: <https://pulsar.apache.org/admin-rest-api/#operation/peekNthMessage> ---- 2019-07-25 12:33:31 UTC - Richard Sherman: You can view one message at a time with this. There is an update to the dashboard that will allow viewing messages through it, although it is limited to JSON messages at the moment.
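Richard's quorum question above can be sanity-checked with a toy placement model (a sketch only, not BookKeeper's actual placement logic; bookie names are made up). With `bookkeeperEnsemble=2` and `bookkeeperWriteQuorum=2`, every entry lives on exactly 2 of the 4 bookies, so any 2 simultaneous failures wipe out the entries whose ensemble was exactly that pair:

```python
from itertools import combinations

# Toy model: with ensemble=2 and writeQuorum=2, each ledger's entries are
# written to exactly the 2 bookies of its ensemble (no striping to model).
bookies = {"bk1", "bk2", "bk3", "bk4"}

# Each 2-bookie pair is one possible placement of a ledger's entries.
ensembles = list(combinations(sorted(bookies), 2))

def data_survives(ensemble, failed):
    # An entry survives only if at least one of its replicas is still alive.
    return any(b not in failed for b in ensemble)

# If the 2 bookies holding a given ledger both fail, that data is gone.
failed = {"bk1", "bk2"}
lost = [e for e in ensembles if not data_survives(e, failed)]
print(lost)  # the placement ("bk1", "bk2") loses all its entries
```

This matches Sijie's reply: new writes succeed (a healthy 2-bookie ensemble still exists), but ledgers placed entirely on the dead pair become unreadable.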
---- 2019-07-25 12:37:24 UTC - vikash: ok thanks, it worked ---- 2019-07-25 14:59:26 UTC - Matteo Merli: Ledgers are rolled over after some time/size threshold is reached, therefore the ledgers are not deleted immediately, even if the consumer has already moved on. Rather, you have to wait until the ledger is rolled over; then the whole ledger is dropped ---- 2019-07-25 15:37:27 UTC - Jerry Peng: Another Kafka --> Pulsar conversion user story: <https://medium.com/streamnative/build-a-priority-based-push-notification-system-using-apache-pulsar-at-getui-40252f4beae9> +1 : Karthik Ramasamy, Ryan Samo, Yuvaraj Loganathan grinning : Yuvaraj Loganathan ---- 2019-07-25 16:24:49 UTC - Ben: Does anyone have any idea why a Java consumer, which implements `MessageListener`, would seem to not be reading all the messages in a topic? To be more exact, I produce a file full of 106k tweets to a topic via a Python producer. My Java consumer then reads from this topic but will randomly stop consuming between 90k and 106k messages (sometimes it will consume all 106k messages and sometimes it will not). This issue occurs when using the following settings on my 2.4.0 consumer and producer: persistent or non-persistent topic, async or sync acknowledgement, and shared subscription type. It should be noted that I am also using `topicsPattern` on my consumer. ---- 2019-07-25 16:25:45 UTC - Matteo Merli: Can you share the topic stats? ---- 2019-07-25 16:32:48 UTC - Ben: Here are the stats at the end of the demo, when the consumer stopped around 93k messages.
``` { "msgRateIn" : 0.0, "msgThroughputIn" : 0.0, "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "averageMsgSize" : 0.0, "storageSize" : 12911, "publishers" : [ ], "subscriptions" : { "tweet-parsing" : { "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "msgRateRedeliver" : 0.0, "msgBacklog" : 0, "blockedSubscriptionOnUnackedMsgs" : false, "msgDelayed" : 0, "unackedMessages" : 0, "type" : "Shared", "msgRateExpired" : 0.0, "consumers" : [ { "msgRateOut" : 775.2862140655906, "msgThroughputOut" : 4991697.812397025, "msgRateRedeliver" : 0.0, "consumerName" : "7dbeb", "availablePermits" : 948, "unackedMessages" : 0, "blockedConsumerOnUnackedMsgs" : false, "metadata" : { }, "connectedSince" : "2019-07-25T16:29:13.694Z", "clientVersion" : "2.4.0", "address" : "/192.168.176.7:45334" } ], "isReplicated" : false } }, "replication" : { }, "deduplicationStatus" : "Disabled" } ``` ---- 2019-07-25 17:19:50 UTC - Matteo Merli: Sorry, just saw this. Though it seems the consumer is consuming 775 msg/s from the stats ---- 2019-07-25 17:20:34 UTC - Matteo Merli: Also ` "msgBacklog" : 0,`, so it’s already caught up ---- 2019-07-25 17:42:22 UTC - Ben: Right, I noticed that too. Well, this helps narrow the search further down to my end. I appreciate the help! ---- 2019-07-25 18:00:33 UTC - Yuvaraj Loganathan: We have set retention to `pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1` and `brokerDeleteInactiveTopicsEnabled=true`. Will Pulsar still delete the data if the topic is inactive? We don't want the data to expire or be deleted. ---- 2019-07-25 18:04:43 UTC - Ben: I have a counter in the `received` method for `MessageListener`. It reports a different number of consumed messages vs what is reported in `stats-internal`. I'm assuming `messagesConsumedCounter` is tracking the number of consumed messages for my subscription.
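Yuvaraj's retention question above involves two separate settings: the namespace retention policy (which governs acknowledged messages) and the broker-level inactive-topic garbage collector. A sketch of both, with the namespace name taken from the message; the broker.conf line is the conservative choice if topics must never be garbage-collected, though exact GC behaviour may vary across Pulsar versions:

```
# Namespace policy: keep acknowledged messages forever (as quoted above)
bin/pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1

# broker.conf: disable inactive-topic deletion entirely if you never want
# a topic (and its data) removed just because it has no producers/consumers
brokerDeleteInactiveTopicsEnabled=false
```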
---- 2019-07-25 19:00:41 UTC - Bruno Panuto: @Bruno Panuto has joined the channel ---- 2019-07-25 20:12:45 UTC - Grant Wu: No ---- 2019-07-25 20:12:56 UTC - Grant Wu: <https://github.com/apache/pulsar/issues/1416> this is documented in, for some reason, the geo-replication docs ---- 2019-07-25 20:36:52 UTC - Igor Zubchenok: When you use Pulsar Functions, Pulsar internally caches every producer you use to publish other messages and it never cleans the cache, so this behaviour leads to OOM. Any thoughts? ---- 2019-07-25 20:45:54 UTC - Grant Wu: Can you post an error message? ---- 2019-07-25 21:09:55 UTC - Igor Zubchenok: I just studied the Pulsar code. ---- 2019-07-25 21:10:17 UTC - Grant Wu: I’m not following. Are you saying that you’re deriving this just by reading the code? ---- 2019-07-25 21:11:48 UTC - Igor Zubchenok: Right. I also did some tests. If you have a lot of topics to write to, you will eventually get OOM, because there will be too many producers cached. ---- 2019-07-25 21:27:27 UTC - Igor Zubchenok: It is very easy to see: just check the implementation of ContextImpl; there is a HashMap `publishProducers` which is never cleared. <https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/ContextImpl.java> ---- 2019-07-25 21:27:58 UTC - Grant Wu: Right, I’ve actually glanced through this before; I just felt that posting more information is generally better ---- 2019-07-25 21:36:15 UTC - Peace: @Peace has joined the channel ---- 2019-07-25 21:41:41 UTC - Igor Zubchenok: That's the reason we abandoned the idea of using functions for dispatching messages; we're doing it now with a last-access-time based producer cache on our side. ---- 2019-07-25 21:44:04 UTC - Ali Ahmed: can you create an issue? this probably needs to be profiled ---- 2019-07-25 21:54:20 UTC - Igor Zubchenok: sure <https://github.com/apache/pulsar/issues/4816> ---- 2019-07-25 21:56:01 UTC - Igor Zubchenok: did it come out clear enough?
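The workaround Igor describes - a last-access-time based producer cache instead of the unbounded `publishProducers` HashMap - can be sketched as follows. This is a minimal illustration of the idea, not Pulsar's API; `create_producer` and the producer objects are stand-ins:

```python
import time

class ProducerCache:
    """Evicts producers not used for `ttl_seconds`, bounding memory when
    publishing to many topics (sketch of the last-access-time workaround)."""

    def __init__(self, ttl_seconds, create_producer, now=time.monotonic):
        self.ttl = ttl_seconds
        self.create = create_producer   # factory: topic -> producer (stand-in)
        self.now = now                  # injectable clock, useful for testing
        self._cache = {}                # topic -> (producer, last_used)

    def get(self, topic):
        self._evict_stale()
        producer, _ = self._cache.get(topic, (None, None))
        if producer is None:
            producer = self.create(topic)
        self._cache[topic] = (producer, self.now())  # refresh last-used time
        return producer

    def _evict_stale(self):
        cutoff = self.now() - self.ttl
        stale = [t for t, (_, used) in self._cache.items() if used < cutoff]
        for topic in stale:
            producer, _ = self._cache.pop(topic)
            producer.close()            # release the underlying connection
```

The design choice here is time-based eviction rather than a size cap: an idle producer holds a broker connection and buffers, so releasing it after inactivity directly addresses the OOM Igor observed, while hot topics keep their producer alive indefinitely.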
---- 2019-07-25 21:56:26 UTC - Ali Ahmed: yes it’s fine ---- 2019-07-26 08:34:52 UTC - divyasree: ok.. i will test and post the status... ---- 2019-07-26 08:52:05 UTC - divyasree: @Sijie Guo I have a geo-replication enabled setup with two clusters running in different DCs. I have enabled authentication and authorization in both clusters.. ---- 2019-07-26 08:52:31 UTC - divyasree: I need to know how authorization using roles works in a geo-replicated setup.. ---- 2019-07-26 08:56:04 UTC - divyasree: Assuming a few things: authorization (creating roles - produce, consume - and granting permission to a namespace) is at the namespace level, which means there will be a separate token provided when creating each role. But for geo-replication it was mentioned in the link <https://pulsar.apache.org/docs/en/security-authorization/> that ``` When using geo-replication, every broker needs to be able to publish to all the other clusters' topics. ``` ---- 2019-07-26 08:57:06 UTC - divyasree: Can you explain how authorization works in geo-replication? And do we need to provide the token in the client.conf of every broker node? ---- 2019-07-26 08:58:19 UTC - divyasree: If so, if we have 100 namespaces, then we will have 100 tokens (assuming as much, correct me if I am wrong); how do we need to provide them in the client.conf? ----
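On the question above: a token authenticates a role, and permissions are granted per namespace to that role, so one role (and thus one token) can be granted on many namespaces - 100 namespaces does not imply 100 tokens. A sketch, with role and tenant names purely illustrative; note the broker's own credentials (used for things like replication) live in broker.conf, not client.conf:

```
# Grant one role produce/consume on each namespace it needs; the same role
# (and hence the same token) can be granted on many namespaces.
bin/pulsar-admin namespaces grant-permission my-tenant/my-ns \
  --role replicator-role \
  --actions produce,consume

# broker.conf on every broker: credentials the broker itself presents when
# acting as a client (e.g. publishing replicated messages to other clusters)
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationToken
brokerClientAuthenticationParameters=token:<replicator-token>
```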
