2019-11-19 12:27:05 UTC - Sijie Guo: The broker failover time depends on the ZooKeeper session expiry time. You can tune the session timeout down to around a second or less, which can improve the failover time.
Alternatively, supporting read-only topic ownership and having a standby broker ready to take over as the write/read topic owner would also reduce the failover time. Folks from Tencent have already worked out a change for read-only topic ownership. They will contribute that change back at some point. ----
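A minimal sketch of that first tuning knob, assuming the `zooKeeperSessionTimeoutMs` setting in `conf/broker.conf` (verify the key name and default against your Pulsar version). Note that ZooKeeper itself clamps session timeouts based on its tickTime, and very low values trade faster failover detection for more spurious session expirations:
```
# conf/broker.conf — hypothetical value, not a recommendation
# the default is on the order of 30000 ms; lowering it shortens failover detection time
zooKeeperSessionTimeoutMs=2000
```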
2019-11-19 13:04:27 UTC - Fabien LD: JVM view of the BookKeeper instances: a few big GCs, but "low" memory consumption ----
2019-11-19 13:12:28 UTC - Rafael Figueiredo: @Rafael Figueiredo has joined the channel ----
2019-11-19 15:27:00 UTC - Pedro Cardoso: Hello, is the latency reported via the [stats endpoint](<https://pulsar.apache.org/docs/en/pulsar-admin/#stats>) of the pulsar-admin tool in milliseconds or seconds?
```
$ bin/pulsar-admin functions stats --name rollingsum
{
  "receivedTotal" : 20127,
  "processedSuccessfullyTotal" : 20127,
  "systemExceptionsTotal" : 0,
  "userExceptionsTotal" : 0,
  "avgProcessLatency" : 0.11114769483777973,
  "1min" : {
    "receivedTotal" : 0,
    "processedSuccessfullyTotal" : 0,
    "systemExceptionsTotal" : 0,
    "userExceptionsTotal" : 0,
    "avgProcessLatency" : null
  },
  "lastInvocation" : 1574176900705,
  "instances" : [ {
    "instanceId" : 0,
    "metrics" : {
      "receivedTotal" : 20127,
      "processedSuccessfullyTotal" : 20127,
      "systemExceptionsTotal" : 0,
      "userExceptionsTotal" : 0,
      "avgProcessLatency" : 0.11114769483777973,
      "1min" : {
        "receivedTotal" : 0,
        "processedSuccessfullyTotal" : 0,
        "systemExceptionsTotal" : 0,
        "userExceptionsTotal" : 0,
        "avgProcessLatency" : null
      },
      "lastInvocation" : 1574176900705,
      "userMetrics" : { }
    }
  } ]
}
```
----
2019-11-19 16:12:55 UTC - Kresh107: @Kresh107 has joined the channel ----
2019-11-19 16:17:13 UTC - Fernando: @Fernando has joined the channel ----
2019-11-19 16:44:22 UTC - Fernando: Can someone help me debug why I get
```
presto> show schemas in pulsar;
Query 20191119_164350_00001_xz299 failed: Failed to get schemas from pulsar: Connection refused
```
I'm using the Helm Kubernetes deployment of Pulsar 2.4.1 ----
2019-11-19 16:54:40 UTC - Fabien LD: Anything in the Presto logs? ----
2019-11-19 16:54:58 UTC - Fernando: I found that it's a problem with the Presto configuration in `${project.root}/conf/presto/catalog/pulsar.properties`. Question: how can I add that configuration to my Kubernetes deployment? Should I mount a ConfigMap into the pod? ----
2019-11-19 16:57:46 UTC - Fabien LD: This is a Presto question, not a Pulsar one ----
2019-11-19 16:58:06 UTC - Fernando: it's a question of Pulsar deployment in k8s ----
2019-11-19 16:59:05 UTC - Fabien LD: IMHO it is a question of Presto configuration in k8s. I use Pulsar in k8s and I never got that issue ... because I do not use Presto. ----
2019-11-19 16:59:14 UTC - Fabien LD: But yes, a ConfigMap will do the job ----
2019-11-19 17:01:59 UTC - David Kjerrumgaard: @Sunil Sattiraju Do you know the cardinality of the key, i.e. how many distinct values of the key? ----
2019-11-19 17:22:54 UTC - Pedro Cardoso: @here does anyone know where Pulsar in standalone mode stores information? I want to reset a standalone Pulsar process to default settings and forget past operations. ----
2019-11-19 17:23:49 UTC - Jerry Peng: @Pedro Cardoso <PROJECT_ROOT>/data +1 : David Kjerrumgaard, Pedro Cardoso ----
2019-11-19 17:24:03 UTC - Pedro Cardoso: Thanks! ----
2019-11-19 17:24:07 UTC - Jerry Peng: you can just remove the directory ----
2019-11-19 17:28:57 UTC - Jerry Peng: in milliseconds ----
2019-11-19 17:33:25 UTC - Pedro Cardoso: So `"avgProcessLatency" : 0.11114769483777973,` we are talking ~110 microseconds?
---- 2019-11-19 17:33:38 UTC - Jerry Peng: yes ----
2019-11-19 17:33:52 UTC - Pedro Cardoso: Thank you very much ----
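For Fernando's Presto-on-Kubernetes question above, a rough sketch of the ConfigMap approach Fabien suggested; the ConfigMap name, namespace, and mount path are assumptions and would need to match however your chart runs the Presto/SQL worker pods:
```
# package the catalog file as a ConfigMap (names are placeholders)
$ kubectl -n pulsar create configmap presto-pulsar-catalog \
    --from-file=pulsar.properties=conf/presto/catalog/pulsar.properties
# then add a volume for this ConfigMap to the Presto pod spec and mount it at
# the catalog directory the image expects, e.g. /pulsar/conf/presto/catalog
```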
2019-11-19 17:34:32 UTC - Matteo Merli: @Logan B There have been a lot of improvements on that front since 2016. We made sure that topics are transferred to other brokers in bundles (whose size is capped) to ensure that we don't overwhelm the metadata store with too many concurrent requests. That was combined with several improvements in the load manager itself (with a completely new implementation that came after that blog post). Another recent improvement is coming with the upgrade of ZooKeeper to 3.5. There is a big perf improvement in the ZK client that avoids blocking the async calls on mutexes (for extended amounts of time) while doing network IO. That was one of the biggest sources of latency spikes. ----
2019-11-19 17:35:51 UTC - Matteo Merli: ...also `pulsar standalone --wipe-data` :slightly_smiling_face: heart : Pedro Cardoso ----
2019-11-19 17:36:41 UTC - Jerry Peng: Failure tolerance is taken care of in Pulsar Functions. Whether events are replayed because of a failure is determined by the processing guarantees set for the function, i.e. at-most-once, at-least-once, or exactly-once ----
2019-11-19 19:55:30 UTC - Fabien LD: Back with my JVM memory issue (see my message earlier today -> <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1574152972201700>): I actually have the same issue with local ZooKeeper. A lot of memory consumption outside of the JVM. Any clue what the issue could be? Or is there a Dockerfile for an alternative Docker image with a different JVM? ----
2019-11-19 19:59:24 UTC - David Kjerrumgaard: @Fabien LD What does the data flow look like at that point in time? How many messages are being published? Consumed? ----
2019-11-19 20:15:46 UTC - Fabien LD: We have 5 datacenters, with geo-replication of a few namespaces. Most of the messages are sent to local namespaces. TTLs are between 30 and 60 seconds for most data, and between 300 and 1800 seconds for a couple of global namespaces. Retention is 0 except for one namespace with little data. All topics are persistent, and a few are partitioned (x12) ----
2019-11-19 20:23:58 UTC - David Kjerrumgaard: Have you tried adding bookies to the clusters to see if some of the load gets distributed onto the newly added bookies? ----
2019-11-20 00:11:09 UTC - Sunil Sattiraju: @David Kjerrumgaard It varies from key to key; on average I will have between 20 and 60 values per key ----
2019-11-20 00:15:32 UTC - David Kjerrumgaard: I meant how many different keys do you have? ----
2019-11-20 00:16:15 UTC - Sunil Sattiraju: quite a few, 10-20 million keys ----
2019-11-20 00:16:38 UTC - Sunil Sattiraju: are you thinking about having a topic for each key? ----
2019-11-20 00:20:27 UTC - David Kjerrumgaard: No, actually if you had a relatively small number of keys, then you could have an equal number of consumers, in order to establish a 1-to-1 ratio of keys to consumers. That would ensure that each consumer only processes messages with the same key. However, it looks like this isn't an option for your use case ----
2019-11-20 00:20:54 UTC - Sunil Sattiraju: hmm.. no
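Following up on Jerry's note above about processing guarantees, a hedged example of choosing the guarantee when deploying a function; the function name, jar, class, and topics are placeholders, and the flag values should be checked against your pulsar-admin version:
```
$ bin/pulsar-admin functions create \
    --name my-function \
    --jar my-functions.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/in \
    --output persistent://public/default/out \
    --processing-guarantees EFFECTIVELY_ONCE   # or ATLEAST_ONCE / ATMOST_ONCE
```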
---- 2019-11-20 04:21:18 UTC - Jared Mackey: Do retention policies only apply to current subscriptions? Or can I set one on a topic that has 0 subscribers, so that when subscribers start up they can begin at the first message within the policy window? ----
2019-11-20 05:29:12 UTC - Matteo Merli: Retention applies to topics, so even if there are no subscriptions (and even no producers or readers) the data and the topic will be retained for the specified amount of time. ----
2019-11-20 06:19:11 UTC - leonidv: Hello! I think you should add some links to the BookKeeper documentation. ----
2019-11-20 06:25:21 UTC - Ali Ahmed: <https://github.com/archfish/pulsar_sdk> +1 : xiaolong.ran ----
2019-11-20 08:23:39 UTC - Penghui Li: Thanks. As you said, we need complete documentation for data deletion (including how bookies delete data). ----
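To make the retention answer above concrete, a small sketch of setting a namespace-level retention policy with pulsar-admin; the namespace and limits are examples only:
```
# example: keep acknowledged data for up to 7 days, capped at 10 GB per topic
$ bin/pulsar-admin namespaces set-retention public/default --time 7d --size 10G
$ bin/pulsar-admin namespaces get-retention public/default
```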
