Slack digest for #general - 2018-06-02

Apache Pulsar Slack Sat, 02 Jun 2018 02:12:02 -0700

2018-06-01 09:25:32 UTC - Mate Varga: 20 ms is eternity :slightly_smiling_face:
----
2018-06-01 09:52:48 UTC - Kishore Karunakaran: :+1:
----
2018-06-01 12:54:41 UTC - Matti-Pekka Laaksonen: Wonderful! Thank you for the 
update
----
2018-06-01 20:01:12 UTC - Christopher Burke: @Christopher Burke has joined the 
channel
----
2018-06-02 01:03:35 UTC - ben gordon: @ben gordon has joined the channel
----
2018-06-02 02:28:32 UTC - Nozomi Kurihara: There will be an Apache Pulsar talk at 
OSCON on July 18, 2018, in Portland, OR.


“Apache Pulsar and its enterprise use case”
<https://conferences.oreilly.com/oscon/oscon-or/public/schedule/detail/69704>

Also, Yahoo! JAPAN will have an exhibition booth where we will explain what 
Apache Pulsar is, its features, and case examples at Yahoo! JAPAN, etc.

I will be very happy if a lot of people come to the talk and the booth!
thumbsup_all : jia zhai, Matteo Merli, Sijie Guo, Ali Ahmed
fire : Sijie Guo, Ali Ahmed
----
2018-06-02 03:04:15 UTC - jia zhai: :+1:
----
2018-06-02 05:54:13 UTC - Vasily Yanov: Hi! Guys, we have a really depressing 
performance issue. As the publish rate increases, storage write latency gets 
dramatically worse. Currently our publish rate is about 20k msg/s and our 
storage latency graph looks like:
----
2018-06-02 05:54:43 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZU4MC1X/image.png|image.png>
----
2018-06-02 05:55:01 UTC - Vasily Yanov: all peaks are correlated with publish 
rates:
----
2018-06-02 05:55:23 UTC - Ali Ahmed: what is your batch size?
----
2018-06-02 05:55:30 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZPNJZ2L/image.png|image.png>
----
2018-06-02 05:56:32 UTC - Vasily Yanov: ```
# The number of max entries to keep in fragment for re-replication
rereplicationEntryBatchSize=5000
```
----
2018-06-02 05:57:14 UTC - Vasily Yanov: btw: the cluster consists of 4 nodes with 
3/3/2
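Assuming "3/3/2" means ensemble size / write quorum / ack quorum (the broker.conf settings `managedLedgerDefaultEnsembleSize`, `managedLedgerDefaultWriteQuorum` and `managedLedgerDefaultAckQuorum`), here is a small sketch of what those numbers imply on a 4-bookie cluster. It ignores rack-aware or region-aware placement, and the class, function and dictionary key names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Managed ledger replication settings, e.g. 3/3/2 = E/Qw/Qa."""
    ensemble: int      # E: bookies a ledger's entries are striped across
    write_quorum: int  # Qw: bookies each entry is written to
    ack_quorum: int    # Qa: acks needed before a write is confirmed

    def validate(self) -> None:
        # BookKeeper requires E >= Qw >= Qa >= 1.
        if not (self.ensemble >= self.write_quorum >= self.ack_quorum >= 1):
            raise ValueError("expected ensemble >= write_quorum >= ack_quorum >= 1")


def summarize(cfg: QuorumConfig, bookies: int) -> dict:
    """Hypothetical summary of the trade-offs; ignores placement policies."""
    cfg.validate()
    return {
        # Bookies that can be down while a new ledger can still get a full ensemble.
        "bookies_down_for_new_ledgers": max(bookies - cfg.ensemble, 0),
        # Slow or failed replicas per entry that a write tolerates and still gets acked.
        "slow_replicas_per_write": cfg.write_quorum - cfg.ack_quorum,
        # Every acknowledged entry is stored on at least this many bookies.
        "min_copies_of_acked_entry": cfg.ack_quorum,
    }


if __name__ == "__main__":
    print(summarize(QuorumConfig(3, 3, 2), bookies=4))
```

With 4 bookies and an ensemble of 3, taking one bookie down leaves exactly enough bookies to form new ensembles, with no spare.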
----
2018-06-02 05:58:34 UTC - Ali Ahmed: do you have any jvm memory monitoring?
----
2018-06-02 05:58:50 UTC - Vasily Yanov: each node: CPU: 8 cores, 32 GB RAM, 
2x500 GB SSD in RAID0
----
2018-06-02 05:58:53 UTC - Vasily Yanov: yes
----
2018-06-02 05:59:14 UTC - Vasily Yanov: oops
----
2018-06-02 05:59:15 UTC - Vasily Yanov: no
----
2018-06-02 05:59:19 UTC - Ali Ahmed: maybe the jvm memory flags are too low
----
2018-06-02 05:59:44 UTC - Ali Ahmed: seems unusual for relatively low volume
----
2018-06-02 06:00:01 UTC - Ali Ahmed: I think there might be aggressive memory 
pressure somewhere
----
2018-06-02 06:01:36 UTC - Vasily Yanov: 5 minutes - I'll add jvm graphs to the 
pulsar dashboard
----
2018-06-02 06:01:52 UTC - Ali Ahmed: ok
----
2018-06-02 06:02:36 UTC - Ali Ahmed: you can check the logs to see what settings 
were picked up at jvm startup
----
2018-06-02 06:04:11 UTC - Vasily Yanov: what exactly do I need to look for in the 
logs? for memory I have: -Xms4g -Xmx8g -XX:MaxDirectMemorySize=8g
----
2018-06-02 06:04:30 UTC - Vasily Yanov: for pulsar, and the same values for 
bookkeeper
----
2018-06-02 06:06:08 UTC - Ali Ahmed: @Matteo Merli @Sijie Guo how do we make 
sure that pulsar picked up the right jvm flags?
----
2018-06-02 06:12:23 UTC - Sijie Guo: @Vasily Yanov which version of pulsar? 
1.22 or 2.0? what kind of disks do you have?
----
2018-06-02 06:13:26 UTC - Ali Ahmed: ```2x500SSD with RAID0```
----
2018-06-02 06:16:53 UTC - Sijie Guo: okay, the jvm settings seem to be fine. I 
am not sure what type of ssds you have, so it is hard to tell what is happening. 
it would be good to get jvm stats and bookie stats, so I can take a look.
----
2018-06-02 06:40:22 UTC - Vasily Yanov: @Sijie Guo 1.22
----
2018-06-02 06:41:11 UTC - Vasily Yanov: jvm stat for pulsar:
----
2018-06-02 06:41:20 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZU9UD41/image.png|image.png>
----
2018-06-02 06:44:02 UTC - Sijie Guo: gc looks fine.

next couple of things to check:

- what are your publish settings? how many topics?
- can you get bookie stats as well? the most useful metrics would be around the 
journal. I suspect that the latency comes from the disk.
----
2018-06-02 06:44:08 UTC - Sijie Guo: @Vasily Yanov
----
2018-06-02 06:45:02 UTC - Vasily Yanov: @Igor Zubchenok could you answer first 
question?
----
2018-06-02 06:47:18 UTC - Igor Zubchenok: 44K topics
28K producers
40K subscriptions
27K consumers
I noticed that performance is better when I have fewer producers (around 20K), 
while all other metrics stay at the same level.
----
2018-06-02 06:47:53 UTC - Igor Zubchenok: last hour:
<https://www.dropbox.com/s/0ziypvhglnoprhv/Screenshot%202018-06-02%2009.47.31.png?dl=0>
----
2018-06-02 06:47:57 UTC - Igor Zubchenok: @Igor Zubchenok shared a file: 
<https://apache-pulsar.slack.com/files/U9T9CC2P2/FB1BR02LE/screenshot_2018-06-02_09.47.31.png|Screenshot 2018-06-02 09.47.31.png>
----
2018-06-02 06:48:08 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZPUFMJ4/image.png|image.png>
----
2018-06-02 06:48:09 UTC - Igor Zubchenok: last 12hours:
<https://www.dropbox.com/s/ctp4clnm0xkihvr/Screenshot%202018-06-02%2009.48.01.png?dl=0>
----
2018-06-02 06:48:12 UTC - Igor Zubchenok: @Igor Zubchenok shared a file: 
<https://apache-pulsar.slack.com/files/U9T9CC2P2/FB0JZQ7K7/screenshot_2018-06-02_09.48.01.png|Screenshot 2018-06-02 09.48.01.png>
----
2018-06-02 06:48:43 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZUANHNV/image.png|image.png>
----
2018-06-02 06:49:07 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FB0EXNKJN/pasted_image_at_2018_06_02_09_49_am.png|Pasted image at 2018-06-02, 9:49 AM>
----
2018-06-02 06:49:32 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FB0BZECGK/image.png|image.png>
----
2018-06-02 06:59:16 UTC - Sijie Guo: @Igor Zubchenok - when you have more 
producers, does it mean it writes to more topics or?

@Vasily Yanov: if I align the graphs, the latency increases line up with the 
increases in ledger count and entry count. do you have a throughput diagram for 
the same time range, so I can compare the metrics? my initial thought from these 
graphs is that the latency comes from the bookies/disks. it looks like the 
read/write latency is a bit too high.
----
2018-06-02 07:00:27 UTC - Vasily Yanov: 5 sec. will check zabbix
----
2018-06-02 07:04:21 UTC - Vasily Yanov: for instance, for the 1st pulsar node:
----
2018-06-02 07:04:27 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FB1GD4F4P/image.png|image.png>
----
2018-06-02 07:05:36 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FB1GD8XB9/image.png|image.png>
----
2018-06-02 07:06:39 UTC - Sijie Guo: oh I mean the throughput diagram in pulsar 
dashboard, at the same time range as the other bookie diagrams.
----
2018-06-02 07:06:51 UTC - Sijie Guo: sorry I didn’t say it clearly
----
2018-06-02 07:07:06 UTC - Sijie Guo: but the disk metrics are good though 
:slightly_smiling_face:
----
2018-06-02 07:08:23 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: 
<https://apache-pulsar.slack.com/files/UA6617JD6/FAZUD5UJV/image.png|image.png>
----
2018-06-02 07:20:42 UTC - Sijie Guo: actually, one thing I am not clear on from 
those graphs: @Igor Zubchenok said that there are around 44k topics. are those 
topics static, or do they change quickly (e.g. topics are created/deleted very 
often)? I am asking because the topic metrics look fairly stable, while the 
ledger count goes up to 500k - 1 million. can I get more clarification on this?
----
2018-06-02 07:21:42 UTC - Igor Zubchenok: We create/delete topics quite often.
----
2018-06-02 07:23:32 UTC - Sijie Guo: how do you create topics? explicitly? or 
implicitly with first publish or first consumption?
----
2018-06-02 07:27:13 UTC - Igor Zubchenok: Implicitly via Pulsar Client API: 
create consumer, close consumer.
----
2018-06-02 07:29:05 UTC - Igor Zubchenok: Then we do the topic initialization, 
and then create consumer again, when our app connects to our arbitrary server 
and the app is ready for receiving messages.
----
2018-06-02 07:33:13 UTC - Sijie Guo: gotcha.

one more question: are the broker and bookie running on the same machines?

can you run “free” on one machine to get your system memory usage (assuming you 
are using a linux box)?
----
2018-06-02 07:33:20 UTC - Sijie Guo: @Vasily Yanov
----
2018-06-02 07:34:47 UTC - Vasily Yanov: yes, right now they share the same 
servers. what do you mean by "free"? split bookie and broker onto other machines?
----
2018-06-02 07:36:07 UTC - Sijie Guo: sorry, linux has a command “free” that 
shows used and available memory
----
2018-06-02 07:36:38 UTC - Vasily Yanov: ah, gotcha
----
2018-06-02 07:37:42 UTC - Vasily Yanov: ```
root@prod-pulsar-01 ~ # free
              total        used        free      shared  buff/cache   available
Mem:       32666544    20651220      258036        1508    11757288    11542700
Swap:      33521660           0    33521660
```
----
2018-06-02 07:38:45 UTC - Vasily Yanov: ```
root@prod-pulsar-02 ~ # free
              total        used        free      shared  buff/cache   available
Mem:       32666536    21445608      292344        1512    10928584    10748200
Swap:      33521660           0    33521660
```
----
2018-06-02 07:39:00 UTC - Vasily Yanov: ```
root@prod-pulsar-03 ~ # free
              total        used        free      shared  buff/cache   available
Mem:       32664332    22783900     2350256        1508     7530176     9407936
Swap:      33521660           0    33521660
```
----
2018-06-02 07:39:14 UTC - Vasily Yanov: ```
root@prod-pulsar-04 ~ # free
              total        used        free      shared  buff/cache   available
Mem:       32664600    11444024     2317948        1520    18902628    20748188
Swap:      33521660           0    33521660
```
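The `free` outputs above are in KiB, the default unit of procps `free` (`free -h` prints human-readable sizes instead). A minimal sketch for turning a pasted `free` output into GiB, using the prod-pulsar-01 numbers as sample input; `parse_free` is a hypothetical helper name.

```python
KIB_PER_GIB = 1024 * 1024  # `free` prints KiB by default

SAMPLE = """\
root@prod-pulsar-01 ~ # free
              total        used        free      shared  buff/cache   available
Mem:       32666544    20651220      258036        1508    11757288    11542700
Swap:      33521660           0    33521660
"""


def parse_free(output: str) -> dict:
    """Parse default `free` output into {row label: {column: GiB}}."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # The header is the first line whose first word is "total" (skips a shell prompt).
    header_idx = next(i for i, ln in enumerate(lines) if ln.split()[0] == "total")
    columns = lines[header_idx].split()
    rows = {}
    for ln in lines[header_idx + 1:]:
        label, *values = ln.split()
        # The Swap row has fewer columns than Mem; zip() stops at the shorter one.
        rows[label.rstrip(":")] = {
            col: int(val) / KIB_PER_GIB for col, val in zip(columns, values)
        }
    return rows


if __name__ == "__main__":
    mem = parse_free(SAMPLE)["Mem"]
    print(f"available {mem['available']:.1f} GiB of {mem['total']:.1f} GiB "
          f"({mem['buff/cache']:.1f} GiB buff/cache)")
```

Applied to all four hosts, "available" ranges from roughly 9 GiB (prod-pulsar-03) to roughly 20 GiB (prod-pulsar-04), out of about 31 GiB total each.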
----
2018-06-02 07:39:27 UTC - Sijie Guo: one last thing - is the jvm graph that you 
showed for brokers or bookies?
----
2018-06-02 07:41:24 UTC - Vasily Yanov: for brokers
----
2018-06-02 07:41:40 UTC - Vasily Yanov: for bookies we don't have it
----
2018-06-02 07:41:58 UTC - Sijie Guo: oh okay.
----
2018-06-02 08:05:51 UTC - Sijie Guo: @Igor Zubchenok @Vasily Yanov here is my 
suggestion: 

since you create/delete topics often, client batching won’t help. this traffic 
pattern is dominated by connections/requests, and those objects use heap 
memory. that is confirmed by the broker jvm stats: heap usage is around 6GB and 
non-heap is around 100MB.

from the bookie metrics, the bookie write cache never goes beyond 200MB, but the 
read cache is constantly at its 256MB limit. both write and read latency are 
fairly high at p99, so I suspect this comes from the small read cache.

so what I would suggest is to tune the jvm settings for heap (due to your 
traffic pattern):

for broker: increase the heap size and reduce the direct memory -> 
-Xms8g -Xmx10g -XX:MaxDirectMemorySize=2g

for bookie: increase the heap size and reduce the direct memory -> 
-Xms8g -Xmx10g -XX:MaxDirectMemorySize=4g

with these settings, the max amount of memory used by the jvms would be 
10+10+2+4 = 26 GB. since you have 32 GB of ram, that leaves 6 GB for the os and 
filesystem cache (which is good enough since your traffic is not bandwidth 
dominated).
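The memory arithmetic can be reproduced with a small sketch using the same approximation: each JVM's ceiling is `-Xmx` plus `-XX:MaxDirectMemorySize`, ignoring metaspace, thread stacks and other native overhead. It also applies that approximation to the flags reported earlier in the thread (`-Xmx8g -XX:MaxDirectMemorySize=8g` for both broker and bookie). The helper names are hypothetical, and the fallback for a missing `MaxDirectMemorySize` is an assumption about HotSpot's default.

```python
import re

GIB = 1024 ** 3
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE = re.compile(r"(\d+)([kmg]?)", re.IGNORECASE)


def to_bytes(size: str) -> int:
    """Convert a JVM size such as '512m' or '10g' to bytes."""
    m = _SIZE.fullmatch(size)
    if m is None:
        raise ValueError(f"unrecognized JVM size: {size!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def jvm_ceiling_bytes(flags: str) -> int:
    """Approximate ceiling: max heap (-Xmx) + max direct memory.

    Ignores metaspace, thread stacks, and other native overhead.
    """
    heap = direct = None
    for flag in flags.split():
        if flag.startswith("-Xmx"):
            heap = to_bytes(flag[len("-Xmx"):])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            direct = to_bytes(flag.split("=", 1)[1])
    if heap is None:
        raise ValueError("no -Xmx flag found")
    # Assumption: when unset, HotSpot sizes direct memory roughly like the max heap.
    return heap + (heap if direct is None else direct)


def leftover_gib(host_ram_gib: float, jvm_flags: list) -> float:
    """RAM left for the OS and page cache if every JVM reaches its ceiling."""
    return host_ram_gib - sum(jvm_ceiling_bytes(f) for f in jvm_flags) / GIB


CURRENT = ["-Xms4g -Xmx8g -XX:MaxDirectMemorySize=8g"] * 2  # broker, bookie
PROPOSED = [
    "-Xms8g -Xmx10g -XX:MaxDirectMemorySize=2g",  # broker
    "-Xms8g -Xmx10g -XX:MaxDirectMemorySize=4g",  # bookie
]

if __name__ == "__main__":
    for name, flags in (("current", CURRENT), ("proposed", PROPOSED)):
        print(f"{name}: {leftover_gib(32, flags):.0f} GiB left on a 32 GiB host")
```

By this approximation the original flags add up to the full 32 GB, leaving nothing for the OS and page cache, while the suggested flags leave 6 GB. The `free` outputs report about 31.2 GiB total per host, so the real headroom is slightly smaller.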

for bookie, adjust following settings:

increase `dbStorage_readAheadCacheMaxSizeMb` from 256 to 512.
reduce `dbStorage_readAheadCacheBatchSize` from 1000 to 10 (since you have many 
ledgers)
+1 : Ali Ahmed
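The two bookie settings above are set in `conf/bookkeeper.conf`. A minimal sketch for applying them to a properties-style file; `apply_overrides` is a hypothetical helper, and the sample excerpt is made up rather than copied from the real file.

```python
def apply_overrides(conf_text: str, overrides: dict) -> str:
    """Set `key=value` entries in a properties-style config such as bookkeeper.conf.

    Active lines for an overridden key are rewritten in place; keys that are not
    present (or only appear commented out) are appended at the end. Comments and
    all other lines are kept as-is.
    """
    out, seen = [], set()
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                out.append(f"{key}={overrides[key]}")
                seen.add(key)
                continue
        out.append(line)
    out.extend(f"{k}={v}" for k, v in overrides.items() if k not in seen)
    return "\n".join(out) + "\n"


# Values suggested in the thread for a workload with many low-throughput ledgers.
BOOKIE_OVERRIDES = {
    "dbStorage_readAheadCacheMaxSizeMb": 512,
    "dbStorage_readAheadCacheBatchSize": 10,
}

if __name__ == "__main__":
    # Hypothetical excerpt standing in for the real conf/bookkeeper.conf.
    before = (
        "# read-ahead cache settings\n"
        "dbStorage_readAheadCacheMaxSizeMb=256\n"
        "dbStorage_readAheadCacheBatchSize=1000\n"
    )
    print(apply_overrides(before, BOOKIE_OVERRIDES), end="")
```

Bookies read this file at startup, so each bookie needs a restart for the new values to take effect.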
----
2018-06-02 08:07:42 UTC - Sijie Guo: those default settings are optimized for a 
small/medium number of ledgers where each ledger has high throughput. your case 
is the reverse, so what we need is to give more heap memory rather than direct 
memory and tune some cache behaviors.
----
2018-06-02 08:08:36 UTC - Vasily Yanov: @Sijie Guo thank you for such detailed 
reply. will try your advice right now
----
2018-06-02 08:13:46 UTC - Ali Ahmed: one general comment: creating and deleting a 
large number of topics frequently is not consistent with typical pub-sub 
patterns. why was this design chosen?
----
2018-06-02 08:24:49 UTC - Vasily Yanov: another crazy issue: we have 4 nodes 
with a 3/3/2 managed ledger configuration, and when I restart even one node the 
whole cluster becomes unresponsive and our application stops working. is this by 
design or is something wrong with our cluster?
----
2018-06-02 08:36:02 UTC - Sijie Guo: no. I think it might be related to your 
traffic pattern. how many namespaces do you have? one or many? and how many 
topics per namespace?

what I suspect here is that you didn't override the number of bundles of a 
namespace, so by default it has 4 bundles, and your traffic is probably not 
high enough to trigger a bundle split. then each bundle has almost 10k topics. 
when you restart one node, you shift a huge bundle from one broker to another, 
which has to close and reopen all of its topics. that can cause a lot of churn 
on the metadata store (zk).

let me know the number of namespaces and topics per namespace, so I can help 
optimize for your use case.
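In Pulsar, a namespace's hash range is split into bundles, each topic is mapped to a bundle by hashing its name, and a broker owns and hands over whole bundles. A minimal sketch of how the bundle count bounds the number of topics moved per bundle; `zlib.crc32` is a stand-in (the broker's actual hash function is configurable and may differ), and the topic names are hypothetical.

```python
import zlib
from collections import Counter

HASH_RANGE = 2 ** 32  # full 32-bit hash space, split into equal-width bundles


def bundle_of(topic: str, num_bundles: int) -> int:
    """Index of the bundle whose hash range contains the topic's hash."""
    # Stand-in hash; the broker's real hash function may differ.
    h = zlib.crc32(topic.encode("utf-8")) & 0xFFFFFFFF
    return h * num_bundles // HASH_RANGE


def topics_per_bundle(topics, num_bundles: int) -> list:
    """Number of topics that land in each bundle, indexed by bundle."""
    counts = Counter(bundle_of(t, num_bundles) for t in topics)
    return [counts.get(i, 0) for i in range(num_bundles)]


if __name__ == "__main__":
    # Hypothetical topic names; ~20K topics per namespace as reported in the thread.
    topics = [f"persistent://my-prop/my-cluster/my-ns/topic-{i}" for i in range(20_000)]
    for n in (4, 192):
        per_bundle = topics_per_bundle(topics, n)
        print(f"{n:>3} bundles: at most {max(per_bundle)} topics in one bundle "
              f"(average {len(topics) / n:.0f})")
```

With 4 bundles, unloading a single bundle means closing and reopening at least a quarter of the namespace's topics at once; with many more bundles, each handover is much smaller.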
----
2018-06-02 08:39:37 UTC - Igor Zubchenok: We use 192 bundles per namespace.
----
2018-06-02 08:41:08 UTC - Igor Zubchenok: 3 namespaces: one has a few topics, the 
other two have a similar number of topics: ~20K + ~20K
----
2018-06-02 08:53:06 UTC - Tom Liversidge: @Tom Liversidge has joined the channel
----
