2018-06-01 09:25:32 UTC - Mate Varga: 20 ms is an eternity :slightly_smiling_face: ---- 2018-06-01 09:52:48 UTC - Kishore Karunakaran: :+1: ---- 2018-06-01 12:54:41 UTC - Matti-Pekka Laaksonen: Wonderful! Thank you for the update ---- 2018-06-01 20:01:12 UTC - Christopher Burke: @Christopher Burke has joined the channel ---- 2018-06-02 01:03:35 UTC - ben gordon: @ben gordon has joined the channel ---- 2018-06-02 02:28:32 UTC - Nozomi Kurihara: There will be an Apache Pulsar talk at OSCON on July 18, 2018, in Portland, OR.
“Apache Pulsar and its enterprise use case” <https://conferences.oreilly.com/oscon/oscon-or/public/schedule/detail/69704> Yahoo! JAPAN will also have an exhibition booth where we will explain what Apache Pulsar is, its features, case examples at Yahoo! JAPAN, etc. I will be very happy if a lot of people come to the talk and the booth! thumbsup_all : jia zhai, Matteo Merli, Sijie Guo, Ali Ahmed fire : Sijie Guo, Ali Ahmed ---- 2018-06-02 03:04:15 UTC - jia zhai: :+1: ---- 2018-06-02 05:54:13 UTC - Vasily Yanov: Hi! Guys, we have a really depressing performance issue. When the publish rate increases, storage write latency gets dramatically worse. Currently we have a publish rate of about 20k msg/s and our storage latency graph looks like: ---- 2018-06-02 05:54:43 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZU4MC1X/image.png|image.png> ---- 2018-06-02 05:55:01 UTC - Vasily Yanov: all peaks are connected with publish rates: ---- 2018-06-02 05:55:23 UTC - Ali Ahmed: what is your batch size? ---- 2018-06-02 05:55:30 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZPNJZ2L/image.png|image.png> ---- 2018-06-02 05:56:32 UTC - Vasily Yanov:
```
# The number of max entries to keep in fragment for re-replication
rereplicationEntryBatchSize=5000
```
---- 2018-06-02 05:57:14 UTC - Vasily Yanov: btw: the cluster consists of 4 nodes with 3/3/2 ---- 2018-06-02 05:58:34 UTC - Ali Ahmed: do you have any jvm memory monitoring?
---- 2018-06-02 05:58:50 UTC - Vasily Yanov: each node: CPU: 8 cores, 32 GB RAM, 2x500SSD with RAID0 ---- 2018-06-02 05:58:53 UTC - Vasily Yanov: yes ---- 2018-06-02 05:59:14 UTC - Vasily Yanov: oops ---- 2018-06-02 05:59:15 UTC - Vasily Yanov: no ---- 2018-06-02 05:59:19 UTC - Ali Ahmed: maybe the jvm memory flags are too low ---- 2018-06-02 05:59:44 UTC - Ali Ahmed: seems unusual for relatively low volume ---- 2018-06-02 06:00:01 UTC - Ali Ahmed: I think there might be aggressive memory pressure somewhere ---- 2018-06-02 06:01:36 UTC - Vasily Yanov: 5 minutes - I'll add jvm graphs to the pulsar dashboard ---- 2018-06-02 06:01:52 UTC - Ali Ahmed: ok ---- 2018-06-02 06:02:36 UTC - Ali Ahmed: you can check the logs to see what settings were picked up at jvm startup ---- 2018-06-02 06:04:11 UTC - Vasily Yanov: what exactly do I need to look for in the logs? for memory I have: -Xms4g -Xmx8g -XX:MaxDirectMemorySize=8g ---- 2018-06-02 06:04:30 UTC - Vasily Yanov: for pulsar, and the same values for bookkeeper ---- 2018-06-02 06:06:08 UTC - Ali Ahmed: @Matteo Merli @Sijie Guo how do we make sure that pulsar picked up the right jvm flags? ---- 2018-06-02 06:12:23 UTC - Sijie Guo: @Vasily Yanov which version of pulsar? 1.22 or 2.0? what kind of disks do you have? ---- 2018-06-02 06:13:26 UTC - Ali Ahmed: ```2x500SSD with RAID0``` ---- 2018-06-02 06:16:53 UTC - Sijie Guo: okay, the jvm settings seem to be fine. I am not sure what type of ssds you have. hard to tell what is happening. it would be good to get jvm stats and bookie stats, so I can take a look. ---- 2018-06-02 06:40:22 UTC - Vasily Yanov: @Sijie Guo 1.22 ---- 2018-06-02 06:41:11 UTC - Vasily Yanov: jvm stats for pulsar: ---- 2018-06-02 06:41:20 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZU9UD41/image.png|image.png> ---- 2018-06-02 06:44:02 UTC - Sijie Guo: gc looks fine. next couple of things to check: - what are your publish settings? how many topics? - can you get bookie stats as well?
the most useful metrics would be around the journal. I suspect that the latency comes from disk. ---- 2018-06-02 06:44:08 UTC - Sijie Guo: @Vasily Yanov ---- 2018-06-02 06:45:02 UTC - Vasily Yanov: @Igor Zubchenok could you answer the first question? ---- 2018-06-02 06:47:18 UTC - Igor Zubchenok: 44K topics, 28K producers, 40K subscriptions, 27K consumers. I noted that performance is better when I have fewer producers (around 20K), while all other metrics stay at the same level. ---- 2018-06-02 06:47:53 UTC - Igor Zubchenok: last hour: <https://www.dropbox.com/s/0ziypvhglnoprhv/Screenshot%202018-06-02%2009.47.31.png?dl=0> ---- 2018-06-02 06:47:57 UTC - Igor Zubchenok: @Igor Zubchenok shared a file: <https://apache-pulsar.slack.com/files/U9T9CC2P2/FB1BR02LE/screenshot_2018-06-02_09.47.31.png|Screenshot 2018-06-02 09.47.31.png> ---- 2018-06-02 06:48:08 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZPUFMJ4/image.png|image.png> ---- 2018-06-02 06:48:09 UTC - Igor Zubchenok: last 12 hours: <https://www.dropbox.com/s/ctp4clnm0xkihvr/Screenshot%202018-06-02%2009.48.01.png?dl=0> ---- 2018-06-02 06:48:12 UTC - Igor Zubchenok: @Igor Zubchenok shared a file: <https://apache-pulsar.slack.com/files/U9T9CC2P2/FB0JZQ7K7/screenshot_2018-06-02_09.48.01.png|Screenshot 2018-06-02 09.48.01.png> ---- 2018-06-02 06:48:43 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZUANHNV/image.png|image.png> ---- 2018-06-02 06:49:07 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FB0EXNKJN/pasted_image_at_2018_06_02_09_49_am.png|Pasted image at 2018-06-02, 9:49 AM> ---- 2018-06-02 06:49:32 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FB0BZECGK/image.png|image.png> ---- 2018-06-02 06:59:16 UTC - Sijie Guo: @Igor Zubchenok - when you have more producers, does it mean you write to more topics?
@Vasily Yanov: if I align the graphs, it seems that the latency goes up when the ledger count and entry count go up. do you have a throughput diagram for the same time range, so I can compare the metrics? my initial thought from these graphs is that the latency comes from bookies/disks. it looks like the read/write latency is a bit too high. ---- 2018-06-02 07:00:27 UTC - Vasily Yanov: 5 sec. will check zabbix ---- 2018-06-02 07:04:21 UTC - Vasily Yanov: for instance for 1st pulsar node: ---- 2018-06-02 07:04:27 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FB1GD4F4P/image.png|image.png> ---- 2018-06-02 07:05:36 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FB1GD8XB9/image.png|image.png> ---- 2018-06-02 07:06:39 UTC - Sijie Guo: oh I mean the throughput diagram in the pulsar dashboard, for the same time range as the other bookie diagrams. ---- 2018-06-02 07:06:51 UTC - Sijie Guo: sorry I didn’t say it clearly ---- 2018-06-02 07:07:06 UTC - Sijie Guo: but the disk metrics are good though :slightly_smiling_face: ---- 2018-06-02 07:08:23 UTC - Vasily Yanov: @Vasily Yanov uploaded a file: <https://apache-pulsar.slack.com/files/UA6617JD6/FAZUD5UJV/image.png|image.png> ---- 2018-06-02 07:20:42 UTC - Sijie Guo: actually one thing is not clear to me from those graphs. @Igor Zubchenok said that there are around 44k topics? are those topics static or do they change quickly (e.g. you create/delete topics very often)? I am asking because I see the topic metrics are kind of stable, while the ledger count goes up to 500k - 1 million. can I get more clarification about this part? ---- 2018-06-02 07:21:42 UTC - Igor Zubchenok: We create/delete topics quite often. ---- 2018-06-02 07:23:32 UTC - Sijie Guo: how do you create topics? explicitly? or implicitly with first publish or first consumption?
---- 2018-06-02 07:27:13 UTC - Igor Zubchenok: Implicitly via the Pulsar Client API: create consumer, close consumer. ---- 2018-06-02 07:29:05 UTC - Igor Zubchenok: Then we do the topic initialization, and then create the consumer again when our app connects to an arbitrary server of ours and is ready to receive messages. ---- 2018-06-02 07:33:13 UTC - Sijie Guo: gotcha. one more question, are the broker and bookie co-located on the same machines? can you run “free” on one machine to get your system memory usage (assuming you are using a linux box)? ---- 2018-06-02 07:33:20 UTC - Sijie Guo: @Vasily Yanov ---- 2018-06-02 07:34:47 UTC - Vasily Yanov: yes, they currently share the same servers. what do you mean by "free"? split bookie and broker onto separate machines? ---- 2018-06-02 07:36:07 UTC - Sijie Guo: sorry, linux has a command “free” that shows used and available memory ---- 2018-06-02 07:36:38 UTC - Vasily Yanov: ah, gotcha ---- 2018-06-02 07:37:42 UTC - Vasily Yanov:
```
root@prod-pulsar-01 ~ # free
            total        used        free      shared  buff/cache   available
Mem:     32666544    20651220      258036        1508    11757288    11542700
Swap:    33521660           0    33521660
```
---- 2018-06-02 07:38:45 UTC - Vasily Yanov:
```
root@prod-pulsar-02 ~ # free
            total        used        free      shared  buff/cache   available
Mem:     32666536    21445608      292344        1512    10928584    10748200
Swap:    33521660           0    33521660
```
---- 2018-06-02 07:39:00 UTC - Vasily Yanov:
```
root@prod-pulsar-03 ~ # free
            total        used        free      shared  buff/cache   available
Mem:     32664332    22783900     2350256        1508     7530176     9407936
Swap:    33521660           0    33521660
```
---- 2018-06-02 07:39:14 UTC - Vasily Yanov:
```
root@prod-pulsar-04 ~ # free
            total        used        free      shared  buff/cache   available
Mem:     32664600    11444024     2317948        1520    18902628    20748188
Swap:    33521660           0    33521660
```
---- 2018-06-02 07:39:27 UTC - Sijie Guo: one last thing - the jvm graph that you showed, is that for brokers or bookies?
---- 2018-06-02 07:41:24 UTC - Vasily Yanov: for brokers ---- 2018-06-02 07:41:40 UTC - Vasily Yanov: we don't have it for bookies ---- 2018-06-02 07:41:58 UTC - Sijie Guo: oh okay. ---- 2018-06-02 08:05:51 UTC - Sijie Guo: @Igor Zubchenok @Vasily Yanov here is my suggestion: since your pattern is to create/delete topics often, client batching won’t help. this traffic pattern is connection/request dominated, and those objects use heap memory. that can be confirmed from the broker jvm stats: heap usage is around 6GB and non-heap is around 100MB. from the bookie’s metrics, the bookie write cache never goes beyond 200MB, while the read cache is constantly at its 256MB limit. both write and read latency are kind of high at p99, so I suspect this comes from the small read cache.
so what I would suggest is to tune the jvm settings for heap (due to your traffic pattern):
for broker: increase the heap size and reduce the direct memory -> -Xms8g -Xmx10g -XX:MaxDirectMemorySize=2g
for bookie: increase the heap size and reduce the direct memory -> -Xms8g -Xmx10g -XX:MaxDirectMemorySize=4g
with this, the max amount of memory used by the jvms would be 10+10+2+4 = 26GB; since you have 32GB of ram, you will have 6GB for os and fs (which is good enough since your traffic is not bandwidth dominated).
for bookie, adjust the following settings:
increase `dbStorage_readAheadCacheMaxSizeMb` from 256 to 512.
reduce `dbStorage_readAheadCacheBatchSize` from 1000 to 10 (since you have many ledgers)
+1 : Ali Ahmed ---- 2018-06-02 08:07:42 UTC - Sijie Guo: those default settings are kind of optimized for a small/medium number of ledgers where each ledger has high throughput. in your case, it is kind of the reverse, so what we need to do is give more heap memory rather than direct memory and tune some cache behaviors. ---- 2018-06-02 08:08:36 UTC - Vasily Yanov: @Sijie Guo thank you for such a detailed reply.
will try your advice right now ---- 2018-06-02 08:13:46 UTC - Ali Ahmed: one general comment: creating and deleting a large number of topics frequently is not consistent with general pub/sub patterns. why was such a design chosen? ---- 2018-06-02 08:24:49 UTC - Vasily Yanov: another crazy issue: we have 4 nodes with a 3/3/2 managed ledger configuration, and when I restart even one node the whole cluster becomes unresponsive and our application cannot work anymore. is it by design or is something wrong with our cluster? ---- 2018-06-02 08:36:02 UTC - Sijie Guo: no. I think it might be related to your traffic pattern. how many namespaces do you have? one or many? and how many topics per namespace? what I am suspecting here is that you don’t override the number of bundles of a namespace, so by default it has 4 bundles, and your traffic is probably not high enough to trigger a bundle split. then each bundle has almost 10k topics. when you restart one node, you are shifting one huge bundle from one broker to another broker; it has to close topics and open topics, which might cause churn on the metadata store (zk). let me know the number of namespaces and topics/namespace, so I can help optimize for your use case. ---- 2018-06-02 08:39:37 UTC - Igor Zubchenok: We use 192 bundles per namespace. ---- 2018-06-02 08:41:08 UTC - Igor Zubchenok: 3 namespaces: 1 has a few topics, the two others have a similar number of topics: ~20K + ~20K ---- 2018-06-02 08:53:06 UTC - Tom Liversidge: @Tom Liversidge has joined the channel ----
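The two back-of-the-envelope calculations in this thread (the JVM memory budget per node and the number of topics per bundle) can be checked with a short Python sketch. The heap and direct-memory sizes, the 32 GB of RAM, and the topic and bundle counts are taken from the messages above; the function names are hypothetical, and the budget only counts max heap plus max direct memory, so it ignores other JVM overhead such as metaspace and thread stacks.

```python
# Rough sanity check of the numbers discussed in the thread.
# Function names are illustrative only; they are not part of Pulsar or BookKeeper.


def jvm_memory_budget_gb(total_ram_gb, processes):
    """Return (jvm_total_gb, remaining_gb) for processes sharing one machine.

    Each process contributes its max heap (-Xmx) plus its max direct memory
    (-XX:MaxDirectMemorySize). Other JVM overhead is deliberately ignored.
    """
    jvm_total = sum(p["heap_max_gb"] + p["direct_max_gb"] for p in processes.values())
    return jvm_total, total_ram_gb - jvm_total


def topics_per_bundle(topics, bundles):
    """Average number of topics per bundle, assuming an even spread."""
    return topics / bundles


# Suggested settings from the thread: broker and bookie run on the same node.
suggested = {
    "broker": {"heap_max_gb": 10, "direct_max_gb": 2},
    "bookie": {"heap_max_gb": 10, "direct_max_gb": 4},
}

jvm_total, remaining = jvm_memory_budget_gb(32, suggested)
print(f"JVM total: {jvm_total} GB, left for OS and page cache: {remaining} GB")

# Hypothesis raised in the thread: default of 4 bundles with ~44K topics.
print(f"4 bundles:   {topics_per_bundle(44_000, 4):.0f} topics per bundle")

# Actual setup reported: 192 bundles per namespace, ~20K topics per namespace.
print(f"192 bundles: {topics_per_bundle(20_000, 192):.0f} topics per bundle")
```

With 192 bundles the average bundle is far smaller than in the 4-bundle hypothesis, so the size of a single bundle is probably not enough on its own to explain the unresponsive cluster on restart.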