Re: How to maximize the throughput

Hiroyuki Yamada Thu, 30 Jan 2020 18:25:17 -0800

Hello Sijie and Penghui,

Thank you very much for the prompt response and useful information.
Let me look through them and try again.


Thanks,
Hiro

On Fri, Jan 31, 2020 at 11:09 AM PengHui Li <[email protected]> wrote:
>
> Hi @Hiro,
>
> I've written two posts on Apache Pulsar performance tuning.
> They have not been officially published yet, but they should be released
> soon.
>
> Here are the links to the posts:
>
> https://docs.google.com/document/d/1BTLSMRCP13CCOWbGe_Nre82HuotWStM91nfe9RK46kI/edit?usp=sharing
> https://docs.google.com/document/d/11EQOXReu4DNSNRL9rn1DD0yzXN1cWWcEYsGoCKSrJdM/edit?usp=sharing
>
> I hope they help.
>
> Thanks,
> Penghui
>
> Sijie Guo <[email protected]> wrote on Fri, Jan 31, 2020 at 7:50 AM:
>>
>> + Penghui Li
>>
>> Penghui has done a lot of performance testing. From what I remember, he was
>> able to reach 1.5 million messages/second (100 bytes) with batching and 500k
>> messages/second (100 bytes) without batching.
>>
>> --
>>
>> I took a quick glance at your tests. It seems that you are using synchronous
>> send, which blocks each thread until its message is acknowledged and so
>> doesn't leverage batching efficiently.
>>
>> I would recommend using Pulsar's `pulsar-perf` tool to get a baseline result
>> first, so that you understand your environment (especially your disks).
>>
>> Thanks,
>> Sijie
>>
>>
>> On Thu, Jan 30, 2020 at 2:48 AM Hiroyuki Yamada <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I've been doing some micro-benchmarking with Pulsar and
>>> am wondering how to maximize the throughput.
>>>
>>> I can get about 30,000 messages/s at 9 ms latency with 256 client
>>> threads and a 1K message size,
>>> but I cannot get more than that even when I increase the number of
>>> client threads,
>>> although the CPU, disk bandwidth, and IOPS are not fully utilized
>>> (about 50% CPU usage, 34 MB/s out of 120 MB/s for the journal disk; the
>>> ledger disk is mostly inactive).
>>>
>>> Is this the expected result in the environment described below?
>>> I'm wondering whether there is some configuration that would give better
>>> throughput, or whether I'm doing something wrong.
>>>
>>> The environment I used is as follows.
>>> node: 1 Standard E8s v3 (8 vcpus, 64 GiB memory) in Azure
>>> mode: standalone
>>> disk: 2 disks (1 for ledger, 1 for journal) each can achieve 5000 IOPS
>>> for random I/O and 120 MB/s for sequential I/O.
>>> topic: persistent partitioned topic, # of partitions: 32
>>> config: default
>>>
>>> The program I used is here.
>>> https://github.com/feeblefakie/misc/blob/master/pulsar/src/main/java/PulsarProducerBenchmark.java
>>> (This program concurrently produces records of a specified size
>>> to the broker
>>> and measures the throughput and the average latency.)
>>>
>>> It can be easily re-run if you follow the README.
>>> https://github.com/feeblefakie/misc/blob/master/pulsar/
>>>
>>> It would be great if someone can help me.
>>>
>>> Thanks,
>>> Hiro
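
A rough way to see why synchronous sends could explain the plateau in the
original question: if each client thread keeps exactly one message in flight
and waits about 9 ms for it, the threads together cannot complete more than
threads / latency sends per second. The sketch below only does that
arithmetic with the numbers quoted in the thread. The class and method names
are made up for illustration, "1K" is assumed to mean 1024 bytes, and the
9 ms figure is assumed to be the average time per blocking send. It is not a
model of the broker or of the linked benchmark program.

```java
public class SyncSendBound {
    /**
     * Upper bound on messages/second for a closed loop in which every
     * thread has at most one send in flight (synchronous send).
     */
    public static double maxMessagesPerSecond(int threads, double latencyMillis) {
        return threads / (latencyMillis / 1000.0);
    }

    /** Payload bandwidth in MB/s, taking 1 MB as 1,000,000 bytes. */
    public static double payloadMegabytesPerSecond(double messagesPerSecond, int messageBytes) {
        return messagesPerSecond * messageBytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Figures from the thread: 256 client threads, 9 ms latency.
        double bound = maxMessagesPerSecond(256, 9.0);
        System.out.printf("sync-send bound: %.0f msg/s%n", bound);

        // Figures from the thread: about 30,000 msg/s; 1K assumed to be 1024 bytes.
        double payload = payloadMegabytesPerSecond(30_000, 1024);
        System.out.printf("payload rate: %.1f MB/s%n", payload);
    }
}
```

With 256 threads and 9 ms the bound comes out a little over 28,000
messages/s, which is close to the roughly 30,000 messages/s reported. Under
these assumptions, adding threads raises throughput only as long as latency
does not rise with it, which is consistent with the suggestion to avoid
synchronous send and let the client batch.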
