Re: Performance of Multi-Lang protocol

Jungtaek Lim Sun, 14 May 2017 21:15:26 -0700

Zhechao,

Could you please link the mail about the Python shell bolt performance
issue, if you can find it in the archive?


Thanks in advance!
Jungtaek Lim (HeartSaVioR)

On Mon, May 15, 2017 at 12:37 PM, Zhechao Ma <[email protected]> wrote:

> I have been using Storm with Python since Storm 0.9.2, and I'm interested
> in multi-lang performance improvements.
>
> There is a pull request (https://github.com/apache/storm/pull/1136) for
> multi-lang performance improvements that was opened a year ago but has not
> been merged yet. It uses MessagePackSerializer to replace the default JSON
> serializer.
>
> Also, there was a mail mentioning a Python shell bolt performance issue on
> 2016/1/3. That mail included a Msgpack benchmark result.
>
> I agree with @HeartSaVioR that Python optimization should come first.
>
>
> 2017-05-13 13:23 GMT+08:00 Jungtaek Lim <[email protected]>:
>
>> I'd like to hear from other multi-lang users as well.
>>
>> I guess many users are using Streamparse, so Streamparse users may be
>> able to report how large the performance difference is. If Streamparse
>> uses a non-default serde to reduce the performance hit, Storm could even
>> adopt it as the default serde, but that would break backward compatibility.
>>
>> Btw, IMHO, it might be worth focusing optimization on fewer languages,
>> for example supporting only Python (as data scientists are familiar with
>> it) as a second language and applying Python-specific optimizations. We
>> may also need to support non-Java languages for the new Streams API, and
>> that might not be easy with the current multi-lang approach. A
>> PySpark-like approach would be reasonable.
>>
>> We could still support multi-lang, but without major further improvements.
>>
>> Would like to hear opinions on my proposal, too.
>>
>> - Jungtaek Lim (HeartSaVioR)
>>
>> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <[email protected]> wrote:
>>
>>> *My PC:*
>>>
>>> My PC is an 8-core Xeon E5 with 16 GB of RAM; when the test starts, only
>>> 8 GB of memory is in use.
>>>
>>> I increased the memory of the Java VM to 4 GB and it only uses 1 GB when
>>> the test runs.
>>>
>>>
>>>
>>> *The Topology:*
>>>
>>> On my PC, I have three Spouts in mono, and one Bolt in mono.
>>>
>>> The topology is described in Flux – so I have basically zero code in
>>> Java, all in Flux .yaml + .Net with mono.
>>>
>>> All the messages use SHUFFLE and there is one worker only (my PC)
>>>
>>>
>>>
>>> I run in local mode and I also have a Docker container where I deployed
>>> this.
>>>
>>>
>>>
>>> *Topology details:*
>>>
>>> The Spouts read from an internal service; I collect about 60,000 to
>>> 70,000 records each minute.
>>>
>>>
>>>
>>> The Bolt reads from the three Spouts and aggregates in memory using
>>> SQLite: records are added to SQLite as they arrive, then every 30 seconds
>>> SQLite runs an aggregation and emits the data to an instance of Redis
>>> cache (via another Bolt hop).
>>>
>>>
>>>
>>> To test with Java, I replaced the Bolt with a simple Java Bolt that was
>>> only logging every 10,000 records.
>>>
>>> To compare with Mono, I created an empty .net Bolt and did the same.
>>>
>>>
>>>
>>> *My Tests:*
>>>
>>> The Flux topology is attached.
>>>
>>> The Java class I used to test and the .Net Bolt are attached as well.
>>>
>>> Again, the Spouts are .Net classes that emit 65K rows per minute.
>>>
>>>
>>>
>>> The log files are attached, you can see how much time it takes for the
>>> Bolt to consume 10,000 records –
>>>
>>> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
>>> records takes around 4.5 seconds.
>>>
>>> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each
>>> 10,000 records takes around 0.7 seconds.
>>>
>>> The Linux.txt is on the Docker container (still on my PC but using
>>> Docker for Windows in Linux Dockers mode), using mono but on Linux this
>>> time - the results are comparable to Mono on Windows (4.5 seconds per
>>> 10,000 records).
>>>
>>> I also tried calling the Windows exe directly on Windows in local mode,
>>> bypassing mono; the results were not pretty: 15 seconds per 10,000 records
>>> (NetExe.txt).
>>>
>>>
>>>
>>> *Results:*
>>>
>>> I know I can scale out and partition the data, but the amount of
>>> processing did not seem to require that –
>>>
>>>
>>>
>>> Maybe one issue is that the object I am moving has 11 fields?
>>>
>>>
>>>
>>> I can try to create a mini-repro if the dev team is interested;
>>> hopefully it would help find the bottleneck.
>>>
>>>
>>>
>>> Thanks for your attention -
>>>
>>> Mauro.
>>>
>>>
>>>
>>> *From:* P. Taylor Goetz [mailto:[email protected]]
>>> *Sent:* Friday, May 12, 2017 4:55 PM
>>> *To:* [email protected]; [email protected]
>>> *Subject:* Re: Performance of Multi-Lang protocol
>>>
>>>
>>>
>>> Adding dev@ mailing list...
>>>
>>>
>>>
>>> There is definitely a performance hit. But it shouldn't be as drastic as
>>> you describe.
>>>
>>>
>>>
>>> Can you share some of your environment characteristics?
>>>
>>>
>>>
>>> I've been looking at the Apache Arrow project (full disclosure: I'm a
>>> PMC member) as a means for improved performance (it essentially would
>>> remove the performance hit for serialize/deserialize operations). This is
>>> particularly relevant to multi-lang, but could also apply to same-machine
>>> inter-worker communication.
>>>
>>>
>>>
>>> At this point I don't feel Arrow is at production-level maturity, but
>>> it is getting close. I definitely feel it's worth exploring at PoC level.
>>>
>>>
>>>
>>> -Taylor
>>>
>>>
>>> On May 12, 2017, at 6:56 PM, Mauro Giusti <[email protected]> wrote:
>>>
>>> Hi –
>>>
>>> We are using multi-lang to pass data between storm and mono –
>>>
>>>
>>>
>>> We observe a 6x time increase when messages go from spout to bolt if the
>>> bolt is in mono vs. being in Java –
>>>
>>>
>>>
>>> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
>>> seconds.
>>>
>>> The mono bolt was an empty one created with the Storm.Net.Adapter
>>> <https://github.com/ziyunhx/storm-net-adapter> library.
>>>
>>>
>>>
>>> This is on a single machine topology – we are still in dev phase and
>>> using this solution for now -
>>>
>>>
>>>
>>> Is this expected?
>>>
>>> Should we try to minimize multi-lang and inter-process communication, or
>>> is this a problem with my specific scenario (mono and/or single machine)?
>>>
>>>
>>>
>>> Thank you –
>>>
>>> Mauro.
>>>
>>>
>
>
> --
> Thanks
> Zhechao Ma
>
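
For anyone comparing the numbers quoted in this thread, the sketch below turns
the reported timings (0.7 s, 4.5 s and 15 s per 10,000 records, with the spouts
emitting about 65K rows per minute) into throughput and slowdown ratios. It is
only arithmetic on the figures given above; the dictionary keys and function
names are made up for illustration, and it assumes the timings are steady-state
averages, which the thread does not confirm.

```python
# Figures quoted in the thread: seconds needed to consume 10,000 records.
BATCH = 10_000
SECONDS_PER_BATCH = {
    "java": 0.7,      # Java.txt (TransformEchoBolt)
    "mono": 4.5,      # Inter-Language.txt and Linux.txt
    "net_exe": 15.0,  # NetExe.txt (Windows exe called directly)
}
SPOUT_ROWS_PER_MINUTE = 65_000  # combined spout output reported in the thread


def records_per_second(seconds_per_batch: float, batch: int = BATCH) -> float:
    """Throughput implied by one batch timing."""
    return batch / seconds_per_batch


def slowdown(name: str, baseline: str = "java") -> float:
    """How many times slower `name` is than the baseline bolt."""
    return SECONDS_PER_BATCH[name] / SECONDS_PER_BATCH[baseline]


def headroom(name: str) -> float:
    """Bolt throughput divided by the spout input rate (above 1 means it keeps up)."""
    input_rate = SPOUT_ROWS_PER_MINUTE / 60
    return records_per_second(SECONDS_PER_BATCH[name]) / input_rate


if __name__ == "__main__":
    for name, seconds in SECONDS_PER_BATCH.items():
        print(
            f"{name:8s} {records_per_second(seconds):10.1f} rec/s  "
            f"slowdown x{slowdown(name):.2f}  headroom x{headroom(name):.2f}"
        )
```

With these inputs the mono bolt comes out about 6.4 times slower than the Java
bolt, in line with the "6x" in the original question, and the directly invoked
Windows exe lands below the spout input rate under the stated assumption.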

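Since the thread discusses replacing the default JSON serializer and asks
whether an 11-field tuple could matter, here is a minimal sketch for timing
JSON encode/decode of such a tuple in Python. It assumes the multi-lang framing
of a JSON message followed by a line containing "end"; the field values, the
component name and all helper names are hypothetical. It measures only the
in-process JSON cost, not the pipe between the worker and the subprocess, so it
can bound just one part of the overhead.

```python
import json
import time

END_MARKER = "\nend\n"  # assumed framing: JSON body, then a line with "end"


def make_tuple_message(i: int, n_fields: int = 11) -> dict:
    """Build a hypothetical tuple message with n_fields string values."""
    return {
        "id": str(i),
        "comp": "spout-1",   # illustrative component name
        "stream": "default",
        "task": 1,
        "tuple": [f"value-{i}-{k}" for k in range(n_fields)],
    }


def encode_message(msg: dict) -> str:
    """Serialize to JSON and append the end-of-message marker."""
    return json.dumps(msg) + END_MARKER


def decode_message(framed: str) -> dict:
    """Strip the end-of-message marker and parse the JSON body."""
    body, sep, _ = framed.partition(END_MARKER)
    if not sep:
        raise ValueError("missing end marker")
    return json.loads(body)


def time_round_trips(n: int = 10_000, n_fields: int = 11) -> float:
    """Seconds spent encoding and decoding n messages in this process."""
    start = time.perf_counter()
    for i in range(n):
        decode_message(encode_message(make_tuple_message(i, n_fields)))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_round_trips()
    print(f"10,000 JSON round trips with 11 fields: {elapsed:.3f} s")
```

Running the same loop with a different `n_fields` gives a quick feel for how
much the field count contributes on a given machine.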