Hello all,

Is there a back pressure mechanism in v1.0, other than the max spout pending mechanism? I did not know that, and I will be glad to put it to the test.
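[Editor's note: for anyone wanting to test the two mechanisms side by side, these are the knobs as I understand them from the 1.0 release notes; the backpressure property names are worth double-checking against your exact version.]

```yaml
# Classic flow control: cap on un-acked tuples in flight per spout task.
topology.max.spout.pending: 1000

# New in 1.0: automatic backpressure driven by receive-queue fill levels.
topology.backpressure.enable: true
backpressure.disruptor.high.watermark: 0.9   # throttle spouts above this fill level
backpressure.disruptor.low.watermark: 0.4    # release the throttle below this
```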
Nick

On Saturday, January 30, 2016, P. Taylor Goetz <[email protected]> wrote:

> Interesting conversation.
>
> The back pressure mechanism in 1.0 should help.
>
> Do you guys have environments that you could test that in?
>
> Better yet, do you have code to share?
>
> -Taylor
>
> On Jan 30, 2016, at 9:05 PM, [email protected] wrote:
>
> Hey Kashyap,
>
> Excellent points, especially regarding compression. I've thought about
> trying compression, and your results indicate that it's worth a shot.
>
> Also, I concur on fields grouping, especially with a dramatic fan-out
> followed by a fan-in, which is what I am currently working with.
>
> I'm sure glad I started this thread today, because both you and Nick have
> shared lots of excellent thoughts--much appreciated, and thanks to you both!
>
> --John
>
> Sent from my iPhone
>
> On Jan 30, 2016, at 7:34 PM, Kashyap Mhaisekar <[email protected]> wrote:
>
> John, Nick,
> I don't have direct answers, but here is one test I did, based on which I
> concluded that tuple size does matter.
> My use case was like this:
> Spout S emits a number *X* (say 1, 100, or 1024) -> Bolt A (which
> generates a string of *X* kb and emits it 200 times) -> Bolt C (which
> just prints the length of the string). All are shuffle grouped, with no
> limit on max spout pending.
>
> As you can see, this is a pretty straightforward topology with nothing
> much in it except emitting Strings of varying sizes.
>
> As the size increases, I notice that the throughput (number of acks on the
> spout divided by total time taken) decreases. The test was done on one
> machine so that the network can be ruled out. The only things in play here
> are the LMAX Disruptor and Kryo (de)serialization.
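[Editor's note: Kashyap's size effect is easy to reproduce outside Storm. A minimal sketch, using stdlib java.io serialization as a stand-in for Kryo (assumption: Kryo's cost scales with payload size the same way; the class and method names are illustrative, not Storm APIs).]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Reproduces Kashyap's observation outside Storm: serialization cost
// grows with tuple size. java.io serialization stands in for Kryo.
class PayloadSizeDemo {

    // Build a payload of roughly `kb` kilobytes, like Bolt A's X-kb string.
    static String payload(int kb) {
        StringBuilder sb = new StringBuilder(kb * 1024);
        for (int i = 0; i < kb * 1024; i++) sb.append('a');
        return sb.toString();
    }

    // Size of the serialized payload in bytes; serialization time tracks
    // this closely, which is what drags throughput down for big tuples.
    static int serializedSize(int kb) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(payload(kb));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        System.out.println("1 KB payload   -> " + serializedSize(1) + " bytes serialized");
        System.out.println("100 KB payload -> " + serializedSize(100) + " bytes serialized");
    }
}
```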
> Another test: if Bolt C is field grouped on X, I see that performance
> drops much further, probably because all the deserialization is being done
> on one instance of the bolt AND because the queues fill up.
>
> That said, when I compressed the emits from Bolt A (using Snappy
> compression), I saw that throughput increased drastically. I interpret
> this to mean that the reduction in size due to compression improved
> throughput.
>
> Unfortunately, I did not check VisualVM at the time.
>
> Hope this helps.
>
> Thanks,
> Kashyap
>
> On Sat, Jan 30, 2016 at 4:54 PM, John Yost <[email protected]> wrote:
>
>> Also, I am wondering if this issue is actually fixed in 0.10.0:
>> https://issues.apache.org/jira/browse/STORM-292 What do you guys think?
>>
>> --John
>>
>> On Sat, Jan 30, 2016 at 5:53 PM, John Yost <[email protected]> wrote:
>>
>>> Hi Kashyap,
>>>
>>> Question: what percentage of time is spent in Kryo deserialization, and
>>> how much in the LMAX Disruptor?
>>>
>>> --John
>>>
>>> On Sat, Jan 30, 2016 at 5:18 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>>
>>>> That is right. But for decently well-written code, the Disruptor is
>>>> almost always the CPU hogger. That said, on the issue of emits taking
>>>> time, we found that the size of the emitted object matters. Kryo times
>>>> for serialization and deserialization increase with size.
>>>>
>>>> But does size have a correlation with the Disruptor showing up big in
>>>> profiling?
>>>>
>>>> Thanks,
>>>> Kashyap
>>>>
>>>> Kashyap,
>>>>
>>>> It is only to be expected that the Disruptor dominates CPU time. It is
>>>> the object responsible for sending/receiving tuples (at least when you
>>>> have tuples produced by one executor thread for another executor thread
>>>> on the same machine).
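[Editor's note: on Kashyap's Snappy result above, here is a framework-free sketch of why compression helps, using java.util.zip.Deflater as a stdlib stand-in for Snappy. Assumption: Bolt A's generated strings are highly repetitive, so they compress well and the serialized tuple shrinks dramatically; the class and method names are illustrative.]

```java
import java.util.zip.Deflater;

// Shows how much smaller a repetitive payload gets before it is handed
// to the serializer and the Disruptor queue. Deflater stands in for Snappy.
class CompressEmitDemo {

    // Compress a payload and return the compressed length in bytes.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64]; // head-room for incompressible input
        int n = 0;
        while (!deflater.finished()) {
            n += deflater.deflate(buf, n, buf.length - n);
        }
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100 * 1024]; // 100 KB of identical bytes, like "aaaa..."
        System.out.println(payload.length + " bytes raw -> "
                + compressedSize(payload) + " bytes compressed");
    }
}
```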
>>>> Therefore, it is expected that the Disruptor accounts for something
>>>> like ~80% of the time.
>>>>
>>>> A nice experiment to check my statement above is to create a Bolt that,
>>>> for every tuple it receives, performs a random CPU task (like nested
>>>> for loops) and emits a tuple only after receiving X tuples, where
>>>> X > 1. Then I expect that you will see the percentage of CPU time for
>>>> the Disruptor object drop.
>>>>
>>>> Cheers,
>>>> Nick
>>>>
>>>> On Sat, Jan 30, 2016 at 3:40 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>>>
>>>>> John, Nick,
>>>>> Thanks for broaching this topic. In my case, 1 tuple from the spout
>>>>> gives out 200 more tuples. I too see the same class listed in VisualVM
>>>>> profiling, and I tried bringing this down: I reduced parallelism
>>>>> hints, played with buffers, changed LMAX wait strategies, changed max
>>>>> spout pending... Nothing seems to have an impact.
>>>>>
>>>>> Any ideas on what could be done about this?
>>>>>
>>>>> Thanks,
>>>>> Kashyap
>>>>>
>>>>> Hello John,
>>>>>
>>>>> First off, let us agree on your definition of throughput. Do you
>>>>> define throughput as the average number of tuples each of your last
>>>>> bolts (sinks) emits per second? If yes, then OK. Otherwise, please
>>>>> provide us with more details.
>>>>>
>>>>> Going back to the BlockingWaitStrategy observation you have, it (most
>>>>> probably) means that since you are producing a large number of tuples
>>>>> (15-20 tuples) the outgoing Disruptor queue gets full, and the emit()
>>>>> call blocks. Also, since you are anchoring tuples (which might mean
>>>>> exactly-once semantics), it basically takes more time to place
>>>>> something in the queue, in order to guarantee delivery of all tuples
>>>>> to a downstream bolt.
>>>>>
>>>>> Therefore, it makes sense to see so much time spent in the LMAX
>>>>> messaging layer.
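[Editor's note: Nick's suggested experiment (do per-tuple CPU work, but emit only once every X tuples) can be sketched without Storm at all. All names here are illustrative, not Storm APIs; the Consumer stands in for OutputCollector.emit() and the Disruptor behind it.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Bolt-like class: per-tuple work happens on every call, but the
// (expensive) emit path runs only once every `batchSize` tuples.
class BatchingBolt {
    private final int batchSize;
    private final Consumer<List<String>> emitter;
    private final List<String> pending = new ArrayList<>();
    private int emits = 0;

    BatchingBolt(int batchSize, Consumer<List<String>> emitter) {
        this.batchSize = batchSize;
        this.emitter = emitter;
    }

    // Analogous to Bolt.execute(Tuple): called once per incoming tuple.
    void execute(String tuple) {
        // ...per-tuple CPU work (Nick's "nested for loops") would go here...
        pending.add(tuple);
        if (pending.size() >= batchSize) {
            emitter.accept(new ArrayList<>(pending)); // one emit per batchSize tuples
            pending.clear();
            emits++;
        }
    }

    int emitCount() { return emits; }

    public static void main(String[] args) {
        BatchingBolt bolt = new BatchingBolt(10, batch ->
                System.out.println("emitting batch of " + batch.size()));
        for (int i = 0; i < 100; i++) bolt.execute("tuple-" + i);
        System.out.println("100 tuples in, " + bolt.emitCount() + " emits out");
    }
}
```

If the Disruptor's ~80% share really is per-emit overhead, batching like this should cut its share roughly by a factor of batchSize.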
>>>>> A good experiment to verify your hypothesis is to not anchor tuples
>>>>> and profile your topology again. However, I am not sure that you will
>>>>> see a much different percentage, since for every tuple you are
>>>>> receiving, you have at least one call to the Disruptor layer. Maybe
>>>>> in your case (if I understood your description correctly), you should
>>>>> have one call every N tuples, where N is the size of your bin in
>>>>> tuples. Right?
>>>>>
>>>>> I hope I helped with my comments.
>>>>>
>>>>> Cheers,
>>>>> Nick
>>>>>
>>>>> On Sat, Jan 30, 2016 at 12:16 PM, John Yost <[email protected]> wrote:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> I have a large fan-out that I've posted questions about before, with
>>>>>> the following new, updated info:
>>>>>>
>>>>>> 1. An incoming tuple to Bolt A produces 15-20 tuples.
>>>>>> 2. Bolt A emits to Bolt B via fieldsGrouping.
>>>>>> 3. I cache outgoing tuples in bins within Bolt A and then emit
>>>>>> anchored tuples to Bolt B with the OutputCollector
>>>>>> emit(Collection<Tuple> anchors, List<Object> tuple) method:
>>>>>> http://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html#emit(java.util.Collection,%20java.util.List)
>>>>>> 4. I have throughput where I need it to be if I just receive tuples
>>>>>> in Bolt B, ack, and drop. If I do actual processing in Bolt B,
>>>>>> throughput degrades a bunch.
>>>>>> 5. I profiled the Bolt B worker yesterday and see that over 90% of
>>>>>> the time is spent in com.lmax.disruptor.BlockingWaitStrategy,
>>>>>> irrespective of whether I drop the tuples or process them in Bolt B.
>>>>>>
>>>>>> I am wondering if the acking of the anchor tuples is what's resulting
>>>>>> in so much time spent in the LMAX messaging layer. What do y'all
>>>>>> think? Any ideas appreciated, as always.
>>>>>>
>>>>>> Thanks! :)
>>>>>>
>>>>>> --John
>>>>>
>>>>> --
>>>>> Nick R. Katsipoulakis,
>>>>> Department of Computer Science
>>>>> University of Pittsburgh

--
Nick R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh
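[Editor's note: a stdlib analogy for John's profile (90%+ in BlockingWaitStrategy): when a bounded buffer is full, the producer's CPU profile fills up with waiting rather than useful work. ArrayBlockingQueue stands in for the Disruptor ring buffer; all names are illustrative, not Storm or LMAX APIs.]

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Demonstrates a producer blocked on a full bounded queue, which is
// what a profiler attributes to the wait strategy, not to processing.
class BlockedProducerDemo {

    // Fill a queue of `capacity`, then time how long a producer is stuck
    // trying to add one more element (bounded by timeoutMs).
    static long blockedMillis(int capacity, long timeoutMs) {
        try {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
            for (int i = 0; i < capacity; i++) queue.put(i); // consumer is "busy"
            long start = System.nanoTime();
            queue.offer(-1, timeoutMs, TimeUnit.MILLISECONDS); // emit() blocks here
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("producer blocked for ~" + blockedMillis(4, 200)
                + " ms because the queue was full");
    }
}
```

This is consistent with John's point 4: if Bolt B just acks and drops, its queue drains fast and Bolt A rarely blocks; real processing in Bolt B slows the drain and the wait strategy dominates the upstream profile.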
