Also, I am wondering if this issue is actually fixed in 0.10.0:
https://issues.apache.org/jira/browse/STORM-292  What do you guys think?

--John

On Sat, Jan 30, 2016 at 5:53 PM, John Yost <[email protected]> wrote:

> Hi Kashyap,
>
> Question--what percentage of time is spent in Kryo deserialization and how
> much in LMAX disruptor?
>
> --John
>
> On Sat, Jan 30, 2016 at 5:18 PM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> That is right. But for reasonably well-written code, the Disruptor is almost
>> always the biggest CPU consumer. That said, on the issue of emits taking
>> time, we found that the size of the emitted object matters: Kryo times for
>> serialization and deserialization increase with object size.
>>
>> But does object size correlate with the Disruptor dominating the profile?
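The growth of serialization cost with object size can be seen with a tiny stand-alone experiment. A minimal sketch follows; since Kryo is not on the default classpath, it uses JDK serialization as a stand-in (the class name and payloads are hypothetical), but the principle that bigger emitted objects mean more bytes to (de)serialize per tuple is the same:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class SerializationSize {
    // Serialize any object to bytes; with Kryo this would be
    // kryo.writeObject(output, obj) instead of JDK serialization.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two stand-in tuple payloads of very different sizes.
        ArrayList<String> small = new ArrayList<>();
        ArrayList<String> large = new ArrayList<>();
        for (int i = 0; i < 10; i++) small.add("field-" + i);
        for (int i = 0; i < 10_000; i++) large.add("field-" + i);

        // The larger payload produces many more bytes, so each emit
        // costs proportionally more (de)serialization work.
        System.out.println(serialize(small).length + " vs " + serialize(large).length);
    }
}
```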
>>
>> Thanks
>> Kashyap
>> Kashyap,
>>
>> It is only to be expected that the Disruptor dominates CPU time. It is the
>> component responsible for sending and receiving tuples (at least when tuples
>> are produced by one executor thread for another executor thread on the same
>> machine). Therefore, seeing the Disruptor account for something like 80% of
>> CPU time is normal.
>>
>> A nice experiment to check the statement above is to create a Bolt that, for
>> every tuple it receives, performs some CPU-bound work (such as nested for
>> loops) and emits a tuple only after receiving X tuples, where X > 1. I would
>> then expect the percentage of CPU time attributed to the Disruptor to drop.
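The per-tuple logic of that experiment can be sketched without Storm at all. This is a hypothetical stand-in class (not Storm's Bolt API): it burns CPU per tuple and emits once every X tuples, so in a real topology the messaging layer would be invoked far less often relative to the work done:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the suggested experiment: CPU-bound work per tuple,
// but only one emit per X tuples received.
public class ThrottledBolt {
    private final int emitEveryX;
    private int seen = 0;
    private long sink = 0; // keeps the busy loop from being optimized away
    final List<Integer> emitted = new ArrayList<>();

    public ThrottledBolt(int emitEveryX) {
        this.emitEveryX = emitEveryX;
    }

    public void execute(int tuple) {
        // CPU-bound work standing in for "nested for loops".
        for (int i = 0; i < 1000; i++)
            for (int j = 0; j < 100; j++)
                sink += (long) i * j;
        seen++;
        if (seen % emitEveryX == 0) {
            // In Storm this would be collector.emit(...); here we just record it.
            emitted.add(tuple);
        }
    }
}
```

With X = 5, feeding 10 tuples yields only 2 emits, which is exactly why the share of time spent handing tuples to the Disruptor should shrink.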
>>
>> Cheers,
>> Nick
>>
>> On Sat, Jan 30, 2016 at 3:40 PM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> John, Nick
>>> Thanks for broaching this topic. In my case, 1 tuple from spout gives
>>> out 200 more tuples. I too see the same class listed in VisualVM
>>> profiling... And tried bringing this down... I reduced parallelism hints,
>>> played with buffers, changed lmax strategies, changed max spout pending...
>>> Nothing seems to have an impact
>>>
>>> Any ideas on what could be done for this?
>>>
>>> Thanks
>>> Kashyap
>>> Hello John,
>>>
>>> First off, let us agree on your definition of throughput. Do you define
>>> throughput as the average number of tuples each of your last bolts (sinks)
>>> emit per second? If yes, then OK. Otherwise, please provide us with more
>>> details.
>>>
>>> Going back to your BlockingWaitStrategy observation: it most probably
>>> means that since you are producing a large number of tuples (15-20 per
>>> incoming tuple), the outgoing Disruptor queue fills up and the emit()
>>> call blocks. Also, since you are anchoring tuples (which gives you
>>> at-least-once delivery semantics), it takes more time to place something
>>> in the queue, in order to guarantee delivery of all tuples to the
>>> downstream bolt.
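The blocking behavior described above can be demonstrated with any bounded queue. This sketch uses the JDK's ArrayBlockingQueue as a stand-in for the outgoing Disruptor ring buffer (the class name and capacity are illustrative, not Storm internals):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A tiny bounded queue standing in for the outgoing ring buffer.
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.put("tuple-1");
        queue.put("tuple-2"); // the queue is now full

        // With no consumer draining the queue, a further put() would block
        // indefinitely, just as emit() stalls the executor when the outgoing
        // queue is full; offer() with a timeout makes the stall observable.
        boolean accepted = queue.offer("tuple-3", 100, TimeUnit.MILLISECONDS);
        System.out.println(accepted); // false: the producer had to wait
    }
}
```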
>>>
>>> Therefore, it makes sense to see so much time spent in the LMAX messaging
>>> layer. A good experiment to verify your hypothesis is to not anchor tuples
>>> and profile your topology again. However, I am not sure you will see a much
>>> different percentage, since for every tuple you receive you make at least
>>> one call into the Disruptor layer. Maybe in your case (if I understood your
>>> description correctly), you should have one call every N tuples, where N is
>>> the size of your bin in tuples. Right?
>>>
>>> I hope I helped with my comments.
>>>
>>> Cheers,
>>> Nick
>>>
>>> On Sat, Jan 30, 2016 at 12:16 PM, John Yost <[email protected]>
>>> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I have a large fan-out that I've posted questions about before with the
>>>> following new, updated info:
>>>>
>>>> 1. Incoming tuple to Bolt A produces 15-20 tuples
>>>> 2. Bolt A emits to Bolt B via fieldsGrouping
>>>> 3. I cache outgoing tuples in bins within Bolt A and then emit anchored
>>>> tuples to Bolt B with OutputCollector.emit(Collection<Tuple> anchors,
>>>> List<Object> tuple)
>>>> <http://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html>
>>>> 4. Throughput is where I need it to be if I just receive tuples in Bolt
>>>> B, ack, and drop them. If I do actual processing in Bolt B, throughput
>>>> degrades substantially.
>>>> 5. I profiled the Bolt B worker yesterday and saw that over 90% of CPU
>>>> time is spent in com.lmax.disruptor.BlockingWaitStrategy, irrespective
>>>> of whether I drop the tuples or process them in Bolt B.
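The binning described in step 3 can be sketched as follows. This is a hypothetical stand-in (the Collector interface below plays the role of Storm's OutputCollector; in real code the flush would be collector.emit(anchors, values)): incoming tuples accumulate in a bin, and one anchored emit is made per full bin, so the messaging layer sees one call every binSize tuples instead of one per tuple:

```java
import java.util.ArrayList;
import java.util.List;

public class BinningBoltSketch {
    // Hypothetical stand-in for OutputCollector.emit(anchors, tuple).
    interface Collector {
        void emit(List<String> anchors, List<Object> values);
    }

    private final int binSize;
    private final Collector collector;
    private final List<String> anchors = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    BinningBoltSketch(int binSize, Collector collector) {
        this.binSize = binSize;
        this.collector = collector;
    }

    // Called once per incoming tuple; flushes one anchored emit per full bin.
    void execute(String tupleId, Object value) {
        anchors.add(tupleId);
        values.add(value);
        if (values.size() >= binSize) {
            collector.emit(new ArrayList<>(anchors), new ArrayList<>(values));
            anchors.clear();
            values.clear();
        }
    }
}
```

Note that every tuple in the bin is kept as an anchor, so acker traffic still scales with the number of input tuples even though downstream emits are batched.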
>>>>
>>>> I am wondering if the acking of the anchor tuples is what's resulting
>>>> in so much time spent in the LMAX messaging layer.  What do y'all think?
>>>> Any ideas appreciated as always.
>>>>
>>>> Thanks! :)
>>>>
>>>> --John
>>>>
>>>
>>>
>>>
>>> --
>>> Nick R. Katsipoulakis,
>>> Department of Computer Science
>>> University of Pittsburgh
>>>
>>
>>
>>
>> --
>> Nick R. Katsipoulakis,
>> Department of Computer Science
>> University of Pittsburgh
>>
>
>
