Also, I am wondering if this issue is actually fixed in 0.10.0: https://issues.apache.org/jira/browse/STORM-292 What do you guys think?
--John

On Sat, Jan 30, 2016 at 5:53 PM, John Yost <[email protected]> wrote:

> Hi Kashyap,
>
> Question--what percentage of time is spent in Kryo deserialization and how
> much in the LMAX disruptor?
>
> --John
>
> On Sat, Jan 30, 2016 at 5:18 PM, Kashyap Mhaisekar <[email protected]>
> wrote:
>
>> That is right. But for decently well-written code, the disruptor is
>> almost always the CPU hogger. That said, on the issue of emits taking
>> time, we found that the size of the emitted object matters. Kryo times
>> for serialization and deserialization increase with size.
>>
>> But does size have a correlation with the disruptor showing up big time
>> in profiling?
>>
>> Thanks
>> Kashyap
>>
>> Kashyap,
>>
>> It is expected to see the Disruptor dominating CPU time. It is the
>> object responsible for sending and receiving tuples (at least when
>> tuples are produced by one executor thread for another executor thread
>> on the same machine). Therefore, it is expected to see the Disruptor
>> taking something like ~80% of the time.
>>
>> A nice experiment to check my statement above is to create a bolt that,
>> for every tuple it receives, performs a random CPU task (like nested
>> for loops) and emits a tuple only after receiving X tuples, where
>> X > 1. Then I expect you will see the percentage of CPU time for the
>> Disruptor object drop.
>>
>> Cheers,
>> Nick
>>
>> On Sat, Jan 30, 2016 at 3:40 PM, Kashyap Mhaisekar <[email protected]>
>> wrote:
>>
>>> John, Nick
>>> Thanks for broaching this topic. In my case, 1 tuple from the spout
>>> gives out 200 more tuples. I too see the same class listed in VisualVM
>>> profiling, and I tried bringing this down: I reduced parallelism
>>> hints, played with buffers, changed LMAX strategies, and changed max
>>> spout pending. Nothing seems to have an impact.
>>>
>>> Any ideas on what could be done about this?
>>>
>>> Thanks
>>> Kashyap
>>>
>>> Hello John,
>>>
>>> First off, let us agree on your definition of throughput.
>>> Do you define
>>> throughput as the average number of tuples each of your last bolts
>>> (sinks) emits per second? If yes, then OK. Otherwise, please provide
>>> us with more details.
>>>
>>> Going back to your BlockingWaitStrategy observation: it (most
>>> probably) means that since you are producing a large number of tuples
>>> (15-20 per input) the outgoing Disruptor queue gets full, and the
>>> emit() function blocks. Also, since you are anchoring tuples (which
>>> might mean exactly-once semantics), it takes more time to place
>>> something in the queue in order to guarantee delivery of all tuples to
>>> a downstream bolt.
>>>
>>> Therefore, it makes sense to see so much time spent in the LMAX
>>> messaging layer. A good experiment to verify your hypothesis is to not
>>> anchor tuples and profile your topology again. However, I am not sure
>>> you will see a much different percentage, since for every tuple you
>>> receive, you make at least one call to the Disruptor layer. Maybe in
>>> your case (if I got it correctly from your description), you should
>>> have one call every N tuples, where N is the size of your bin in
>>> tuples. Right?
>>>
>>> I hope I helped with my comments.
>>>
>>> Cheers,
>>> Nick
>>>
>>> On Sat, Jan 30, 2016 at 12:16 PM, John Yost <[email protected]>
>>> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I have a large fan-out that I've posted questions about before, with
>>>> the following new, updated info:
>>>>
>>>> 1. An incoming tuple to Bolt A produces 15-20 tuples
>>>> 2. Bolt A emits to Bolt B via fieldsGrouping
>>>> 3.
>>>> I cache outgoing tuples in bins within Bolt A and then emit anchored
>>>> tuples to Bolt B with the OutputCollector
>>>> emit(Collection<Tuple> anchors, List<Object> tuple) method:
>>>> http://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html
>>>> 4. I have throughput where I need it to be if I just receive tuples
>>>> in Bolt B, ack, and drop them. If I do actual processing in Bolt B,
>>>> throughput degrades a bunch.
>>>> 5. I profiled the Bolt B worker yesterday and saw that over 90% of
>>>> the time is spent in com.lmax.disruptor.BlockingWaitStrategy,
>>>> irrespective of whether I drop the tuples or process them in Bolt B.
>>>>
>>>> I am wondering if the acking of the anchor tuples is what's resulting
>>>> in so much time spent in the LMAX messaging layer. What do y'all
>>>> think? Any ideas appreciated as always.
>>>>
>>>> Thanks! :)
>>>>
>>>> --John
>>>>
>>>
>>>
>>> --
>>> Nick R. Katsipoulakis,
>>> Department of Computer Science
>>> University of Pittsburgh
>>
>>
>> --
>> Nick R. Katsipoulakis,
>> Department of Computer Science
>> University of Pittsburgh
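[Editor's note: the binning-and-anchoring pattern John describes in step 3, and Nick's one-Disruptor-call-every-N-tuples suggestion, can be sketched without the Storm runtime. Everything below is a hypothetical stand-in: `TupleBinner` and `BatchEmitter` are illustrative names, and in a real bolt the flush would be a call to Storm's `emit(Collection<Tuple> anchors, List<Object> tuple)` overload followed by acking the anchors, as noted in the comments.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Storm's
// OutputCollector.emit(Collection<Tuple> anchors, List<Object> tuple).
interface BatchEmitter<T> {
    void emitAnchored(List<T> anchors, List<Object> values);
}

// Bins incoming items and flushes one anchored "tuple" per binSize inputs,
// so the downstream queue sees one call every N tuples instead of one per tuple.
class TupleBinner<T> {
    private final int binSize;
    private final BatchEmitter<T> emitter;
    private final List<T> bin = new ArrayList<>();

    TupleBinner(int binSize, BatchEmitter<T> emitter) {
        this.binSize = binSize;
        this.emitter = emitter;
    }

    // Returns true when this input caused a flush (i.e., one emit call).
    boolean add(T item) {
        bin.add(item);
        if (bin.size() < binSize) {
            return false;
        }
        // In a real bolt: collector.emit(bin, new Values(summary)), then
        // collector.ack(...) each anchor; a downstream fail() replays the bin.
        emitter.emitAnchored(new ArrayList<>(bin),
                             Collections.singletonList(bin.size()));
        bin.clear();
        return true;
    }
}
```

With bin size N, only every Nth input reaches the emit path, which is why Nick expects the Disruptor's share of profiled CPU time to drop under this scheme.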

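[Editor's note: Kashyap's observation earlier in the thread, that Kryo serialization and deserialization times increase with object size, can be sanity-checked with a small self-contained sketch. To stay dependency-free this uses plain JDK serialization as a stand-in for Kryo (an assumption: the trend, not the absolute numbers, is the point), with an illustrative list-of-strings payload standing in for a large emitted tuple value.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class SerializedSizeDemo {
    // Serialize with JDK serialization (stand-in for Kryo); return byte count.
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    // Illustrative payload: n string fields, standing in for a large tuple.
    static ArrayList<String> payload(int n) {
        ArrayList<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("field-" + i);
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("10 fields:     " + serializedSize(payload(10)) + " bytes");
        System.out.println("10,000 fields: " + serializedSize(payload(10_000)) + " bytes");
    }
}
```

Larger serialized payloads mean more work per tuple crossing executor and worker boundaries, which compounds with the per-tuple Disruptor queue cost discussed above.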