Likewise here, I was not aware of this planned feature.

On Sun, Jan 31, 2016 at 5:12 AM Nick R. Katsipoulakis <[email protected]> wrote:
> Hello all,
>
> There is a back pressure mechanism in v1.0, other than the max spout
> pending mechanism? I did not know that, and I will be glad to put it to
> the test.
>
> Nick
>
> --
> Nick R. Katsipoulakis,
> Department of Computer Science
> University of Pittsburgh
>
> On Saturday, January 30, 2016, P. Taylor Goetz <[email protected]> wrote:
>
>> Interesting conversation.
>>
>> The back pressure mechanism in 1.0 should help.
>>
>> Do you guys have environments you could test that in?
>>
>> Better yet, do you have code to share?
>>
>> -Taylor
>>
>> On Jan 30, 2016, at 9:05 PM, [email protected] wrote:
>>
>> Hey Kashyap,
>>
>> Excellent points, especially regarding compression. I've thought about
>> trying compression, and your results indicate it's worth a shot.
>>
>> Also, I concur on fields grouping, especially with a dramatic fan-out
>> followed by a fan-in, which is what I am currently working with.
>>
>> Sure glad I started this thread today, because both you and Nick have
>> shared lots of excellent thoughts--much appreciated, and thanks to you
>> both!
>>
>> --John
>>
>> Sent from my iPhone
>>
>> On Jan 30, 2016, at 7:34 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>
>> John, Nick,
>> I don't have direct answers, but here is one test I did, based on which
>> I concluded that tuple size does matter. My use case was like this:
>> Spout S emits a number X (say 1, 100, or 1024) -> Bolt A (which
>> generates a string of X kb and emits it 200 times) -> Bolt C (which
>> just prints the length of the string). All are shuffle grouped, with no
>> limit on max spout pending.
>>
>> As you can see, this is a pretty straightforward topology with nothing
>> much in it except emitting strings of varying sizes.
>>
>> As the size increases, I notice that throughput (number of acks on the
>> spout divided by total time taken) decreases. The test was done on one
>> machine so that the network could be ruled out. The only things in play
>> here are the LMAX disruptor and Kryo (de)serialization.
>>
>> Another test: if Bolt C was fields grouped on X, performance dropped
>> much further, probably because all the deserialization was being done
>> on one instance of the bolt, and also because the queues filled up.
>>
>> That said, when I compressed the emits from Bolt A (using Snappy
>> compression), throughput increased drastically--I interpret this as the
>> reduction in size due to compression improving throughput.
>>
>> Unfortunately, I had not checked VisualVM at the time.
>>
>> Hope this helps.
>>
>> Thanks
>> Kashyap
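For concreteness, here is a minimal sketch of the compression approach Kashyap describes, assuming the snappy-java library and the 0.10-era backtype.storm API. The class name, field names, and payload generation are illustrative, not taken from the thread:

    import java.io.IOException;
    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;
    import org.xerial.snappy.Snappy;

    // Bolt A: generates an X-kb string and emits it compressed, so Kryo
    // serializes a much smaller byte[] onto the Disruptor queues.
    public class CompressingBoltA extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            int sizeKb = input.getIntegerByField("x"); // "x" is an assumed spout field
            StringBuilder sb = new StringBuilder(sizeKb * 1024);
            for (int i = 0; i < sizeKb * 1024; i++) {
                sb.append('a');
            }
            try {
                byte[] compressed = Snappy.compress(sb.toString());
                for (int i = 0; i < 200; i++) {
                    collector.emit(input, new Values(compressed)); // anchored emit
                }
                collector.ack(input);
            } catch (IOException e) {
                collector.fail(input);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }

A downstream bolt would reverse this with Snappy.uncompressString((byte[]) input.getValueByField("payload")). The trade-off is CPU spent (de)compressing against bytes moved through Kryo and the Disruptor queues; Kashyap's result suggests the trade is strongly favorable for large string payloads.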
>> On Sat, Jan 30, 2016 at 4:54 PM, John Yost <[email protected]> wrote:
>>
>>> Also, I am wondering if this issue is actually fixed in 0.10.0:
>>> https://issues.apache.org/jira/browse/STORM-292. What do you guys think?
>>>
>>> --John
>>>
>>> On Sat, Jan 30, 2016 at 5:53 PM, John Yost <[email protected]> wrote:
>>>
>>>> Hi Kashyap,
>>>>
>>>> Question--what percentage of time is spent in Kryo deserialization,
>>>> and how much in the LMAX disruptor?
>>>>
>>>> --John
>>>>
>>>> On Sat, Jan 30, 2016 at 5:18 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>>>
>>>>> That is right. But for decently well-written code, the disruptor is
>>>>> almost always the CPU hog. That said, on the issue of emits taking
>>>>> time, we found that the size of the emitted object matters: Kryo
>>>>> times for serialization and deserialization increase with size.
>>>>>
>>>>> But does size have a correlation with the disruptor showing up so
>>>>> prominently in profiling?
>>>>>
>>>>> Thanks
>>>>> Kashyap
>>>>>
>>>>> Kashyap,
>>>>>
>>>>> It is expected to see the Disruptor dominating CPU time. It is the
>>>>> component responsible for sending and receiving tuples (at least
>>>>> when tuples are produced by one executor thread for another
>>>>> executor thread on the same machine). Therefore, seeing the
>>>>> Disruptor account for something like ~80% of the time is normal.
>>>>>
>>>>> A nice experiment to check the statement above is to create a bolt
>>>>> that performs an arbitrary CPU task (like nested for loops) for
>>>>> every tuple it receives, and emits a tuple only after receiving X
>>>>> tuples, where X > 1. I expect you will then see the percentage of
>>>>> CPU time attributed to the Disruptor drop.
>>>>>
>>>>> Cheers,
>>>>> Nick
>>>>>
>>>>> --
>>>>> Nick R. Katsipoulakis,
>>>>> Department of Computer Science
>>>>> University of Pittsburgh
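A minimal sketch of the experiment Nick proposes, under the same 0.10-era API assumptions: the bolt burns CPU per input tuple and emits only once every X tuples, so trips into the Disruptor layer drop by roughly a factor of X. The batch size and the busy-work loop are illustrative:

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Emits one tuple per X received, doing CPU-bound work per input tuple.
    // If Nick is right, the Disruptor's share of profiled CPU time drops
    // roughly in proportion to X.
    public class BatchingBolt extends BaseRichBolt {
        private static final int X = 100;  // batch size, X > 1
        private OutputCollector collector;
        private int seen = 0;
        private long checksum = 0;         // keeps busy-work from being optimized away

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Arbitrary CPU task per tuple (nested for loops, as Nick suggests).
            for (int i = 0; i < 1000; i++) {
                for (int j = 0; j < 1000; j++) {
                    checksum += (long) i * j;
                }
            }
            if (++seen % X == 0) {
                collector.emit(input, new Values(checksum)); // one emit per X inputs
            }
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("checksum"));
        }
    }

Comparing VisualVM profiles of this bolt with X = 1 versus X = 100 should show whether the Disruptor's share of CPU time tracks the emit rate.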
>>>>> On Sat, Jan 30, 2016 at 3:40 PM, Kashyap Mhaisekar <[email protected]> wrote:
>>>>>
>>>>>> John, Nick,
>>>>>> Thanks for broaching this topic. In my case, one tuple from the
>>>>>> spout gives rise to 200 more tuples. I too see the same class
>>>>>> listed in VisualVM profiling, and I have tried bringing this
>>>>>> down: I reduced parallelism hints, played with buffer sizes,
>>>>>> changed LMAX wait strategies, and changed max spout pending.
>>>>>> Nothing seems to have an impact.
>>>>>>
>>>>>> Any ideas on what could be done about this?
>>>>>>
>>>>>> Thanks
>>>>>> Kashyap
>>>>>>
>>>>>> Hello John,
>>>>>>
>>>>>> First off, let us agree on your definition of throughput. Do you
>>>>>> define throughput as the average number of tuples each of your
>>>>>> last bolts (sinks) emits per second? If yes, then OK. Otherwise,
>>>>>> please provide us with more details.
>>>>>>
>>>>>> Going back to your BlockingWaitStrategy observation, it (most
>>>>>> probably) means that since you are producing a large number of
>>>>>> tuples (15-20 per input tuple), the outgoing Disruptor queue gets
>>>>>> full and the emit() call blocks. Also, since you are anchoring
>>>>>> tuples (which gives at-least-once semantics), it takes more time
>>>>>> to place something in the queue, in order to guarantee delivery
>>>>>> of all tuples to the downstream bolt.
>>>>>>
>>>>>> Therefore, it makes sense to see so much time spent in the LMAX
>>>>>> messaging layer. A good experiment to verify your hypothesis is
>>>>>> to not anchor tuples and profile your topology again. However, I
>>>>>> am not sure that you will see a much different percentage, since
>>>>>> for every tuple you receive you have at least one call into the
>>>>>> Disruptor layer. Maybe in your case (if I got it correctly from
>>>>>> your description), you should have one call every N tuples, where
>>>>>> N is the size of your bin in tuples. Right?
>>>>>>
>>>>>> I hope I helped with my comments.
>>>>>>
>>>>>> Cheers,
>>>>>> Nick
>>>>>>
>>>>>> --
>>>>>> Nick R. Katsipoulakis,
>>>>>> Department of Computer Science
>>>>>> University of Pittsburgh
>>>>>>
>>>>>> On Sat, Jan 30, 2016 at 12:16 PM, John Yost <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> I have a large fan-out that I've posted questions about before,
>>>>>>> with the following new, updated info:
>>>>>>>
>>>>>>> 1. An incoming tuple to Bolt A produces 15-20 tuples.
>>>>>>> 2. Bolt A emits to Bolt B via fieldsGrouping.
>>>>>>> 3. I cache outgoing tuples in bins within Bolt A and then emit
>>>>>>> anchored tuples to Bolt B with the OutputCollector method
>>>>>>> emit(Collection<Tuple> anchors, List<Object> tuple)
>>>>>>> <http://storm.apache.org/apidocs/backtype/storm/task/OutputCollector.html#emit(java.util.Collection,%20java.util.List)>.
>>>>>>> 4. Throughput is where I need it to be if I just receive tuples
>>>>>>> in Bolt B, ack, and drop them. If I do actual processing in
>>>>>>> Bolt B, throughput degrades a bunch.
>>>>>>> 5. I profiled the Bolt B worker yesterday and saw that over 90%
>>>>>>> of the time is spent in com.lmax.disruptor.BlockingWaitStrategy,
>>>>>>> irrespective of whether I drop the tuples or process them in
>>>>>>> Bolt B.
>>>>>>>
>>>>>>> I am wondering if the acking of the anchor tuples is what's
>>>>>>> resulting in so much time spent in the LMAX messaging layer.
>>>>>>> What do y'all think? Any ideas appreciated, as always.
>>>>>>>
>>>>>>> Thanks! :)
>>>>>>>
>>>>>>> --John
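One possible reading of the binning-plus-anchoring pattern John describes in point 3, as a minimal sketch against the same 0.10-era API (the bin size, field names, and aggregate are illustrative). Each outgoing tuple is anchored on every input tuple in its bin, so a downstream failure replays the whole bin:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Bolt A: bins incoming tuples and emits one multi-anchored tuple per
    // full bin via emit(Collection<Tuple> anchors, List<Object> tuple).
    public class BinningBoltA extends BaseRichBolt {
        private static final int BIN_SIZE = 50; // illustrative
        private OutputCollector collector;
        private final List<Tuple> bin = new ArrayList<Tuple>();

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            bin.add(input);
            if (bin.size() >= BIN_SIZE) {
                // One emit (one trip through the Disruptor) per bin, but the
                // acker still tracks every anchor in the bin.
                collector.emit(bin, new Values(summarize(bin)));
                for (Tuple anchor : bin) {
                    collector.ack(anchor);
                }
                bin.clear();
                // A real implementation would also flush partial bins on a
                // timer or tick tuple; omitted here.
            }
        }

        private Object summarize(List<Tuple> tuples) {
            return tuples.size(); // placeholder aggregate
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("binSummary"));
        }
    }

Note that every anchor still generates ack traffic through the acker, which fits John's observation that the LMAX layer dominates even when Bolt B only acks and drops.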

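Finally, for reference, max spout pending (the existing throttle Nick contrasts with the planned 1.0 back pressure) is set per topology through Config; a sketch, with an illustrative value:

    import backtype.storm.Config;

    public class ThrottledConfig {
        // Builds a Config that caps un-acked tuples in flight per spout
        // task; the spout stops emitting once the cap is reached, which
        // bounds the growth of downstream queues.
        public static Config build() {
            Config conf = new Config();
            conf.setMaxSpoutPending(1000); // illustrative; tune per topology
            return conf;
        }
    }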