I'm not very surprised. See, for example, the published single-machine benchmarks (IIRC 1.6 million tuples/s is the official figure from Nathan Marz, though that figure is a little old). That number is for spout-to-bolt transfer and matches my observations for trivial cases. With some processing logic and only one spout, I can see how your throughput ends up lower.
You can reduce the overhead by batching your work differently, e.g. by doing more work in each call to nextTuple.

On May 12, 2015 4:56 AM, "Matthias J. Sax" <[email protected]> wrote:
> Can you share your code?
>
> Do you process a single tuple each time nextTuple() is called? If a
> spout does not emit anything, Storm applies a waiting penalty to avoid
> busy waiting. That might slow down your code.
>
> You can configure the waiting strategy:
> https://storm.apache.org/2012/09/06/storm081-released.html
>
> -Matthias
>
>
> On 05/12/2015 09:31 AM, Daniel Compton wrote:
> > I'm also interested in the answers to this question, but to add to the
> > discussion, take a look at
> > http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
> > I suspect Storm is still introducing coordination overhead even when
> > running on a single machine.
> > On Tue, 12 May 2015 at 1:39 pm [email protected] wrote:
> >
> > Hi and thanks.
> >
> > I'm working on a parallel algorithm that counts massive numbers of
> > items in data streams. Previous research on parallelizing this
> > algorithm focused on multi-core CPUs; however, I want to take
> > advantage of Storm.
> >
> > Processing latency is extremely important for this algorithm, and I
> > did some evaluation of the performance.
> >
> > Firstly, I implemented the algorithm in Java (one thread, with no
> > parallelism) and measured the performance: it could process 3 million
> > items per second.
> >
> > Secondly, I wrapped this implementation of the algorithm in Storm
> > (just one Spout doing the processing) and measured the performance:
> > it could process only 0.75 million items per second. I changed my
> > implementation a little to adapt it to Storm's structure, but in the
> > end the performance is still not good.
> >
> > P.S. I didn't take the network overhead into consideration because I
> > just run the program in a single Spout node, so there is no emit or
> > transfer (so I don't care how Storm emits messages between nodes for
> > now). The program in the Spout is actually doing the same thing as
> > the former one (I just copied the program into the nextTuple() method
> > with some necessary changes).
> >
> > 1. Is the degradation (to 1/4 of the speed) inevitable?
> > 2. What caused the degradation?
> > 3. How can I reduce the degradation?
> >
> > Thank you all.
> >
> > ------------------------------------------------------------------------
> > [email protected]
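
To make the batching suggestion at the top of this reply a bit more concrete, here is a minimal sketch of handling several items per nextTuple() call instead of one. It deliberately does not depend on Storm: Collector is a hypothetical stand-in for Storm's output collector, and BatchingSource, batchSize, exhausted() and counts() are names I made up for illustration. The per-item counting is only a placeholder for your actual algorithm, and the right batch size is something you would have to measure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchingSpoutSketch {

    /** Hypothetical stand-in for Storm's output collector (not the real API). */
    interface Collector {
        void emit(List<Object> tuple);
    }

    /** Hypothetical spout-like source that counts items in batches. */
    static class BatchingSource {
        private final long[] items;   // input stream, held in memory for the sketch
        private final int batchSize;  // items handled per nextTuple() call
        private final Map<Long, Long> counts = new HashMap<>();
        private int position = 0;

        BatchingSource(long[] items, int batchSize) {
            this.items = items;
            this.batchSize = batchSize;
        }

        /** Handles up to batchSize items, then emits one summary tuple. */
        void nextTuple(Collector collector) {
            int end = Math.min(position + batchSize, items.length);
            int processed = end - position;
            while (position < end) {
                // Placeholder for the real per-item work.
                counts.merge(items[position++], 1L, Long::sum);
            }
            if (processed > 0) {
                // One emit per batch amortizes the per-call overhead.
                collector.emit(List.<Object>of(processed));
            }
        }

        boolean exhausted() {
            return position >= items.length;
        }

        Map<Long, Long> counts() {
            return counts;
        }
    }

    public static void main(String[] args) {
        long[] items = {1, 2, 2, 3, 3, 3, 4};
        BatchingSource source = new BatchingSource(items, 3);
        int[] emitted = {0};
        while (!source.exhausted()) {
            source.nextTuple(tuple -> emitted[0]++);
        }
        System.out.println("emitted tuples: " + emitted[0]);
        System.out.println("counts: " + source.counts());
    }
}
```

In a real spout the same loop would sit inside nextTuple() and use the collector Storm hands you in open(). The trade-off is that a larger batch means fewer framework round trips but a longer time before control returns to Storm.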
