I'm also interested in the answers to this question, but to add to the discussion, take a look at http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html. I suspect Storm still introduces coordination overhead, even when running on a single machine.

On Tue, 12 May 2015 at 1:39 pm [email protected] <[email protected]> wrote:
> Hi, and thanks.
>
> I'm working on a parallel algorithm that counts a massive number of items
> in data streams. Previous research on parallelizing this algorithm focused
> on multi-core CPUs; however, I want to take advantage of Storm.
>
> Processing latency is extremely important for this algorithm, and I did
> some evaluation of the performance.
>
> First, I implemented the algorithm in Java (one thread, no parallelism)
> and got this performance: it could process 3 million items per second.
>
> Second, I wrapped this implementation of the algorithm in Storm (just one
> spout doing the processing) and got this performance: it could process
> only 0.75 million items per second. I changed my implementation a little to
> adapt it to the Storm structure, but in the end the performance is still
> not good.
>
> P.S. I didn't take network overhead into consideration because I ran the
> program in a single spout node, so there is no emit or transfer (so I
> don't care how Storm emits messages between nodes for now). The program in
> the spout does the same thing as the former one (I just copied the program
> into the nextTuple() method with some necessary changes).
>
> 1. Is the degradation (1/4 of the speed) inevitable?
> 2. What caused the degradation?
> 3. How can I reduce the degradation?
>
> Thank you all.
>
> ------------------------------
> [email protected]
>
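As a rough illustration for question 3: if the slowdown is mostly a fixed cost paid per nextTuple() call (that is only the coordination-overhead hypothesis above, not something measured here), then processing a batch of items per call amortizes that cost. The sketch below simulates the idea without any Storm dependency. `ItemCounter`, `SimulatedSpout`, `drive`, and `makeStream` are hypothetical names, and the HashMap counter is only a stand-in for the actual counting algorithm, which isn't described in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Self-contained simulation (no Storm dependency) of processing one item per
 * nextTuple()-style callback versus a batch of items per callback.
 * All names here are hypothetical stand-ins, not Storm APIs.
 */
public class BatchingSketch {

    /** Hypothetical stand-in for the counting algorithm: exact counts in a HashMap. */
    static final class ItemCounter {
        private final Map<Integer, Long> counts = new HashMap<>();
        private long total = 0;

        void offer(int item) {
            counts.merge(item, 1L, Long::sum);
            total++;
        }

        long count(int item) { return counts.getOrDefault(item, 0L); }

        long total() { return total; }
    }

    /** Mimics a spout: each nextTuple() call processes up to batchSize items. */
    static final class SimulatedSpout {
        private final int[] stream;
        private final int batchSize;
        private final ItemCounter counter = new ItemCounter();
        private int pos = 0;

        SimulatedSpout(int[] stream, int batchSize) {
            this.stream = stream;
            this.batchSize = batchSize;
        }

        /** Processes the next batch; returns false once the stream is exhausted. */
        boolean nextTuple() {
            if (pos >= stream.length) return false;
            int end = Math.min(pos + batchSize, stream.length);
            for (; pos < end; pos++) {
                counter.offer(stream[pos]);
            }
            return true;
        }

        ItemCounter counter() { return counter; }
    }

    /** Stand-in for the framework's executor loop; returns the number of callbacks made. */
    static long drive(SimulatedSpout spout) {
        long calls = 0;
        while (spout.nextTuple()) {
            calls++;
            // Assumption: a real executor does bookkeeping here once per call,
            // so larger batches pay that cost fewer times per item.
        }
        return calls;
    }

    /** Synthetic stream: item i is i % distinct. */
    static int[] makeStream(int n, int distinct) {
        int[] s = new int[n];
        for (int i = 0; i < n; i++) s[i] = i % distinct;
        return s;
    }

    public static void main(String[] args) {
        int[] stream = makeStream(3_000_000, 1000);
        for (int batch : new int[] {1, 1000}) {
            SimulatedSpout spout = new SimulatedSpout(stream, batch);
            long t0 = System.nanoTime();
            long calls = drive(spout);
            double secs = (System.nanoTime() - t0) / 1e9;
            System.out.printf("batch=%d calls=%d items/s=%.0f%n",
                    batch, calls, spout.counter().total() / secs);
        }
    }
}
```

With batch=1000 the simulated loop makes 3,000 callbacks instead of 3,000,000 for the same items. This simulation has no real per-call overhead, so its printed rates mostly reflect JIT and cache effects; batching inside the real spout would only help if per-call cost actually turns out to be the bottleneck, which is worth confirming with a profiler first.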
