Here you go: https://gist.github.com/cynosureabu/b317646d5c475d0d2e42

It's actually pretty straightforward. The only thing worth mentioning is that I use another thread in the ES bolt to do the actual query and tuple emit.

Thanks for looking.
Chen

On Fri, Jul 11, 2014 at 1:18 PM, Sam Goodwin <[email protected]> wrote:

> Can you show some code? 200 seconds for 15K puts sounds like you're not
> batching.
>
> On Fri, Jul 11, 2014 at 12:47 PM, Chen Wang <[email protected]> wrote:
>
>> Typo in my previous email:
>> The emit method in the query bolt takes about 200 (instead of 20) seconds.
>>
>> On Fri, Jul 11, 2014 at 11:58 AM, Chen Wang <[email protected]> wrote:
>>
>>> Hi, guys,
>>> I have a Storm topology with a single-threaded bolt that queries a large
>>> amount of data (from Elasticsearch) and emits to an HBase bolt (10 threads),
>>> which does some filtering and then emits to an Avro bolt (10 threads). The
>>> Avro bolt simply emits each tuple to an Avro client; the tuples are received
>>> by two Flume nodes and then sunk into HDFS. I am testing in local mode.
>>>
>>> In the query bolt, I am getting around 15000 entries in a batch. The query
>>> itself takes about 4 seconds; however, the emit method in the query bolt
>>> takes about 20 seconds. Does that mean the downstream bolts (the HBase bolt
>>> and the Avro bolt) cannot keep up with the query bolt?
>>>
>>> How can I tune my topology to make this process as fast as possible? I
>>> tried increasing the HBase bolt to 20 threads, but it does not seem to help.
>>>
>>> I use shuffleGrouping from the query bolt to the HBase bolt, and from the
>>> HBase bolt to the Avro bolt.
>>>
>>> Thanks for any advice.
>>> Chen
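
For anyone reading the thread later: the batching Sam asks about could look roughly like the sketch below. It is not taken from the gist. `BatchBuffer`, the batch size of 1000, and the counting sink are all made up for illustration; the idea is only that 15000 items turn into a handful of bulk calls instead of 15000 individual ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not from the gist): buffers items and hands them to a
 * sink in fixed-size batches instead of making one call per item.
 */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> pending;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.pending = new ArrayList<>(batchSize);
    }

    /** Adds one item; flushes automatically once the batch is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered to the sink; does nothing when empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>(batchSize);
        sink.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Stand-in sink: a real bolt would issue one bulk write here.
        BatchBuffer<String> buffer =
                new BatchBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 15000; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush(); // push any partial final batch
        System.out.println(batchSizes.size() + " sink calls for 15000 items");
    }
}
```

In an HBase bolt the sink would presumably wrap the client's bulk write (for example a `put` that takes a list of `Put` objects) rather than a counter, and `flush()` would also need to run on a timer or tick so that a partial batch does not sit in memory. Acking is left out here; with batching, tuples should only be acked after their batch has been written.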
