Re: writing huge amount of data to HDFS

Chen Wang Fri, 11 Jul 2014 14:16:32 -0700

here you go:
https://gist.github.com/cynosureabu/b317646d5c475d0d2e42
It's actually pretty straightforward. The only thing worth mentioning is
that I use another thread in the ES bolt to do the actual query and tuple
emit.
Thanks for looking.
Chen




On Fri, Jul 11, 2014 at 1:18 PM, Sam Goodwin <[email protected]>
wrote:

> Can you show some code? 200 seconds for 15K puts sounds like you're not
> batching.
>
>
> On Fri, Jul 11, 2014 at 12:47 PM, Chen Wang <[email protected]>
> wrote:
>
>> Typo in my previous email:
>> the emit method in the query bolt takes about 200 (instead of 20) seconds.
>>
>>
>> On Fri, Jul 11, 2014 at 11:58 AM, Chen Wang <[email protected]>
>> wrote:
>>
>>> Hi guys,
>>> I have a Storm topology with a single-threaded bolt that queries a large
>>> amount of data from Elasticsearch and emits to an HBase bolt (10 threads),
>>> which does some filtering and then emits to an Avro bolt (10 threads). The
>>> Avro bolt simply emits the tuple to an Avro client; the events are received
>>> by two Flume nodes and then sunk into HDFS. I am testing in local mode.
>>>
>>> In the query bolt, I am getting around 15000 entries in a batch. The
>>> query itself takes about 4 seconds; however, the emit method in the query
>>> bolt takes about 20 seconds. Does that mean that the downstream bolts
>>> (HBase bolt and Avro bolt) cannot keep up with the query bolt?
>>>
>>> How can I tune my topology to make this process as fast as possible? I
>>> tried increasing the HBase bolt to 20 threads, but it did not seem to help.
>>>
>>> I use shuffleGrouping from the query bolt to the HBase bolt, and from the
>>> HBase bolt to the Avro bolt.
>>>
>>> Thanks for any advice.
>>> Chen
>>>
>>
>>
>
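
For readers following the thread: the batching Sam asks about could look roughly like the sketch below. It is not taken from the gist and uses no HBase or Storm classes. BatchBuffer, batchSize, and the sink callback are hypothetical names, and the sink stands in for whatever bulk write the HBase bolt would issue once per group instead of once per tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in groups of at most batchSize. */
class BatchBuffer<T> {
    private final int batchSize;
    // Hypothetical stand-in for one bulk write call (an assumption, not a real HBase API).
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes as soon as a full batch has accumulated. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered, even if it is less than a full batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy so the sink can keep the list
        pending.clear();
    }
}
```

With the 15000 entries mentioned in the thread and an assumed batch size of 1000, the sink would be called 15 times rather than 15000. In a real bolt, flush() would also need to run on some timer or at shutdown so that a partial batch is not left stranded.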
