writing huge amount of data to HDFS

Chen Wang Fri, 11 Jul 2014 11:59:37 -0700

Hi guys,
I have a Storm topology with a single-threaded bolt that queries a large
amount of data from Elasticsearch and emits it to an HBase bolt (10 threads),
which does some filtering and then emits to an Avro bolt (10 threads). The
Avro bolt simply sends each tuple to an Avro client; the tuples are received
by two Flume nodes and then sunk into HDFS. I am testing in local mode.


In the query bolt, I get around 15,000 entries per batch. The query
itself takes about 4 seconds; however, the emit method in the query bolt
takes about 20 seconds. Does this mean that the downstream bolts (the HBase
bolt and the Avro bolt) cannot keep up with the query bolt?
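
To make the question concrete, here is a minimal sketch of the effect I
suspect. It is not Storm code: it is a plain-Java simulation, and the class
name, method name, and parameters are all hypothetical. It assumes that an
emit hands each tuple to a bounded queue and blocks while that queue is
full, so the time spent emitting ends up tracking the speed of the
consumers rather than the cost of the emit call itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation, not Storm code: one producer "emits" into a
// bounded queue that is drained by several slow consumer threads.
public class EmitBackpressureSketch {

    /** Outcome of one run: time the producer spent emitting, items consumed. */
    public static final class Result {
        public final long emitMillis;
        public final int processed;

        Result(long emitMillis, int processed) {
            this.emitMillis = emitMillis;
            this.processed = processed;
        }
    }

    public static Result run(int items, int queueCapacity, int consumers,
                             long workMillisPerItem) throws InterruptedException {
        final int poison = -1; // sentinel telling a consumer to stop
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(consumers);

        for (int c = 0; c < consumers; c++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        int item = queue.take();
                        if (item == poison) {
                            break;
                        }
                        // Stand-in for downstream work (filtering, network calls).
                        Thread.sleep(workMillisPerItem);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        // "Emit" every item; put() blocks whenever the queue is full.
        long start = System.nanoTime();
        for (int i = 0; i < items; i++) {
            queue.put(i);
        }
        long emitMillis = (System.nanoTime() - start) / 1_000_000;

        // One sentinel per consumer, then wait for all of them to finish.
        for (int c = 0; c < consumers; c++) {
            queue.put(poison);
        }
        done.await();
        return new Result(emitMillis, processed.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result slow = run(2000, 100, 10, 1); // consumers sleep 1 ms per item
        Result fast = run(2000, 100, 10, 0); // consumers do no simulated work
        System.out.println("slow consumers: emit took " + slow.emitMillis + " ms");
        System.out.println("fast consumers: emit took " + fast.emitMillis + " ms");
    }
}
```

If the real topology behaves like this, timing the HBase bolt and the Avro
bolt separately should show which of the two is holding the emit back.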

How can I tune my topology to make this process as fast as possible? I
tried increasing the HBase bolt to 20 threads, but it does not seem to help.

I use shuffleGrouping from the query bolt to the HBase bolt, and from the
HBase bolt to the Avro bolt.

Thanks for any advice.
Chen
