On Wed, Feb 4, 2015 at 10:49 AM, Jaime Solano <[email protected]> wrote:
> For a proof of concept we'll be working on, we want to bulk-load data into
> HBase, following a similar approach to the one explained here
> <http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/>,
> but with the difference that for the HFile creation (step 2 in the
> mentioned article), we want to use Storm instead of MapReduce. That is, we
> want to bulk-load data that is not sitting in HDFS, but probably in memory.
>
> 1. What are your thoughts about this? Is it feasible?
> 2. What challenges do you foresee?

Notice how the bulk-load MapReduce job maintains a total order: the HFiles
have to be sorted internally, but also relative to each other. Can you have
Storm do a similar total-order partitioning? (The first sketch at the bottom
of this mail shows the kind of routing I mean.)

> 3. What other approaches would you suggest?

Write HFiles and bulk-load them if you can. Study our bulk loader carefully;
the second sketch below shows the load step itself. The loader is able to do
some shoehorning if the product does not exactly match the running HBase
instance, but only if everything is properly sorted.

Good luck,
St.Ack

> Thanks in advance,
> -Jaime
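
P.S. Here, roughly, is the total-order routing I have in mind, expressed as
a Storm custom grouping. This is an untested sketch against the Storm 0.9
API: the RegionRangeGrouping name, the splitKeys array, and the assumption
that the row key travels as the first tuple field are all mine, not anything
that ships with Storm or HBase.

  import backtype.storm.generated.GlobalStreamId;
  import backtype.storm.grouping.CustomStreamGrouping;
  import backtype.storm.task.WorkerTopologyContext;
  import org.apache.hadoop.hbase.util.Bytes;

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  // Routes each tuple to the task that owns the key range containing the
  // tuple's row key; the same job TotalOrderPartitioner does in the
  // MapReduce bulk-load flow.
  public class RegionRangeGrouping implements CustomStreamGrouping {
    private final byte[][] splitKeys;  // region start keys, sorted ascending
    private List<Integer> targetTasks;

    public RegionRangeGrouping(byte[][] splitKeys) {
      this.splitKeys = splitKeys;      // assumed fetched from the table up front
    }

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
      this.targetTasks = targetTasks;  // ideally one task per region
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
      byte[] rowKey = (byte[]) values.get(0);  // assumes row key is field 0
      // Find the range whose start key is <= rowKey.
      int idx = Arrays.binarySearch(splitKeys, rowKey, Bytes.BYTES_COMPARATOR);
      if (idx < 0) {
        idx = -idx - 2;  // insertion point minus one = owning range
      }
      if (idx < 0) {
        idx = 0;         // before the first start key; first region owns it
      }
      return Collections.singletonList(targetTasks.get(idx % targetTasks.size()));
    }
  }

Note this only gets you the partitioning. Each bolt behind it still has to
append its cells to its HFile in sorted order, because an HFile is an
immutable sorted map.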
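
And the load step itself, once the HFiles land on HDFS. Again a sketch,
against the 0.98-era client API, with a placeholder table name and
directory:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadStep {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "poc_table");  // placeholder name
      // Moves the HFiles into the table's region directories. If a file
      // straddles a region boundary, the loader splits it (the
      // "shoehorning" above), which only works when the cells inside
      // each file are properly sorted.
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(new Path("/tmp/storm-hfiles"), table);  // placeholder dir
      table.close();
    }
  }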
