On Wed, Feb 4, 2015 at 10:49 AM, Jaime Solano <[email protected]> wrote:
> For a proof of concept we'll be working on, we want to bulk-load data into
> HBase, following a similar approach to the one explained here
> <http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/>,
> but with the difference that for the HFile creation (step 2 in the
> mentioned article), we want to use Storm instead of MapReduce. That is, we
> want to bulk-load data that is not sitting in HDFS, but probably in memory.
>
> 1. What are your thoughts about this? Is it feasible?
> 2. What challenges do you foresee?

Notice how the bulk-load MapReduce job maintains a total order: the HFiles
have to be sorted internally, but also relative to each other. Can you have
Storm do a similar total-order partitioning? (The first sketch at the bottom
of this mail shows the kind of routing I mean.)

> 3. What other approaches would you suggest?

Write HFiles and bulk-load them if you can. Study our bulk loader carefully;
the second sketch below shows the load step itself. The loader is able to do
some shoehorning if the product does not exactly match the running HBase
instance, but only if everything is properly sorted.

Good luck,
St.Ack

> Thanks in advance,
> -Jaime
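
P.S. Here, roughly, is the total-order routing I have in mind, expressed as
a Storm custom grouping. This is an untested sketch against the Storm 0.9
API: the RegionRangeGrouping name, the splitKeys array, and the assumption
that the row key travels as the first tuple field are all mine, not anything
that ships with Storm or HBase.

  import backtype.storm.generated.GlobalStreamId;
  import backtype.storm.grouping.CustomStreamGrouping;
  import backtype.storm.task.WorkerTopologyContext;
  import org.apache.hadoop.hbase.util.Bytes;

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  // Routes each tuple to the task that owns the key range containing the
  // tuple's row key; the same job TotalOrderPartitioner does in the
  // MapReduce bulk-load flow.
  public class RegionRangeGrouping implements CustomStreamGrouping {
    private final byte[][] splitKeys;  // region start keys, sorted ascending
    private List<Integer> targetTasks;

    public RegionRangeGrouping(byte[][] splitKeys) {
      this.splitKeys = splitKeys;      // assumed fetched from the table up front
    }

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream,
                        List<Integer> targetTasks) {
      this.targetTasks = targetTasks;  // ideally one task per region
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
      byte[] rowKey = (byte[]) values.get(0);  // assumes row key is field 0
      // Find the range whose start key is <= rowKey.
      int idx = Arrays.binarySearch(splitKeys, rowKey, Bytes.BYTES_COMPARATOR);
      if (idx < 0) {
        idx = -idx - 2;  // insertion point minus one = owning range
      }
      if (idx < 0) {
        idx = 0;         // before the first start key; first region owns it
      }
      return Collections.singletonList(targetTasks.get(idx % targetTasks.size()));
    }
  }

Note this only gets you the partitioning. Each bolt behind it still has to
append its cells to its HFile in sorted order, because an HFile is an
immutable sorted map.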
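
And the load step itself, once the HFiles land on HDFS. Again a sketch,
against the 0.98-era client API, with a placeholder table name and
directory:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadStep {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "poc_table");  // placeholder name
      // Moves the HFiles into the table's region directories. If a file
      // straddles a region boundary, the loader splits it (the
      // "shoehorning" above), which only works when the cells inside
      // each file are properly sorted.
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(new Path("/tmp/storm-hfiles"), table);  // placeholder dir
      table.close();
    }
  }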
