On Tue, Sep 1, 2015 at 3:04 AM, Behdad Forghani <beh...@exapackets.com> wrote:
> In my experience the fastest way to load data is directly write to HFile. I
> have measured a performance gain of 10x. Also, if you have binary data or
> need to escape characters HBase bulk loader does not escape characters. For
> my use case, I create HFiles and load the HFile. Then, I create a view on
> HBase table.

The CSV bulk import tool[1] does write to HFiles in a MapReduce job. Are you
saying that you've gotten 10x better performance than this tool? If so, it
would certainly be interesting to hear how you were able to get such good
performance. Or were you comparing to bulk loading via PSQL?

1. http://phoenix.apache.org/bulk_dataload.html