Re: Help Tuning CsvBulkImport MapReduce

Gabriel Reid Tue, 01 Sep 2015 00:15:06 -0700

On Tue, Sep 1, 2015 at 3:04 AM, Behdad Forghani <beh...@exapackets.com> wrote:


> In my experience the fastest way to load data is to write directly to HFiles. I
> have measured a performance gain of 10x. Also, if you have binary data or
> need to escape characters, note that the HBase bulk loader does not escape
> characters. For my use case, I create HFiles and load them. Then, I create a
> view on the HBase table.

The CSV bulk import tool[1] does write to HFiles in a MapReduce job.
Are you saying that you've gotten 10x better performance than this
tool? If so, it would certainly be interesting to hear how you were
able to get such good performance. Or were you comparing to bulk
loading via PSQL?

1. http://phoenix.apache.org/bulk_dataload.html
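
For anyone following along, here is a rough sketch of how the tool
mentioned above is typically launched, written as a small Python helper
that only assembles the `hadoop jar` command line. The class name and the
--table, --input and --zookeeper options are the ones described on the
bulk load page [1]; the jar name, table name and input path are
placeholders I made up for illustration, so adjust them for your install.

```python
import shlex


def build_bulk_load_command(client_jar, table, input_path, zookeeper=None):
    """Assemble the argv for running Phoenix's CsvBulkLoadTool via `hadoop jar`.

    This only builds the command; it does not run it.
    """
    cmd = [
        "hadoop", "jar", client_jar,
        "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
        "--table", table,
        "--input", input_path,
    ]
    # The ZooKeeper quorum is optional; add it only when one is given.
    if zookeeper:
        cmd += ["--zookeeper", zookeeper]
    return cmd


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own jar, table and path.
    argv = build_bulk_load_command(
        "phoenix-client.jar", "EXAMPLE", "/data/example.csv"
    )
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run() once the placeholder
values are replaced. The MapReduce job launched this way is the one that
writes the HFiles.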
