Hi,
I am new to HBase and Hadoop, and I am trying to find the best way to bulk load a table from HDFS into HBase. I don't mind creating a new table for each batch, and from what I understand, writing HFiles directly with HFileOutputFormat in a MapReduce job is the most efficient method. Since my input data set is already in sorted order, it seems to me that I don't need reducers, which would globally re-sort data that is already sorted.

I tried calling HFileOutputFormat.getRecordWriter from my mapper with 0 reducers, but the output directory contains only a _temporary directory, with my outputs in its subdirectories. That doesn't seem to be what the loadtable script expects (a column family directory containing HFiles). Can someone tell me whether what I am doing makes sense in general, or how to do this properly? Thanks!
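In case it helps, here is roughly the map-only job setup I have in mind. This is only a sketch of my attempt, not working code: the class name BulkLoadDriver, the mapper, the "cf"/"col" family and qualifier names, and the tab-separated input format are all placeholders I made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Placeholder mapper: parses one "rowkey<TAB>value" line into a KeyValue.
  // "cf" and "col" are made-up family/qualifier names for this sketch.
  static class MyHFileMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = parts[0].getBytes();
      KeyValue kv = new KeyValue(row, "cf".getBytes(), "col".getBytes(),
          parts[1].getBytes());
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "bulk-load-sketch");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyHFileMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    // No reduce phase, since the input is already sorted by row key.
    job.setNumReduceTasks(0);
    // Let the framework drive HFileOutputFormat, rather than calling
    // getRecordWriter by hand inside the mapper.
    job.setOutputFormatClass(HFileOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Is this the right general shape, or is manually obtaining the record writer in the mapper (as I did) ever the intended usage?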
