Hi, I'm using the HBase Bulk Loader <http://archive.cloudera.com/cdh/3/hbase/bulk-loads.html> with 0.89. It is very easy to use. I have a few questions:
1) It seems importtsv will only accept one column family at a time. It shows some sort of security access error if I give it a column list with columns from different families. Is this a limitation of the bulk loader, or a consequence of some security configuration somewhere?

2) Does the bulk load process respect the HBase column family's compression setting? If not, is there a way to trigger the compression after the fact (a major compaction, for example)?

3) Am I correct in thinking that the importtsv step can run on a separate cluster from the HBase cluster (assuming you have an HBase client config and libraries)? If so, will I need to manually copy the output of importtsv to the HBase cluster's HDFS for the completebulkload step, or can I give completebulkload a remote HDFS path, or even an S3 path? (I've sketched the commands I'm running in the P.S. below.)

Thanks for providing this tool.

Marc
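P.S. In case it helps, here is roughly what I'm running. The table name, column list, and paths are placeholders, and the jar name will vary with the HBase version:

  # Generate HFiles from a TSV input directory instead of writing directly to the table
  hadoop jar hbase-VERSION.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1,f1:c2 \
    -Dimporttsv.bulk.output=/user/marc/hfile-output \
    mytable /user/marc/input

  # Move the generated HFiles into the live table
  hadoop jar hbase-VERSION.jar completebulkload /user/marc/hfile-output mytable

And, for question 2, the command I would run in the hbase shell to rewrite the store files afterwards, if that is what it takes to apply compression:

  major_compact 'mytable'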
