Hi Marc,

> 1) It seems importtsv will only accept one family at a time. It shows some
> sort of security access error if I give it a column list with columns from
> different families. Is this a limitation of the bulk loader, or is this a
> consequence of some security configuration somewhere?
That was all that was implemented until recently; see
https://issues.apache.org/jira/browse/HBASE-1861 for details.

> 2) Does the bulk load process respect the hbase family's compression
> setting? If not, is there a way to trigger the compression after the fact
> (major compaction, for example)?

I believe you can specify the compression as a configuration option that is
handled by the HFOF (HFileOutputFormat); I put a rough example at the end of
this mail. Otherwise yes, switch compression on for the family (if not
already done) and run a major compaction to get all files rewritten with it.

> 3) Am I correct in thinking that the importtsv step can run on a separate
> cluster from the hbase cluster (assuming you have an hbase client config and
> libraries)? And if so, for the completebulkload step, will I need to
> manually copy the output of importtsv to the hbase cluster's HDFS? Or can I
> provide a remote hdfs path, or even an S3 path for the completebulkload
> program?

Not sure whether that would work, since the files are placed next to the live
ones and then moved into place from their temp location. I am also not sure
what happens if the local cluster has no /hbase etc. Todd, could you help
here?

> Thanks for providing this tool.
>
> Marc

Lars
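
PS: In case a concrete invocation helps, here is roughly what I had in mind
for 1) and 2), assuming a build that already includes the HBASE-1861
multi-family support. The table name, families, and paths are made up, and I
am going from memory that hfile.compression is the property HFOF reads for
the compression codec, so please verify against your version before relying
on it:

  # HBASE_ROW_KEY marks the TSV column that becomes the row key; the
  # remaining columns can name more than one family (f1, f2 here).
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1,f2:c2 \
    -Dimporttsv.bulk.output=/user/marc/bulkout \
    -Dhfile.compression=gz \
    mytable \
    /user/marc/input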

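PPS: For 3), if it turns out that completebulkload does need the files on the
hbase cluster's HDFS, the conservative fallback would be to copy the importtsv
output over first and then run the load there. The cluster addresses and
paths below are placeholders, not something I have tried across two clusters:

  # copy the generated HFiles from the ETL cluster to the hbase cluster
  hadoop distcp hdfs://etl-nn:8020/user/marc/bulkout \
    hdfs://hbase-nn:8020/user/marc/bulkout

  # completebulkload, i.e. LoadIncrementalHFiles, then moves them into the table
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/marc/bulkout mytable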