On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <[email protected]> wrote:
>>
>> For the bulk-loading process, the HBase documentation mentions that in
>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>> into its storage directory and making the data available to clients."
>> But in my experience the files also remain in the original location
>> from which they are "adopted". So I guess the data is actually copied
>> into the HBase directory, right? This means that, compared to
>> online importing, bulk loading essentially requires twice the
>> disk space on HDFS, right?
>
> Yes, if you are generating HFiles on one cluster and loading into a
> separate HBase cluster. If they are co-located, it's just an HDFS mv.
Hmm, both the HFile generation and the HBase cluster run on top of the same HDFS cluster. I did a "du" on both the source HDFS directory and the destination "/hbase" directory and got the same sizes (± a few bytes). I then deleted the source directory from HDFS and scanned the table without any problems. Maybe there is a config parameter I'm missing?

Sever
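For reference, one way to tell whether the load was an HDFS rename or a copy is to compare "du" on the source directory before and after running the bulk loader. A minimal sketch against a live cluster (the path /user/me/hfiles and the table name mytable are placeholders; LoadIncrementalHFiles is the standard bulk-load entry point):

```shell
# Size of the generated HFiles before loading
hadoop fs -du -s /user/me/hfiles

# Adopt the HFiles into the running table
# (co-located cluster => this should be a rename, not a copy)
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/me/hfiles mytable

# If the load really was a rename, the source directory should now be
# (near) empty, and the bytes should show up under /hbase instead
hadoop fs -du -s /user/me/hfiles
hadoop fs -du -s /hbase/mytable
```

If the second "du" on the source still reports the full size, the files were copied rather than moved, which matches the doubled-disk-space observation above.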
