Hi Sever,

That's very interesting. Which Hadoop and HBase versions are you using? I am going to run bulk loads tomorrow. If you tell me which HDFS directories you compared with /hbase/$table, I will try to check the same.
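For reference, here is roughly what I plan to run to reproduce your comparison. The staging path and table name below are placeholders for whatever you used, not your actual paths:

  # Bulk-load the generated HFiles into the table
  # (staging dir and table name are placeholders):
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      /user/anil/bulkload-staging mytable

  # Compare the total size of the staging directory against the
  # table's directory under the HBase root
  # (on newer Hadoop releases this is "hadoop fs -du -s"):
  hadoop fs -dus /user/anil/bulkload-staging
  hadoop fs -dus /hbase/mytable

  # If the load was an HDFS rename rather than a copy, the staging
  # directory should be (nearly) empty afterwards:
  hadoop fs -ls /user/anil/bulkload-staging

If my staging directory still holds the full-size HFiles after the load, I am seeing the same copy behavior you describe.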
Best Regards,
Anil

On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <[email protected]> wrote:

> On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <[email protected]> wrote:
>>>
>>> For the bulk-loading process, the HBase documentation mentions that in
>>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>>> into its storage directory and making the data available to clients."
>>> But in my experience the files also remain in the original location
>>> from which they are "adopted". So I guess the data is actually copied
>>> into the HBase directory, right? This means that, compared to online
>>> importing, when bulk loading you essentially need twice the disk
>>> space on HDFS, right?
>>>
>> Yes, if you are generating HFiles on one cluster and loading into a
>> separate HBase cluster. If they are co-located, it's just an HDFS mv.
>
> Hmm, both the HFile generation and the HBase cluster run on top of
> the same HDFS cluster. I did a "du" on both the source HDFS directory
> and the destination "/hbase" directory and got the same sizes (+/- a
> few bytes). I deleted the source directory from HDFS and then scanned
> the table without any problems. Maybe there is a config parameter I'm
> missing?
>
> Sever
