Re: Bulk loading disadvantages

Sateesh Lakkarsu Thu, 26 Jul 2012 09:47:45 -0700

>
>
> For the bulkloading process, the HBase documentation mentions that in
> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
> into its storage directory and making the data available to clients."
> But from my experience the files also remain in the original location
> from where they are "adopted". So I guess the data is actually copied
> into the HBase directory right? This means that, compared to the
> online importing, when bulk loading you essentially need twice the
> disk space on HDFS, right?
>


Yes, if you are generating HFiles on one cluster and loading them into a
separate HBase cluster. If they are co-located, it's just an HDFS mv.
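
To make the disk-space difference concrete, here is a rough local-filesystem
analogy in Python. It is not HDFS or HBase code: the directory names, the
1 KB file size and the simulate_load helper are all made up for illustration,
and the "separate cluster" case is modelled as a plain copy with combined
usage counted across both sides.

```python
import os
import shutil
import tempfile


def disk_usage(root):
    """Total bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def simulate_load(co_located, size=1024):
    """Return (bytes before load, bytes after load) for one fake HFile.

    co_located=True  -> the file is renamed into the table directory,
                        analogous to an hdfs mv on the same cluster
    co_located=False -> the file is copied and the original stays behind,
                        analogous to loading into a separate cluster
    """
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")  # hypothetical MR output dir
        table = os.path.join(root, "table")      # hypothetical table storage dir
        os.makedirs(staging)
        os.makedirs(table)

        src = os.path.join(staging, "hfile")
        with open(src, "wb") as f:
            f.write(b"\0" * size)
        before = disk_usage(root)

        dst = os.path.join(table, "hfile")
        if co_located:
            os.rename(src, dst)        # move: still one copy of the data
        else:
            shutil.copyfile(src, dst)  # copy: two copies until staging is cleaned up
        return before, disk_usage(root)


if __name__ == "__main__":
    print("co-located:      ", simulate_load(co_located=True))
    print("separate cluster:", simulate_load(co_located=False))
```

In the copy case the extra space is only held until the staging output is
deleted, so the doubling is temporary if you clean up after the load.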

> Another problem is with data locality immediately after bulk loading
> through MR. I understand that the locality is obtained in time through
> compactions and splits. However you don't get this problem while
> importing online, right?
>

Yes
