> For the bulkloading process, the HBase documentation mentions that in
> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
> into its storage directory and making the data available to clients."
> But from my experience the files also remain in the original location
> from where they are "adopted". So I guess the data is actually copied
> into the HBase directory right? This means that, compared to the
> online importing, when bulk loading you essentially need twice the
> disk space on HDFS, right?

Yes, if you are generating HFiles on one cluster and loading them into a separate HBase cluster. If the two are co-located, it's just an HDFS mv.

> Another problem is with data locality immediately after bulk loading
> through MR. I understand that the locality is obtained in time through
> compactions and splits. However you don't get this problem while
> importing online, right?

Yes.
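
On the disk-space question, here is a rough sketch of the difference between the two cases. It is only a local-filesystem analogy and does not call any HBase or HDFS API: `adopt`, `simulate`, the `staging` and `store` directories, and the file name `hfile-0001` are all made up for illustration. A rename stands in for the co-located hdfs mv, and a copy stands in for loading into a separate cluster.

```python
import os
import shutil
import tempfile


def tree_bytes(root):
    """Total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def adopt(staging_file, store_dir, same_filesystem):
    """Place staging_file into store_dir (hypothetical stand-in for adoption)."""
    target = os.path.join(store_dir, os.path.basename(staging_file))
    if same_filesystem:
        # Co-located case: a rename only changes metadata, the source disappears.
        os.rename(staging_file, target)
    else:
        # Separate-cluster case: the bytes are copied and the source stays behind.
        shutil.copyfile(staging_file, target)
    return target


def simulate(same_filesystem, size=1024):
    """Return (bytes before, bytes after) summed over staging and store."""
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        store = os.path.join(root, "store")
        os.makedirs(staging)
        os.makedirs(store)
        hfile = os.path.join(staging, "hfile-0001")
        with open(hfile, "wb") as f:
            f.write(b"\0" * size)
        before = tree_bytes(root)
        adopt(hfile, store, same_filesystem)
        return before, tree_bytes(root)
```

With the rename, the total stays at the size of the generated file. With the copy, it doubles until the staging copy is deleted, which matches the "twice the disk space" observation for the separate-cluster case.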
