On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <[email protected]> wrote:
>>
>> For the bulk-loading process, the HBase documentation mentions that in
>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>> into its storage directory and making the data available to clients."
>> But in my experience the files also remain in the original location
>> from which they are "adopted". So I guess the data is actually copied
>> into the HBase directory, right? This means that, compared to
>> online importing, bulk loading essentially requires twice the
>> disk space on HDFS, right?
>
> Yes, if you are generating HFiles on one cluster and loading into a
> separate HBase cluster. If they are co-located, it's just an HDFS mv.
Hmm, both the HFile generation and the HBase cluster run on top of the same HDFS cluster. I did a "du" on both the source HDFS directory and the destination "/hbase" directory and got the same sizes (± a few bytes). I then deleted the source directory from HDFS and scanned the table without any problems. Maybe there is a config parameter I'm missing?

Sever
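For reference, one way to tell whether the load was an HDFS rename or a copy is to compare "du" on the source directory before and after running the bulk loader. A minimal sketch against a live cluster (the path /user/me/hfiles and the table name mytable are placeholders; LoadIncrementalHFiles is the standard bulk-load entry point):

```shell
# Size of the generated HFiles before loading
hadoop fs -du -s /user/me/hfiles

# Adopt the HFiles into the running table
# (co-located cluster => this should be a rename, not a copy)
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/me/hfiles mytable

# If the load really was a rename, the source directory should now be
# (near) empty, and the bytes should show up under /hbase instead
hadoop fs -du -s /user/me/hfiles
hadoop fs -du -s /hbase/mytable
```

If the second "du" on the source still reports the full size, the files were copied rather than moved, which matches the doubled-disk-space observation above.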
