Hello,

I am trying to find a good way to import a large amount of data into HBase from 
HDFS. I have a CSV file of about 135G. I put it into HDFS, then used HBase's 
importtsv utility to do a bulk load; for that 135G of original data, it took 
40 minutes. I have 10 nodes, each with 128G of RAM, all-SSD disks, and a 10G 
network. In my humble opinion this speed is not very good, since it took only 
10 minutes to put that 135G of data into HDFS. I assume Hive would be much 
faster; for an external table, loading even takes no time at all. I will test 
it later.
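For context, this is roughly the two-step flow I am using (table name, column 
family, and paths below are just placeholders, not my real ones): first run 
ImportTsv with -Dimporttsv.bulk.output so it writes HFiles instead of doing 
puts, then hand those HFiles to LoadIncrementalHFiles:

```shell
# Step 1: MapReduce job that parses the CSV and writes HFiles to HDFS
# (separator defaults to tab, so override it for CSV input)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 \
  -Dimporttsv.bulk.output=hdfs:///tmp/hfiles \
  mytable hdfs:///data/input.csv

# Step 2: move the generated HFiles into the regions (fast, no write path)
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/hfiles mytable
```

One thing I wonder about is pre-splitting: if the table starts with a single 
region, the HFile-writing job cannot parallelize well, so I created the table 
with splits up front, e.g. create 'mytable', 'cf', {SPLITS => ['a','f','k','p','u']} 
in the HBase shell. Please correct me if that is not the right approach.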
So I would like to ask: does anyone have a better way to do a bulk load into 
HBase, or is importtsv already the best tool for bulk loading in the HBase 
world? If I had really big data (say > 50T), this does not seem like a 
practical loading speed, does it? Or is it? In practice, how do people 
normally load data into HBase?

Thanks in advance,
Ming