Hi,
Problem: Hive took 6 minutes to load a data set; HBase took 1 hour 14
minutes. It's a 20 GB data set of approximately 230 million records. The
data is in HDFS, as a single text file. The cluster is 11 nodes, 8 cores.

I loaded this into Hive, partitioned by date, bucketed into 32 buckets,
and sorted. The load took 6 minutes.

I loaded the same data into HBase, on the same cluster, by writing a
MapReduce job. It took 1 hour 14 minutes. The cluster wasn't running
anything else. Assuming that the code I wrote is good enough, what is it
that makes HBase so much slower than Hive at loading the data?
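
To put the gap in numbers, here is a rough sketch of the throughput each
load works out to, using only the figures above. It assumes "20 GB" means
20 * 1024 MB and treats the record count as exactly 230 million; both are
approximations, and the names are just for illustration.

```python
# Rough throughput comparison from the figures quoted in this post.
RECORDS = 230_000_000   # approximate record count
SIZE_MB = 20 * 1024     # assumption: "20 GB" taken as 20 * 1024 MB


def throughput(minutes):
    """Return (records per second, MB per second) for a load of this length."""
    seconds = minutes * 60
    return RECORDS / seconds, SIZE_MB / seconds


hive_rps, hive_mbps = throughput(6)      # Hive: 6 minutes
hbase_rps, hbase_mbps = throughput(74)   # HBase: 1 hour 14 minutes
slowdown = 74 / 6                        # how many times longer HBase took

print(f"Hive:  {hive_rps:,.0f} records/s, {hive_mbps:.1f} MB/s")
print(f"HBase: {hbase_rps:,.0f} records/s, {hbase_mbps:.1f} MB/s")
print(f"HBase load took {slowdown:.1f}x as long")
```

So the HBase load ran at roughly one twelfth of the rate of the Hive load
across the whole cluster.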

Thanks,
Austin
