Writes take longer in HBase. Just how much longer may depend on how well you have tuned it.
Now, having said that... suppose you want to find a single record in either HBase or Hive. Which do you think will be faster? ;-)

On Jan 17, 2013, at 10:44 AM, Austin Chungath <austi...@gmail.com> wrote:

> Hi,
> Problem: hive took 6 mins to load a data set, hbase took 1 hr 14 mins.
> It's a 20 gb data set approx 230 million records. The data is in hdfs,
> single text file. The cluster is 11 nodes, 8 cores.
>
> I loaded this in hive, partitioned by date and bucketed into 32 and sorted.
> Time taken is 6 mins.
>
> I loaded the same data into hbase, in the same cluster by writing a map
> reduce code. It took 1hr 14 mins. The cluster wasn't running anything else
> and assuming that the code that i wrote is good enough, what is it that
> makes hbase slower than hive in loading the data?
>
> Thanks,
> Austin
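
For a rough sense of scale, the timings in the quoted message can be turned into average throughput. This is only a back-of-the-envelope sketch in Python: it uses the approximate figures from the message (about 230 million records, 6 minutes for Hive, 1 hr 14 mins for HBase), and the function name is made up for illustration.

```python
def records_per_second(records, minutes):
    """Average load throughput in records per second."""
    return records / (minutes * 60)


RECORDS = 230_000_000      # approximate record count from the message
HIVE_MINUTES = 6           # Hive load time
HBASE_MINUTES = 60 + 14    # HBase load time: 1 hr 14 mins

hive_rate = records_per_second(RECORDS, HIVE_MINUTES)
hbase_rate = records_per_second(RECORDS, HBASE_MINUTES)
slowdown = HBASE_MINUTES / HIVE_MINUTES

print(f"Hive:  {hive_rate:,.0f} records/s")
print(f"HBase: {hbase_rate:,.0f} records/s")
print(f"HBase load took {slowdown:.1f}x as long")
```

These are cluster-wide averages over the whole job, so they say nothing about where the time went, only how large the gap is.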