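Start by telling us your row key design, and check whether your table's regions are pre-split. Without pre-splitting, a new table starts with a single region, so every write lands on one region server until that region splits. I managed to reach about 25 MB/sec write throughput in HBase using 1 region server. If your data is evenly spread, you can get around 7 times that in a 10 region server environment, i.e. roughly 175 MB/sec, which should mean that 1 GB takes about 6 seconds.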
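A minimal Python sketch of both points, under stated assumptions. The helper names `even_split_points` and `estimated_load_seconds` are hypothetical, and the split keys only make sense if the row keys begin with a uniformly distributed hex prefix (for example a hash of the natural key), which the thread has not confirmed. The 25 MB/sec and 7x figures are the ones quoted above, not measurements of your cluster.

```python
def even_split_points(num_regions: int, key_bytes: int = 1) -> list[str]:
    """Return num_regions - 1 hex split keys that cut a fixed-width,
    uniformly distributed key prefix into equal ranges.

    Assumption: row keys start with a hex-encoded prefix of `key_bytes` bytes.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 256 ** key_bytes   # number of distinct prefix values
    width = key_bytes * 2      # hex digits needed per prefix
    return [
        format(i * space // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


def estimated_load_seconds(
    size_mb: float,
    per_server_mb_s: float = 25.0,  # single region server figure from the thread
    scale: float = 7.0,             # claimed speedup on 10 region servers
) -> float:
    """Estimate load time if writes are spread evenly across the cluster."""
    return size_mb / (per_server_mb_s * scale)


if __name__ == "__main__":
    # Split points for 4 regions on a 1-byte hex prefix: ['40', '80', 'c0']
    print(even_split_points(4))
    # 1 GB at 175 MB/sec: a little under 6 seconds
    print(round(estimated_load_seconds(1024), 1))
    # The 20 GB data set from the original question, in seconds
    print(round(estimated_load_seconds(20 * 1024)))
```

Keys like these could be passed as split points when the table is created, for example through the `SPLITS` option of the HBase shell `create` command. With the quoted figures, the 20 GB data set works out to about two minutes of pure write time, so a 1 hr 14 min load suggests the writes are not being spread across the region servers.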
On Friday, January 18, 2013, praveenesh kumar wrote:

> Hey,
> Can someone throw some pointers on what would be the best practice for bulk
> imports in hbase ?
> That would be really helpful.
>
> Regards,
> Praveenesh
>
> On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <[email protected]> wrote:
>
> > Just to add to whatever all the heavyweights have said above, your MR job
> > may not be as efficient as the MR job corresponding to your Hive query. You
> > can enhance the performance by setting the mapred config parameters wisely
> > and by tuning your MR job.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> > On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <[email protected]> wrote:
> >
> > > Hive is more for batch and HBase is for more of real time data.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <[email protected]> wrote:
> > >
> > > > In case of Hive data insertion means placing the file under table path in
> > > > HDFS. HBase need to read the data and convert it into its format. (HFiles)
> > > > MR is doing this work.. So this makes it clear that HBase will be slower.
> > > > :) As Michael said the read operation...
> > > >
> > > > -Anoop-
> > > >
> > > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > Problem: hive took 6 mins to load a data set, hbase took 1 hr 14 mins.
> > > > > It's a 20 gb data set approx 230 million records. The data is in hdfs,
> > > > > single text file. The cluster is 11 nodes, 8 cores.
> > > > >
> > > > > I loaded this in hive, partitioned by date and bucketed into 32 and sorted.
> > > > > Time taken is 6 mins.
> > > > >
> > > > > I loaded the same data into hbase, in the same cluster by writing a map
> > > > > reduce code.
> > > > > It took 1hr 14 mins. The cluster wasn't running anything else
> > > > > and assuming that the code that i wrote is good enough, what is it that
> > > > > makes hbase slower than hive in loading the data?
> > > > >
> > > > > Thanks,
> > > > > Austin
