As I understand it, HBase needs to store more metadata than Hive: for each value it separately stores the row key, column family, and column name alongside the value itself, so the stored data may end up larger than the original HDFS file.
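The overhead described above can be estimated with a minimal Java sketch. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, then the value). It ignores HFile block indexes, Bloom filters, compression, and data block encoding, all of which change the real on-disk size. The record shape in `main` (16-byte row key, family `d`, ten 8-character column names, 10-byte values) is purely hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CellOverhead {
    // Fixed-size fields of one cell in the classic KeyValue layout:
    // key length (int) + value length (int) + row length (short)
    // + family length (byte) + timestamp (long) + key type (byte).
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1; // = 20

    // Approximate serialized size of one cell, before compression/encoding.
    static int cellSize(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        return FIXED_BYTES + row.length + family.length + qualifier.length + value.length;
    }

    public static void main(String[] args) {
        // Hypothetical record: 16-byte row key, family "d", 10 columns,
        // each with an 8-character column name and a 10-byte value.
        byte[] row = new byte[16];
        byte[] family = "d".getBytes(StandardCharsets.UTF_8);
        int columns = 10;
        int rawValueBytes = 0;
        int storedBytes = 0;
        for (int i = 0; i < columns; i++) {
            // "col_0000", "col_0001", ... (8 bytes each)
            byte[] qualifier = String.format("col_%04d", i).getBytes(StandardCharsets.UTF_8);
            byte[] value = new byte[10];
            rawValueBytes += value.length;
            // Row key and family are repeated in every single cell.
            storedBytes += cellSize(row, family, qualifier, value);
        }
        System.out.println("raw value bytes:   " + rawValueBytes);
        System.out.println("stored cell bytes: " + storedBytes);
    }
}
```

For this made-up record, 100 bytes of values become about 550 bytes of cells (55 bytes per cell), because the row key, family, and qualifier are repeated in every cell. Short family and column names reduce this noticeably.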
I have wondered about this too. If anyone has got better results loading into HBase than into Hive, please let us know.

Thank you

On Sun, Jan 20, 2013 at 8:43 PM, Doug Meil wrote:

> Hi there,
>
> On top of what everybody else said, for more info on rowkey design and
> pre-splitting see http://hbase.apache.org/book.html#schema (as well as
> other threads on this mailing list on that topic).
>
>
> On 1/19/13 4:12 PM, "Mohammad Tariq" wrote:
>
>> Hello Austin,
>>
>> I am sorry for the late response.
>>
>> Asaf has made a very valid point: rowkey design is crucial,
>> especially if the data is going to be sequential (time-series data).
>> You may end up with a hotspotting problem. Use pre-split tables
>> or hash the keys to avoid that. It'll also allow you to fetch the results
>> faster.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Sun, Jan 20, 2013 at 1:20 AM, Asaf Mesika wrote:
>>
>>> Start by telling us your row key design.
>>> Look into pre-splitting your table regions.
>>> I managed to get to 25 MB/sec write throughput in HBase using 1 region
>>> server. If your data is evenly spread, you can get around 7 times that in
>>> a 10-region-server environment. That would mean 1 GB should take 4 sec.
>>>
>>>
>>> On Friday, January 18, 2013, praveenesh kumar wrote:
>>>
>>>> Hey,
>>>> Can someone give some pointers on what would be the best practice for
>>>> bulk imports in HBase?
>>>> That would be really helpful.
>>>>
>>>> Regards,
>>>> Praveenesh
>>>>
>>>> On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq wrote:
>>>>
>>>>> Just to add to whatever all the heavyweights have said above, your MR
>>>>> job may not be as efficient as the MR job corresponding to your Hive
>>>>> query.
>>>>> You can enhance the performance by setting the mapred config parameters
>>>>> wisely and by tuning your MR job.
>>>>>
>>>>> Warm Regards,
>>>>> Tariq
>>>>> https://mtariq.jux.com/
>>>>> cloudfront.blogspot.com
>>>>>
>>>>>
>>>>> On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan wrote:
>>>>>
>>>>>> Hive is more for batch processing and HBase is more for real-time data.
>>>>>>
>>>>>> Regards
>>>>>> Ram
>>>>>>
>>>>>> On Thu, Jan 17, 2013 at 10:30 PM, Anoop John wrote:
>>>>>>
>>>>>>> In the case of Hive, data insertion means placing the file under the
>>>>>>> table path in HDFS. HBase needs to read the data and convert it into
>>>>>>> its own format (HFiles), and MR is doing this work. So this makes it
>>>>>>> clear that HBase will be slower. :) As Michael said the read operation...
>>>>>>>
>>>>>>> -Anoop-
>>>>>>>
>>>>>>> On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Problem: Hive took 6 mins to load a data set, HBase took 1 hr 14 mins.
>>>>>>>> It's a 20 GB data set, approx. 230 million records. The data is in
>>>>>>>> HDFS, in a single text file. The cluster is 11 nodes, 8 cores.
>>>>>>>>
>>>>>>>> I loaded this in Hive, partitioned by date, bucketed into 32, and
>>>>>>>> sorted. Time taken is 6 mins.
>>>>>>>>
>>>>>>>> I loaded the same data into HBase, in the same cluster, by writing
>>>>>>>> MapReduce code. It took 1 hr 14 mins.
>>>>>>>> The cluster wasn't running anything else, and assuming that the code
>>>>>>>> I wrote is good enough, what is it that makes HBase slower than Hive
>>>>>>>> in loading the data?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Austin

-- 
Thanks and Regards
Vikas Jadhav
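The two fixes Tariq and Asaf suggest in the thread, hashing the keys and pre-splitting the table, can be sketched together in plain Java. This is an illustration under assumptions, not code from the thread. The bucket count (16), the two-digit `NN|` salt prefix, and the sample timestamp keys are all hypothetical. The split keys it builds are the kind of `byte[][]` that HBase's admin API accepts when creating a pre-split table.

```java
import java.nio.charset.StandardCharsets;

public class SaltedKeys {
    // Hypothetical bucket count; often chosen to match the number of regions
    // to pre-split into. Must stay below 100 for the two-digit prefix.
    static final int BUCKETS = 16;

    // Prefix the natural key with a two-digit salt derived from its hash, so
    // consecutive (e.g. time-ordered) keys spread over all regions instead of
    // all hitting the last one. String.hashCode() is specified by Java, so the
    // same key always gets the same salt and point lookups can recompute it.
    static String saltedKey(String naturalKey) {
        int bucket = Math.floorMod(naturalKey.hashCode(), BUCKETS);
        return String.format("%02d|%s", bucket, naturalKey);
    }

    // Split points "01", "02", ..., "15": region i then holds exactly the keys
    // with salt i. The first region implicitly starts at the empty key.
    static byte[][] splitKeys() {
        byte[][] splits = new byte[BUCKETS - 1][];
        for (int i = 1; i < BUCKETS; i++) {
            splits[i - 1] = String.format("%02d", i).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical sequential keys (timestamps one second apart).
        for (String ts : new String[] {"20130117221400", "20130117221401", "20130117221402"}) {
            System.out.println(ts + " -> " + saltedKey(ts));
        }
        System.out.println(splitKeys().length + " split keys -> " + BUCKETS + " regions");
    }
}
```

The trade-off: a point lookup can recompute the prefix from the natural key, but a scan over a time range now has to run one scan per bucket and merge the results.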
