Uhm... silly question: why would you ever need a reduce step when you're writing to an HBase table?
Now, I'm sure there may be some fringe case, but in the past two years I've never come across a case where you would need a reducer when you're writing to HBase. So what am I missing?

> From: [email protected]
> To: [email protected]
> Date: Thu, 4 Aug 2011 11:18:57 -0400
> Subject: Re: loading data in HBase table using APIs
>
> David, thanks for the tip on this. I just checked in a reorg to the
> performance chapter and included this tip.
>
> Stack does the website updating so it's not visible yet, but this tip is
> in there.
>
> Thanks!
>
> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
>
> >After a quick scan of the performance section, I didn't see what I
> >consider to be a huge performance consideration:
> >If at all possible, don't do a reduce on your puts. The shuffle/sort
> >part of the map/reduce paradigm is often useless if all you are trying to
> >do is insert/update data in HBase. From the OP's description it sounds
> >like he doesn't need to have any kind of reduce phase [and may be a great
> >candidate for bulk loading and the pre-creation of regions]. In any
> >case, don't reduce if you can avoid it.
> >
> >Dave
> >
> >-----Original Message-----
> >From: Doug Meil [mailto:[email protected]]
> >Sent: Sunday, July 17, 2011 4:40 PM
> >To: [email protected]
> >Subject: Re: loading data in HBase table using APIs
> >
> >Hi there-
> >
> >Take a look at this for starters:
> >http://hbase.apache.org/book.html#schema
> >
> >1) double-check your row-keys (sanity check), that's in the Schema Design
> >chapter.
> >
> >http://hbase.apache.org/book.html#performance
> >
> >2) if not using bulk-load - pre-create regions, do this regardless of
> >using MR or non-MR.
> >
> >3) if not using MR job and are using multiple threads with the Java API,
> >take a look at HTableUtil. It's on trunk, but that utility can help you.
> >
> >On 7/17/11 4:08 PM, "abhay ratnaparkhi" <[email protected]>
> >wrote:
> >
> >>Hello,
> >>
> >>I am loading lots of data through API in HBase table.
> >>I am using HBase Java API to do this.
> >>If I convert this code to map-reduce task and use *TableOutputFormat*
> >>class then will I get any performance improvement?
> >>
> >>As I am not getting input data from existing HBase table or HDFS files
> >>there will not be any input to map task.
> >>The only advantage is multiple map tasks running simultaneously might
> >>make processing faster.
> >>
> >>Thanks!
> >>Regards,
> >>Abhay
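
For what it's worth, the pre-creation of regions mentioned in the quoted thread mostly comes down to choosing split points before the load starts. Below is a minimal sketch of that calculation, not anything taken from the HBase code base. It assumes row keys are fixed-width lowercase hex strings that are roughly uniformly distributed (hashed keys, for example); the function name even_split_keys and its parameters are hypothetical.

```python
from typing import List


def even_split_keys(num_regions: int, key_width: int = 8) -> List[str]:
    """Return num_regions - 1 split points that evenly divide a hex key space.

    Assumption (not from the thread): row keys are lowercase hex strings of
    exactly key_width characters, spread roughly uniformly over the key space.
    """
    if num_regions < 2:
        # A single region needs no split points.
        return []
    key_space = 16 ** key_width  # number of distinct keys of this width
    return [
        format(i * key_space // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a two-character hex key space.
    print(even_split_keys(4, key_width=2))  # ['40', '80', 'c0']
```

The resulting strings would then be converted to bytes and handed to a table-creation call that accepts split keys. If the real keys are not uniformly distributed, evenly spaced split points will produce uneven regions, so the boundaries should be derived from a sample of the actual keys instead.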
