It's not obvious to a lot of newer folks that an MR job can exist minus the R.
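
In case it helps anyone reading the archive, a map-only load can be sketched roughly as below. This is only an illustration under assumptions, not code from the book: the class name MapOnlyHBaseLoad, the column family "cf", the qualifier "v", and the tab-separated text input are all made up, and it assumes an HBase 1.0+ client (older releases use Put.add rather than Put.addColumn). The part that matters is setNumReduceTasks(0), which skips the shuffle/sort so each mapper's Puts go straight to TableOutputFormat.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MapOnlyHBaseLoad {

  /** Turns each "rowkey<TAB>value" input line into one Put (hypothetical layout). */
  public static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    // Assumed schema: one column family "cf" with a single qualifier "v".
    private static final byte[] FAMILY = Bytes.toBytes("cf");
    private static final byte[] QUALIFIER = Bytes.toBytes("v");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length < 2 || parts[0].isEmpty()) {
        return; // skip malformed lines instead of failing the task
      }
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  /** Builds the job without submitting it. */
  public static Job createJob(Configuration conf, String table, String inputDir)
      throws IOException {
    Job job = Job.getInstance(conf, "map-only load into " + table);
    job.setJarByClass(MapOnlyHBaseLoad.class);
    job.setMapperClass(LoadMapper.class);
    FileInputFormat.addInputPath(job, new Path(inputDir));

    // Passing null as the reducer class only wires up TableOutputFormat
    // and the target table name.
    TableMapReduceUtil.initTableReducerJob(table, null, job);

    // The "minus the R" part: zero reducers means no shuffle/sort.
    job.setNumReduceTasks(0);
    return job;
  }

  public static void main(String[] args) throws Exception {
    // args[0] = target table, args[1] = input directory
    Job job = createJob(HBaseConfiguration.create(), args[0], args[1]);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Pre-creating regions (or bulk loading) is a separate step and is not shown here.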

On 8/4/11 5:52 PM, "Michael Segel" <[email protected]> wrote:

>
>Uhm Silly question...
>
>Why would you ever need a reduce step when you're writing to an HBase
>table?
>
>Now I'm sure that there may be some fringe case, but in the past two
>years, I've never come across a case where you would need to do a reducer
>when you're writing to HBase.
>
>So what am I missing?
>
>> From: [email protected]
>> To: [email protected]
>> Date: Thu, 4 Aug 2011 11:18:57 -0400
>> Subject: Re: loading data in HBase table using APIs
>>
>> David, thanks for the tip on this. I just checked in a reorg to the
>> performance chapter and included this tip.
>>
>> Stack does the website updating so it's not visible yet, but this tip is
>> in there.
>>
>> Thanks!
>>
>> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
>>
>> >After a quick scan of the performance section, I didn't see what I
>> >consider to be a huge performance consideration:
>> >If at all possible, don't do a reduce on your puts. The shuffle/sort
>> >part of the map/reduce paradigm is often useless if all you are trying to
>> >do is insert/update data in HBase. From the OP's description it sounds
>> >like he doesn't need to have any kind of reduce phase [and may be a great
>> >candidate for bulk loading and the pre-creation of regions]. In any
>> >case, don't reduce if you can avoid it.
>> >
>> >Dave
>> >
>> >-----Original Message-----
>> >From: Doug Meil [mailto:[email protected]]
>> >Sent: Sunday, July 17, 2011 4:40 PM
>> >To: [email protected]
>> >Subject: Re: loading data in HBase table using APIs
>> >
>> >Hi there-
>> >
>> >Take a look at this for starters:
>> >http://hbase.apache.org/book.html#schema
>> >
>> >1) double-check your row-keys (sanity check), that's in the Schema Design
>> >chapter.
>> >
>> >http://hbase.apache.org/book.html#performance
>> >
>> >2) if not using bulk-load - re-create regions, do this regardless of
>> >using MR or non-MR.
>> >
>> >3) if not using MR job and are using multiple threads with the Java API,
>> >take a look at HTableUtil. It's on trunk, but that utility can help you.
>> >
>> >On 7/17/11 4:08 PM, "abhay ratnaparkhi" <[email protected]>
>> >wrote:
>> >
>> >>Hello,
>> >>
>> >>I am loading lots of data through API in HBase table.
>> >>I am using HBase Java API to do this.
>> >>If I convert this code to map-reduce task and use *TableOutputFormat*
>> >>class then will I get any performance improvement?
>> >>
>> >>As I am not getting input data from existing HBase table or HDFS files
>> >>there will not be any input to map task.
>> >>The only advantage is multiple map tasks running simultaneously might
>> >>make processing faster.
>> >>
>> >>Thanks!
>> >>Regars,
>> >>Abhay
