Yes, I reviewed that. I want to insert data into HBase (the source is an HBase table and the sink is also an HBase table). I do not need a reduce step.
Previously I used *IdentityTableReducer*, like below:

TableMapReduceUtil.initTableMapperJob(inPutTableName, scan,
    SSIBulkLoaderMapper.class, ImmutableBytesWritable.class, Put.class, job);
TableMapReduceUtil.initTableReducerJob(outPutTableName,
    IdentityTableReducer.class, job);

I don't need to use a reducer (as it is not necessary). I want to insert from
the map. One way is to use *HTable APIs* to insert data from the map (this is
working). Another way is using *TableOutputFormat*. (How do I use this? I
tried doing

context.write(new ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)), p);

from the map and it's not working.) Can you give me an example where I can use
TableOutputFormat to insert data into HBase without a reduce step?

Thank You!
Abhay

On Thu, Aug 18, 2011 at 5:56 PM, Doug Meil <[email protected]> wrote:

>
> Have you reviewed this?
>
> http://hbase.apache.org/book.html#mapreduce.example
>
> I'm planning to add more examples in this chapter, but there is some
> sample code to review.
>
>
> On 8/18/11 4:18 AM, "abhay ratnaparkhi" <[email protected]>
> wrote:
>
> >Thank you for all this information.
> >Can you give me an example where I have only a map task and I can put
> >data in HBase from the map?
> >I tried the following settings.
> >
> > job = new Job(conf, "Bulk Processing - Only Map.");
> > job.setNumReduceTasks(0);
> > job.setJarByClass(MyBulkDataLoader.class);
> > //job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > //job.setMapOutputValueClass(ImmutableBytesWritable.class);
> > job.setOutputKeyClass(ImmutableBytesWritable.class);
> > job.setOutputValueClass(Put.class);
> > job.setOutputFormatClass(TableOutputFormat.class);
> > Scan scan = new Scan();
> > TableMapReduceUtil.initTableMapperJob(INPUT_TABLE_NAME, scan,
> >     MyBulkLoaderMapper.class, ImmutableBytesWritable.class, Put.class,
> >     job);
> > //TableMapReduceUtil.initTableReducerJob(OUTPUT_TABLE_NAME,
> > //    IdentityTableReducer.class, job);
> > LOG.info("Started " + INPUT_TABLE_NAME);
> > job.waitForCompletion(true);
>
> >From my map class I am doing...
> >context.write(new ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)),
> >    p); // p is an instance of Put.
>
> >Previously I was using "IdentityTableReducer". As the reduce step is not
> >required for bulk loading, I only need to insert data into HBase through
> >the map phase.
> >Where can I give the output table name?
> >If you can give me any example that only has a map task and HBase as a
> >source and sink, that will be helpful.
>
> >Thank you.
> >Abhay.
>
> >On Tue, Aug 9, 2011 at 4:51 AM, Stack <[email protected]> wrote:
>
> >> The doc here suggests avoiding reduce:
> >>
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
> >>
> >> St.Ack
> >>
> >> On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil
> >> <[email protected]> wrote:
> >> >
> >> > It's not obvious to a lot of newer folks that an MR job can exist
> >> > minus the R.
> >> >
> >> > On 8/4/11 5:52 PM, "Michael Segel" <[email protected]> wrote:
> >> >
> >> >> Uhm, silly question...
> >> >>
> >> >> Why would you ever need a reduce step when you're writing to an
> >> >> HBase table?
> >> >>
> >> >> Now I'm sure that there may be some fringe case, but in the past two
> >> >> years, I've never come across a case where you would need to do a
> >> >> reducer when you're writing to HBase.
> >> >>
> >> >> So what am I missing?
> >> >>
> >> >>> From: [email protected]
> >> >>> To: [email protected]
> >> >>> Date: Thu, 4 Aug 2011 11:18:57 -0400
> >> >>> Subject: Re: loading data in HBase table using APIs
> >> >>>
> >> >>> David, thanks for the tip on this. I just checked in a reorg to the
> >> >>> performance chapter and included this tip.
> >> >>>
> >> >>> Stack does the website updating so it's not visible yet, but this
> >> >>> tip is in there.
> >> >>>
> >> >>> Thanks!
> >> >>>
> >> >>> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
> >> >>>
> >> >>> > After a quick scan of the performance section, I didn't see what I
> >> >>> > consider to be a huge performance consideration:
> >> >>> > If at all possible, don't do a reduce on your puts. The
> >> >>> > shuffle/sort part of the map/reduce paradigm is often useless if
> >> >>> > all you are trying to do is insert/update data in HBase. From the
> >> >>> > OP's description it sounds like he doesn't need to have any kind
> >> >>> > of reduce phase [and may be a great candidate for bulk loading and
> >> >>> > the pre-creation of regions]. In any case, don't reduce if you can
> >> >>> > avoid it.
> >> >>> >
> >> >>> > Dave
> >> >>> >
> >> >>> > -----Original Message-----
> >> >>> > From: Doug Meil [mailto:[email protected]]
> >> >>> > Sent: Sunday, July 17, 2011 4:40 PM
> >> >>> > To: [email protected]
> >> >>> > Subject: Re: loading data in HBase table using APIs
> >> >>> >
> >> >>> > Hi there-
> >> >>> >
> >> >>> > Take a look at this for starters:
> >> >>> > http://hbase.apache.org/book.html#schema
> >> >>> >
> >> >>> > 1) double-check your row-keys (sanity check), that's in the Schema
> >> >>> > Design chapter.
> >> >>> >
> >> >>> > http://hbase.apache.org/book.html#performance
> >> >>> >
> >> >>> > 2) if not using bulk-load, pre-create regions; do this regardless
> >> >>> > of using MR or non-MR.
> >> >>> >
> >> >>> > 3) if not using an MR job and you are using multiple threads with
> >> >>> > the Java API, take a look at HTableUtil. It's on trunk, but that
> >> >>> > utility can help you.
> >> >>> >
> >> >>> > On 7/17/11 4:08 PM, "abhay ratnaparkhi"
> >> >>> > <[email protected]> wrote:
> >> >>> >
> >> >>> >> Hello,
> >> >>> >>
> >> >>> >> I am loading lots of data through the API into an HBase table.
> >> >>> >> I am using the HBase Java API to do this.
> >> >>> >> If I convert this code to a map-reduce task and use the
> >> >>> >> *TableOutputFormat* class, will I get any performance improvement?
> >> >>> >>
> >> >>> >> As I am not getting input data from an existing HBase table or
> >> >>> >> HDFS files, there will not be any input to the map task.
> >> >>> >> The only advantage is that multiple map tasks running
> >> >>> >> simultaneously might make processing faster.
> >> >>> >>
> >> >>> >> Thanks!
> >> >>> >> Regards,
> >> >>> >> Abhay
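[Editor's note: the map-only TableOutputFormat question at the top of the thread is answered by the pattern below, which follows the HBase reference guide's table-to-table example. The class and table names (MapOnlyCopy, sourceTable, targetTable) are illustrative, not from the thread. The key point: the key passed to context.write() must be the Put's row key, not the output table name; the output table name goes into the job configuration (TableOutputFormat.OUTPUT_TABLE), which initTableReducerJob(table, null, job) sets for you even in a map-only job.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyCopy {

  // Mapper reads rows from the source table and emits Puts keyed by ROW KEY.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // Copy every cell of the input row (adapt the Put to your own needs).
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      // Key is the row key -- NOT the output table name.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only HBase copy");
    job.setJarByClass(MapOnlyCopy.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for MR jobs
    scan.setCacheBlocks(false);  // don't pollute the region server block cache

    TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);

    // A null reducer class wires up TableOutputFormat and stores the output
    // table name (TableOutputFormat.OUTPUT_TABLE) in the job configuration.
    TableMapReduceUtil.initTableReducerJob("targetTable", null, job);
    job.setNumReduceTasks(0);    // map-only: Puts go straight to HBase

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This is a sketch against a running cluster (HBase config on the classpath, both tables pre-created), not a standalone program; the answer to "where can I give the output table name?" is the first argument of initTableReducerJob.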
