Have you reviewed this? http://hbase.apache.org/book.html#mapreduce.example
I'm planning to add more examples in this chapter, but there is some sample
code to review.

On 8/18/11 4:18 AM, "abhay ratnaparkhi" <[email protected]> wrote:

>Thank you for all this information.
>Can you give me any example where I have only a map task and I can put
>data in HBase from the map?
>I tried the following settings.
>
>  job = new Job(conf, "Bulk Processing - Only Map.");
>  job.setNumReduceTasks(0);
>  job.setJarByClass(MyBulkDataLoader.class);
>  //job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>  //job.setMapOutputValueClass(ImmutableBytesWritable.class);
>  job.setOutputKeyClass(ImmutableBytesWritable.class);
>  job.setOutputValueClass(Put.class);
>  job.setOutputFormatClass(TableOutputFormat.class);
>  Scan scan = new Scan();
>  TableMapReduceUtil.initTableMapperJob(INPUT_TABLE_NAME, scan,
>      MyBulkLoaderMapper.class, ImmutableBytesWritable.class,
>      Put.class, job);
>  //TableMapReduceUtil.initTableReducerJob(OUTPUT_TABLE_NAME,
>  //    IdentityTableReducer.class, job);
>  LOG.info("Started " + INPUT_TABLE_NAME);
>  job.waitForCompletion(true);
>
>From the map class I am doing...
>  context.write(new
>      ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)),
>      p);  // p is an instance of Put.
>
>Previously I was using "IdentityTableReducer". As a reduce step is not
>required for bulk loading, I only need to insert data into HBase through
>the map phase.
>Where can I give the output table name?
>If you can give me any example that only has a map task and HBase as a
>source and sink, that will be helpful.
>
>Thank you,
>Abhay
>
>On Tue, Aug 9, 2011 at 4:51 AM, Stack <[email protected]> wrote:
>
>> The doc here suggests avoiding reduce:
>>
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
>>
>> St.Ack
>>
>> On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil
>> <[email protected]> wrote:
>> >
>> > It's not obvious to a lot of newer folks that an MR job can exist
>> > minus the R.
>> >
>> > On 8/4/11 5:52 PM, "Michael Segel" <[email protected]> wrote:
>> >
>> >> Uhm, silly question...
>> >>
>> >> Why would you ever need a reduce step when you're writing to an
>> >> HBase table?
>> >>
>> >> Now I'm sure that there may be some fringe case, but in the past two
>> >> years I've never come across a case where you would need a reducer
>> >> when you're writing to HBase.
>> >>
>> >> So what am I missing?
>> >>
>> >>> From: [email protected]
>> >>> To: [email protected]
>> >>> Date: Thu, 4 Aug 2011 11:18:57 -0400
>> >>> Subject: Re: loading data in HBase table using APIs
>> >>>
>> >>> David, thanks for the tip on this. I just checked in a reorg of the
>> >>> performance chapter and included this tip.
>> >>>
>> >>> Stack does the website updating so it's not visible yet, but this
>> >>> tip is in there.
>> >>>
>> >>> Thanks!
>> >>>
>> >>> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
>> >>>
>> >>> >After a quick scan of the performance section, I didn't see what I
>> >>> >consider to be a huge performance consideration:
>> >>> >If at all possible, don't do a reduce on your puts. The
>> >>> >shuffle/sort part of the map/reduce paradigm is often useless if
>> >>> >all you are trying to do is insert/update data in HBase. From the
>> >>> >OP's description it sounds like he doesn't need any kind of reduce
>> >>> >phase [and may be a great candidate for bulk loading and the
>> >>> >pre-creation of regions]. In any case, don't reduce if you can
>> >>> >avoid it.
>> >>> >
>> >>> >Dave
>> >>> >
>> >>> >-----Original Message-----
>> >>> >From: Doug Meil [mailto:[email protected]]
>> >>> >Sent: Sunday, July 17, 2011 4:40 PM
>> >>> >To: [email protected]
>> >>> >Subject: Re: loading data in HBase table using APIs
>> >>> >
>> >>> >Hi there-
>> >>> >
>> >>> >Take a look at this for starters:
>> >>> >http://hbase.apache.org/book.html#schema
>> >>> >
>> >>> >1) Double-check your row-keys (sanity check); that's in the Schema
>> >>> >Design chapter.
>> >>> >
>> >>> >http://hbase.apache.org/book.html#performance
>> >>> >
>> >>> >2) If not using bulk-load, pre-create regions; do this regardless
>> >>> >of using MR or non-MR.
>> >>> >
>> >>> >3) If not using an MR job and you are using multiple threads with
>> >>> >the Java API, take a look at HTableUtil. It's on trunk, but that
>> >>> >utility can help you.
>> >>> >
>> >>> >On 7/17/11 4:08 PM, "abhay ratnaparkhi"
>> >>> ><[email protected]> wrote:
>> >>> >
>> >>> >>Hello,
>> >>> >>
>> >>> >>I am loading lots of data through the API into an HBase table.
>> >>> >>I am using the HBase Java API to do this.
>> >>> >>If I convert this code to a map-reduce task and use the
>> >>> >>*TableOutputFormat* class, will I get any performance improvement?
>> >>> >>
>> >>> >>As I am not getting input data from an existing HBase table or
>> >>> >>HDFS files, there will not be any input to the map task.
>> >>> >>The only advantage is that multiple map tasks running
>> >>> >>simultaneously might make processing faster.
>> >>> >>
>> >>> >>Thanks!
>> >>> >>Regards,
>> >>> >>Abhay
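For reference, the map-only "HBase as source and sink" pattern the thread asks about can be sketched roughly as below. This is a sketch against the HBase client API of that era, not code from the thread: the table names `sourceTable` and `targetTable` and the class names `MapOnlyCopy`/`CopyMapper` are placeholders. The key point for the OP's question is that the output table name is given to `TableMapReduceUtil.initTableReducerJob()` (a `null` reducer class wires up `TableOutputFormat` without adding a reduce phase), not encoded in the mapper's output key.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyCopy {

  // Reads rows from the source table and emits Puts directly; no reducer.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value,
        Context context) throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // Copy every cell of the source row; replace with real transform logic.
      for (KeyValue kv : value.list()) {
        put.add(kv);
      }
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "Map-only HBase copy");
    job.setJarByClass(MapOnlyCopy.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for MR scans
    scan.setCacheBlocks(false);  // don't pollute the block cache

    // Source table + mapper.
    TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);

    // Sink: the output table name goes here. A null reducer class sets up
    // TableOutputFormat without requiring an actual reduce step.
    TableMapReduceUtil.initTableReducerJob("targetTable", null, job);
    job.setNumReduceTasks(0);    // map-only: Puts go straight to the sink

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that this is the shape the package-summary doc linked by Stack describes: the Puts emitted by the mapper are written by `TableOutputFormat` on the map side, so no shuffle/sort happens at all.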
