Re: loading data in HBase table using APIs

abhay ratnaparkhi Thu, 18 Aug 2011 01:19:33 -0700

Thank you for all this information.
Can you give me an example that has only a map task and puts data into
HBase from the map?
I tried the following settings.


          job = new Job(conf, "Bulk Processing - Only Map.");
          job.setNumReduceTasks(0);
          job.setJarByClass(MyBulkDataLoader.class);
          //job.setMapOutputKeyClass(ImmutableBytesWritable.class);
          //job.setMapOutputValueClass(ImmutableBytesWritable.class);
          job.setOutputKeyClass(ImmutableBytesWritable.class);
          job.setOutputValueClass(Put.class);
          job.setOutputFormatClass(TableOutputFormat.class);
          Scan scan = new Scan();
          TableMapReduceUtil.initTableMapperJob(INPUT_TABLE_NAME, scan,
              MyBulkLoaderMapper.class, ImmutableBytesWritable.class, Put.class, job);
          //TableMapReduceUtil.initTableReducerJob((OUTPUT_TABLE_NAME), IdentityTableReducer.class, job);
          LOG.info("Started " + INPUT_TABLE_NAME);
          job.waitForCompletion(true);

From the map class I am doing:
context.write(new ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)), p);   // p is an instance of Put

Previously I was using "IdentityTableReducer". Since a reduce step is not
required for bulk loading, I only need to insert data into HBase from the map
phase.
Where can I specify the output table name?
An example that has only a map task, with HBase as both source and sink,
would be helpful.

Thank you.
Abhay.
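
A minimal sketch of the kind of map-only job asked about above, for reference. It assumes the 0.90-era client API (KeyValue, Result.raw(), Put.add(KeyValue)); later releases changed these calls. The class names MapOnlyTableCopy and CopyMapper and both table names are hypothetical, chosen only for illustration. The output table is named by calling TableMapReduceUtil.initTableReducerJob with a null reducer, which configures TableOutputFormat for that table; setting the reduce count to zero then makes the mappers write their Puts directly.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyTableCopy {

  // Hypothetical table names, for illustration only.
  static final String INPUT_TABLE_NAME = "source_table";
  static final String OUTPUT_TABLE_NAME = "target_table";

  /** Turns every scanned row into a Put carrying the same cells. */
  public static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      // TableOutputFormat ignores the key; the target table comes from
      // the job configuration, not from what is written here.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "Bulk Processing - Only Map");
    job.setJarByClass(MapOnlyTableCopy.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch more rows per RPC than the default
    scan.setCacheBlocks(false);  // a full scan should not churn the block cache

    // Source: the mapper reads rows from the input table.
    TableMapReduceUtil.initTableMapperJob(INPUT_TABLE_NAME, scan, CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);

    // Sink: a null reducer still sets TableOutputFormat and the output table.
    TableMapReduceUtil.initTableReducerJob(OUTPUT_TABLE_NAME, null, job);
    job.setNumReduceTasks(0);    // map-only: no shuffle/sort, no reduce phase

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With this setup there is no need to pass the table name as the key in context.write. An alternative that should have the same effect is to set TableOutputFormat.OUTPUT_TABLE on the configuration and call job.setOutputFormatClass(TableOutputFormat.class) by hand.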
On Tue, Aug 9, 2011 at 4:51 AM, Stack <[email protected]> wrote:

> The doc here suggests avoiding reduce:
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
> St.Ack
>
> On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil <[email protected]>
> wrote:
> >
> > It's not obvious to a lot of newer folks that an MR job can exist minus
> > the R.
> >
> >
> >
> >
> >
> > On 8/4/11 5:52 PM, "Michael Segel" <[email protected]> wrote:
> >
> >>
> >>Uhm Silly question...
> >>
> >>Why would you ever need a reduce step when you're writing to an HBase
> >>table?
> >>
> >>Now I'm sure that there may be some fringe case, but in the past two
> >>years, I've never come across a case where you would need to do a reducer
> >>when you're writing to HBase.
> >>
> >>So what am I missing?
> >>
> >>
> >>
> >>> From: [email protected]
> >>> To: [email protected]
> >>> Date: Thu, 4 Aug 2011 11:18:57 -0400
> >>> Subject: Re: loading data in HBase table using APIs
> >>>
> >>>
> >>> David, thanks for the tip on this.  I just checked in a reorg to the
> >>> performance chapter and included this tip.
> >>>
> >>> Stack does the website updating so it's not visible yet, but this tip is
> >>> in there.
> >>>
> >>> Thanks!
> >>>
> >>>
> >>>
> >>>
> >>> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
> >>>
> >>> >After a quick scan of the performance section, I didn't see what I
> >>> >consider to be a huge performance consideration:
> >>> >If at all possible, don't do a reduce on your puts.  The shuffle/sort
> >>> >part of the map/reduce paradigm is often useless if all you are trying to
> >>> >do is insert/update data in HBase.  From the OP's description it sounds
> >>> >like he doesn't need to have any kind of reduce phase [and may be a great
> >>> >candidate for bulk loading and the pre-creation of regions].  In any
> >>> >case, don't reduce if you can avoid it.
> >>> >
> >>> >Dave
> >>> >
> >>> >-----Original Message-----
> >>> >From: Doug Meil [mailto:[email protected]]
> >>> >Sent: Sunday, July 17, 2011 4:40 PM
> >>> >To: [email protected]
> >>> >Subject: Re: loading data in HBase table using APIs
> >>> >
> >>> >
> >>> >Hi there-
> >>> >
> >>> >Take a look at this for starters:
> >>> >http://hbase.apache.org/book.html#schema
> >>> >
> >>> >1)  double-check your row-keys (sanity check), that's in the Schema Design
> >>> >chapter.
> >>> >
> >>> >http://hbase.apache.org/book.html#performance
> >>> >
> >>> >
> >>> >2)  if not using bulk-load - pre-create regions, do this regardless of
> >>> >using MR or non-MR.
> >>> >
> >>> >3)  if not using MR job and are using multiple threads with the Java API,
> >>> >take a look at HTableUtil.  It's on trunk, but that utility can help you.
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >On 7/17/11 4:08 PM, "abhay ratnaparkhi" <[email protected]>
> >>> >wrote:
> >>> >
> >>> >>Hello,
> >>> >>
> >>> >>I am loading lots of data through API in HBase table.
> >>> >>I am using HBase Java API to do this.
> >>> >>If I convert this code to map-reduce task and use *TableOutputFormat* class
> >>> >>then will I get any performance improvement?
> >>> >>
> >>> >>As I am not getting input data from existing HBase table or HDFS files there
> >>> >>will not be any input to map task.
> >>> >>The only advantage is multiple map tasks running simultaneously might make
> >>> >>processing faster.
> >>> >>
> >>> >>Thanks!
> >>> >>Regards,
> >>> >>Abhay
> >>> >
> >>>
> >>
> >
> >
>
