Yes, I reviewed that. I want to insert data into HBase (the source is an HBase table and the sink is also an HBase table). I do not need a reduce step.
Previously I used *IdentityTableReducer*, like below:

TableMapReduceUtil.initTableMapperJob(inPutTableName, scan,
    SSIBulkLoaderMapper.class, ImmutableBytesWritable.class, Put.class, job);
TableMapReduceUtil.initTableReducerJob(outPutTableName,
    IdentityTableReducer.class, job);

I don't need to use a reducer (as it is not necessary). I want to insert from
the map. One way is to use *HTable APIs* to insert data from the map (this is
working). Another way is using *TableOutputFormat*. (How do I use this? I
tried doing

context.write(new ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)), p);

from the map and it's not working.) Can you give me an example where I can use
TableOutputFormat to insert data into HBase without a reduce step?

Thank You!
Abhay

On Thu, Aug 18, 2011 at 5:56 PM, Doug Meil <[email protected]> wrote:

>
> Have you reviewed this?
>
> http://hbase.apache.org/book.html#mapreduce.example
>
> I'm planning to add more examples in this chapter, but there is some
> sample code to review.
>
>
> On 8/18/11 4:18 AM, "abhay ratnaparkhi" <[email protected]>
> wrote:
>
> >Thank you for all this information.
> >Can you give me an example where I have only a map task and I can put
> >data in HBase from the map?
> >I tried the following settings.
> >
> > job = new Job(conf, "Bulk Processing - Only Map.");
> > job.setNumReduceTasks(0);
> > job.setJarByClass(MyBulkDataLoader.class);
> > //job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > //job.setMapOutputValueClass(ImmutableBytesWritable.class);
> > job.setOutputKeyClass(ImmutableBytesWritable.class);
> > job.setOutputValueClass(Put.class);
> > job.setOutputFormatClass(TableOutputFormat.class);
> > Scan scan = new Scan();
> > TableMapReduceUtil.initTableMapperJob(INPUT_TABLE_NAME, scan,
> >     MyBulkLoaderMapper.class, ImmutableBytesWritable.class, Put.class,
> >     job);
> > //TableMapReduceUtil.initTableReducerJob(OUTPUT_TABLE_NAME,
> > //    IdentityTableReducer.class, job);
> > LOG.info("Started " + INPUT_TABLE_NAME);
> > job.waitForCompletion(true);
>
> >From my map class I am doing...
> >context.write(new ImmutableBytesWritable(Bytes.toBytes(OUTPUT_TABLE_NAME)),
> >    p); // p is an instance of Put.
>
> >Previously I was using "IdentityTableReducer". As the reduce step is not
> >required for bulk loading, I only need to insert data into HBase through
> >the map phase.
> >Where can I give the output table name?
> >If you can give me any example that only has a map task and HBase as a
> >source and sink, that will be helpful.
>
> >Thank you.
> >Abhay.
>
> >On Tue, Aug 9, 2011 at 4:51 AM, Stack <[email protected]> wrote:
>
> >> The doc here suggests avoiding reduce:
> >>
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
> >>
> >> St.Ack
> >>
> >> On Fri, Aug 5, 2011 at 2:19 AM, Doug Meil
> >> <[email protected]> wrote:
> >> >
> >> > It's not obvious to a lot of newer folks that an MR job can exist
> >> > minus the R.
> >> >
> >> > On 8/4/11 5:52 PM, "Michael Segel" <[email protected]> wrote:
> >> >
> >> >> Uhm, silly question...
> >> >>
> >> >> Why would you ever need a reduce step when you're writing to an
> >> >> HBase table?
> >> >>
> >> >> Now I'm sure that there may be some fringe case, but in the past two
> >> >> years, I've never come across a case where you would need to do a
> >> >> reducer when you're writing to HBase.
> >> >>
> >> >> So what am I missing?
> >> >>
> >> >>> From: [email protected]
> >> >>> To: [email protected]
> >> >>> Date: Thu, 4 Aug 2011 11:18:57 -0400
> >> >>> Subject: Re: loading data in HBase table using APIs
> >> >>>
> >> >>> David, thanks for the tip on this. I just checked in a reorg to the
> >> >>> performance chapter and included this tip.
> >> >>>
> >> >>> Stack does the website updating so it's not visible yet, but this
> >> >>> tip is in there.
> >> >>>
> >> >>> Thanks!
> >> >>>
> >> >>> On 7/18/11 6:18 PM, "Buttler, David" <[email protected]> wrote:
> >> >>>
> >> >>> > After a quick scan of the performance section, I didn't see what I
> >> >>> > consider to be a huge performance consideration:
> >> >>> > If at all possible, don't do a reduce on your puts. The
> >> >>> > shuffle/sort part of the map/reduce paradigm is often useless if
> >> >>> > all you are trying to do is insert/update data in HBase. From the
> >> >>> > OP's description it sounds like he doesn't need to have any kind
> >> >>> > of reduce phase [and may be a great candidate for bulk loading and
> >> >>> > the pre-creation of regions]. In any case, don't reduce if you can
> >> >>> > avoid it.
> >> >>> >
> >> >>> > Dave
> >> >>> >
> >> >>> > -----Original Message-----
> >> >>> > From: Doug Meil [mailto:[email protected]]
> >> >>> > Sent: Sunday, July 17, 2011 4:40 PM
> >> >>> > To: [email protected]
> >> >>> > Subject: Re: loading data in HBase table using APIs
> >> >>> >
> >> >>> > Hi there-
> >> >>> >
> >> >>> > Take a look at this for starters:
> >> >>> > http://hbase.apache.org/book.html#schema
> >> >>> >
> >> >>> > 1) double-check your row-keys (sanity check), that's in the Schema
> >> >>> > Design chapter.
> >> >>> >
> >> >>> > http://hbase.apache.org/book.html#performance
> >> >>> >
> >> >>> > 2) if not using bulk-load, pre-create regions; do this regardless
> >> >>> > of using MR or non-MR.
> >> >>> >
> >> >>> > 3) if not using an MR job and you are using multiple threads with
> >> >>> > the Java API, take a look at HTableUtil. It's on trunk, but that
> >> >>> > utility can help you.
> >> >>> >
> >> >>> > On 7/17/11 4:08 PM, "abhay ratnaparkhi"
> >> >>> > <[email protected]> wrote:
> >> >>> >
> >> >>> >> Hello,
> >> >>> >>
> >> >>> >> I am loading lots of data through the API into an HBase table.
> >> >>> >> I am using the HBase Java API to do this.
> >> >>> >> If I convert this code to a map-reduce task and use the
> >> >>> >> *TableOutputFormat* class, will I get any performance improvement?
> >> >>> >>
> >> >>> >> As I am not getting input data from an existing HBase table or
> >> >>> >> HDFS files, there will not be any input to the map task.
> >> >>> >> The only advantage is that multiple map tasks running
> >> >>> >> simultaneously might make processing faster.
> >> >>> >>
> >> >>> >> Thanks!
> >> >>> >> Regards,
> >> >>> >> Abhay
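[Editor's note: the map-only TableOutputFormat question at the top of the thread is answered by the pattern below, which follows the HBase reference guide's table-to-table example. The class and table names (MapOnlyCopy, sourceTable, targetTable) are illustrative, not from the thread. The key point: the key passed to context.write() must be the Put's row key, not the output table name; the output table name goes into the job configuration (TableOutputFormat.OUTPUT_TABLE), which initTableReducerJob(table, null, job) sets for you even in a map-only job.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyCopy {

  // Mapper reads rows from the source table and emits Puts keyed by ROW KEY.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // Copy every cell of the input row (adapt the Put to your own needs).
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      // Key is the row key -- NOT the output table name.
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only HBase copy");
    job.setJarByClass(MapOnlyCopy.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching for MR jobs
    scan.setCacheBlocks(false);  // don't pollute the region server block cache

    TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);

    // A null reducer class wires up TableOutputFormat and stores the output
    // table name (TableOutputFormat.OUTPUT_TABLE) in the job configuration.
    TableMapReduceUtil.initTableReducerJob("targetTable", null, job);
    job.setNumReduceTasks(0);    // map-only: Puts go straight to HBase

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This is a sketch against a running cluster (HBase config on the classpath, both tables pre-created), not a standalone program; the answer to "where can I give the output table name?" is the first argument of initTableReducerJob.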
