oh! I think you have not read the full post. The essay has 3 paragraphs :)
The two questions were:

1. Should I also add the following line?
   job.setPartitionerClass(TotalOrderPartitioner.class);
2. Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?

On Thu, Nov 11, 2010 at 12:49 AM, Stack <[email protected]> wrote:
> Which two questions (you wrote an essay that looked like one big
> question -- smile).
> St.Ack
>
> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
> > yeah, I tried it and it did not fail. Can you answer the other 2
> > questions as well?
> >
> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
> >
> >> All below looks reasonable (I did not do a detailed review of your code
> >> posting). Have you tried it? Did it fail?
> >> St.Ack
> >>
> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
> >> >
> >> >> What do you need? Bulk-upload, in the scheme of things, is a well
> >> >> documented feature. It's also one that has had some exercise and is
> >> >> known to work well. For a 0.89 release and trunk, documentation is
> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> >> >> The unit test you refer to below is good for figuring out how to run
> >> >> a job (bulk-upload was redone for 0.89/trunk and is much improved
> >> >> over what was available in 0.20.x).
> >> >
> >> > I need to load data into HBase using HFiles.
> >> >
> >> > OK, let me tell you what I understand from all these things. Basically
> >> > there are two ways to bulk load into HBase:
> >> >
> >> > 1. Using command line tools (importtsv, completebulkload)
> >> > 2. A MapReduce job using HFileOutputFormat
> >> >
> >> > At the moment, I have generated the HFiles using HFileOutputFormat and
> >> > am loading these files into HBase using the completebulkload command
> >> > line tool. Here is my basic code skeleton. Correct me if I am doing
> >> > anything wrong.
> >> >
> >> > Configuration conf = new Configuration();
> >> > Job job = new Job(conf, "myjob");
> >> >
> >> > FileInputFormat.setInputPaths(job, input);
> >> > job.setJarByClass(ParserDriver.class);
> >> > job.setMapperClass(MyParserMapper.class);
> >> > job.setNumReduceTasks(1);
> >> > job.setInputFormatClass(XmlInputFormat.class);
> >> > job.setOutputFormatClass(HFileOutputFormat.class);
> >> > job.setOutputKeyClass(ImmutableBytesWritable.class);
> >> > job.setOutputValueClass(Put.class);
> >> > job.setReducerClass(PutSortReducer.class);
> >> >
> >> > Path outPath = new Path(output);
> >> > FileOutputFormat.setOutputPath(job, outPath);
> >> > job.waitForCompletion(true);
> >> >
> >> > And here is the mapper skeleton:
> >> >
> >> > public class MyParserMapper extends
> >> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >> >   protected void map(LongWritable key, Text value, Context context)
> >> >       throws IOException, InterruptedException {
> >> >     ImmutableBytesWritable rowId = ...;
> >> >     Put put = new Put(rowId.get());
> >> >     put.add(...);
> >> >     context.write(rowId, put);
> >> >   }
> >> > }
> >> >
> >> > The link says:
> >> > "In order to function efficiently, HFileOutputFormat must be configured
> >> > such that each output HFile fits within a single region. In order to do
> >> > this, jobs use Hadoop's TotalOrderPartitioner class to partition the
> >> > map output into disjoint ranges of the key space, corresponding to the
> >> > key ranges of the regions in the table."
> >> >
> >> > Now, according to my configuration above, where do I need to set the
> >> > TotalOrderPartitioner? Should I also add the following line?
> >> >
> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >
> >> >> On TotalOrderPartitioner: this is a partitioner class from Hadoop. The
> >> >> MR partitioner -- the class that dictates which reducers get what map
> >> >> outputs -- is pluggable. The default partitioner does a hash of the
> >> >> output key to figure out which reducer. This won't work if you are
> >> >> looking to have your HFile output totally sorted.
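[For later readers of this thread: in 0.89/trunk you do not normally call setPartitionerClass yourself. HFileOutputFormat.configureIncrementalLoad inspects the table's region boundaries and installs the TotalOrderPartitioner, the sort reducer, and the output format for you. A minimal driver sketch against the 0.89-era API; the table name "mytable", the input/output paths, and the reuse of MyParserMapper above are placeholder assumptions, not from the thread:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyParserMapper.class);   // the mapper skeleton above
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Reads the region start keys of the target table and wires up the
    // TotalOrderPartitioner (with a partition file covering those keys),
    // PutSortReducer, and HFileOutputFormat, so each output HFile falls
    // within a single region. This replaces the manual
    // setReducerClass/setOutputFormatClass/setPartitionerClass calls.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

[This sketch needs the Hadoop and HBase jars on the classpath and a running cluster, so treat it as an outline rather than a tested program.]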
> >> >> If you can't figure out what it's about, I'd suggest you check out
> >> >> the Hadoop book where it gets a good explication.
> >> >
> >> > Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >
> >> >> On incremental upload, the doc suggests you look at the output of the
> >> >> LoadIncrementalHFiles command. Have you done that? You run the
> >> >> command and it'll add in whatever is ready for loading.
> >> >
> >> > I have just used the command line tool for bulk upload, but I have not
> >> > yet looked at the LoadIncrementalHFiles class to do it through a
> >> > program.
> >> >
> >> >> St.Ack
> >> >>
> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
> >> >> > Hey Community,
> >> >> >
> >> >> > Well... it seems that nobody has experience with the bulk load
> >> >> > option. I have found one class which might help to write the code
> >> >> > for it:
> >> >> >
> >> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >> >
> >> >> > From this, you can get the idea of how to write a MapReduce job
> >> >> > that outputs in HFile format. But there is a little confusion about
> >> >> > these two things:
> >> >> >
> >> >> > 1. TotalOrderPartitioner
> >> >> > 2. configureIncrementalLoad
> >> >> >
> >> >> > Does anybody have an idea about how these work and how to configure
> >> >> > them for the job?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am trying to investigate the bulk load option as described in
> >> >> >> the following link:
> >> >> >>
> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >> >>
> >> >> >> Does anybody have sample code or have used it before?
> >> >> >> Can it be helpful for inserting data into an existing table? In my
> >> >> >> scenario, I have one table with 1 column family into which data
> >> >> >> will be inserted every 15 minutes.
> >> >> >>
> >> >> >> Kindly share your experiences.
> >> >> >>
> >> >> >> Thanks
> >> >> >> --
> >> >> >> Regards
> >> >> >> Shuja-ur-Rehman Baig
> >> >> >> <http://pk.linkedin.com/in/shujamughal>

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
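[For later readers: the programmatic equivalent of the completebulkload tool mentioned in this thread is the LoadIncrementalHFiles class. Since it loads into an existing table, it fits the every-15-minutes incremental scenario described above. A minimal sketch against the 0.89-era API; the table name "mytable" and the HFile directory argument are placeholder assumptions:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class Loader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Takes the directory of HFiles written by the MapReduce job
    // (args[0]) and moves each file into the region that owns its key
    // range -- the same thing the completebulkload command does from
    // the shell. The table must already exist.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(args[0]), new HTable(conf, "mytable"));
  }
}
```

[As with the driver, this needs the HBase jars and a live cluster to run; it is an outline of the API, not a tested program.]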
