Which two questions? (You wrote an essay that looked like one big question -- smile.) St.Ack
On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
> yeah, I tried it and it did not fail. Can you answer the other 2 questions
> as well?
>
> On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
>
>> All below looks reasonable (I did not do a detailed review of your code
>> posting). Have you tried it? Did it fail?
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
>> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>> >
>> >> What do you need? Bulk-upload, in the scheme of things, is a well
>> >> documented feature. It's also one that has had some exercise and is
>> >> known to work well. For a 0.89 release and trunk, documentation is
>> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> >> The unit test you refer to below is good for figuring out how to run
>> >> a job. (Bulk-upload was redone for 0.89/trunk and is much improved
>> >> over what was available in 0.20.x.)
>> >
>> > *I need to load data into HBase using HFiles.*
>> >
>> > OK, let me tell you what I understand from all this. Basically there
>> > are two ways to bulk load into HBase:
>> >
>> > 1- Using command-line tools (importtsv, completebulkload)
>> > 2- A MapReduce job using HFileOutputFormat
>> >
>> > At the moment, I have generated the HFiles using HFileOutputFormat and
>> > am loading these files into HBase using the completebulkload
>> > command-line tool. Here is my basic code skeleton. Correct me if I am
>> > doing anything wrong.
>> >
>> > Configuration conf = new Configuration();
>> > Job job = new Job(conf, "myjob");
>> >
>> > FileInputFormat.setInputPaths(job, input);
>> > job.setJarByClass(ParserDriver.class);
>> > job.setMapperClass(MyParserMapper.class);
>> > job.setNumReduceTasks(1);
>> > job.setInputFormatClass(XmlInputFormat.class);
>> > job.setOutputFormatClass(HFileOutputFormat.class);
>> > job.setOutputKeyClass(ImmutableBytesWritable.class);
>> > job.setOutputValueClass(Put.class);
>> > job.setReducerClass(PutSortReducer.class);
>> >
>> > Path outPath = new Path(output);
>> > FileOutputFormat.setOutputPath(job, outPath);
>> > job.waitForCompletion(true);
>> >
>> > And here is the mapper skeleton:
>> >
>> > public class MyParserMapper extends
>> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >   protected void map(LongWritable key, Text value, Context context)
>> >       throws IOException, InterruptedException {
>> >     ...
>> >     Put put = new Put(rowId);
>> >     put.add(...);
>> >     context.write(new ImmutableBytesWritable(rowId), put);
>> >   }
>> > }
>> >
>> > The link says:
>> > *"In order to function efficiently, HFileOutputFormat must be
>> > configured such that each output HFile fits within a single region. In
>> > order to do this, jobs use Hadoop's TotalOrderPartitioner class to
>> > partition the map output into disjoint ranges of the key space,
>> > corresponding to the key ranges of the regions in the table."*
>> >
>> > Now, according to my configuration above, where do I need to set
>> > *TotalOrderPartitioner*? Should I also add the following line?
>> >
>> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >
>> >> On TotalOrderPartitioner: this is a partitioner class from Hadoop.
>> >> The MR partitioner -- the class that dictates which reducers get what
>> >> map outputs -- is pluggable. The default partitioner does a hash of
>> >> the output key to figure which reducer. This won't work if you are
>> >> looking to have your HFile output totally sorted.
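[Editor's note: a minimal driver sketch answering the TotalOrderPartitioner question above. On 0.89/trunk, `HFileOutputFormat.configureIncrementalLoad(Job, HTable)` is intended to wire up the partitioner, reducer, and output format for you from the table's current region boundaries, so none of those need to be set by hand. Class and method names are from the HBase `mapreduce` package as of that era; the table name "mytable" and the argument-based paths are hypothetical.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyParserMapper.class);
    // The mapper emits (row key, Put); these are the map output types.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // This one call configures TotalOrderPartitioner (with a partition
    // file built from the table's region start keys), PutSortReducer,
    // the reduce-task count (one per region), and HFileOutputFormat --
    // so setPartitionerClass/setReducerClass/setNumReduceTasks are not
    // needed in the driver.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This requires a running HBase cluster and its jars on the classpath, so it is a sketch rather than something runnable in isolation.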
>> >>
>> >> If you can't figure out what it's about, I'd suggest you check out
>> >> the Hadoop book, where it gets a good explication.
>> >
>> > Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >
>> >> On incremental upload, the doc suggests you look at the output of the
>> >> LoadIncrementalHFiles command. Have you done that? You run the
>> >> command and it'll add in whatever is ready for loading.
>> >
>> > I have just used the command-line tool for bulk upload, but have not
>> > yet used the LoadIncrementalHFiles class to do it through a program.
>> >
>> > ------------------------------
>> >
>> >> St.Ack
>> >>
>> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
>> >> > Hey Community,
>> >> >
>> >> > Well... it seems that nobody has experience with the bulk load
>> >> > option. I have found one class which might help to write the code
>> >> > for it:
>> >> >
>> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >
>> >> > From this, you can get the idea of how to write a MapReduce job
>> >> > that outputs in HFile format. But there is a little confusion about
>> >> > these two things:
>> >> >
>> >> > 1- TotalOrderPartitioner
>> >> > 2- configureIncrementalLoad
>> >> >
>> >> > Does anybody have an idea of how these work and how to configure
>> >> > them for the job?
>> >> >
>> >> > Thanks
>> >> >
>> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> I am trying to investigate the bulk load option as described in
>> >> >> the following link:
>> >> >>
>> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >>
>> >> >> Does anybody have sample code, or have you used it before?
>> >> >> Can it be helpful to insert data into an existing table?
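[Editor's note: a sketch of driving LoadIncrementalHFiles from code instead of the completebulkload shell command, per the exchange above. The API shown is the 0.89/trunk-era one; the output path and table name "mytable" are hypothetical placeholders.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class IncrementalLoader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    HTable table = new HTable(conf, "mytable");
    // Moves every HFile under the MapReduce job's output directory into
    // the region that covers its key range. Because it loads whatever is
    // sitting in the directory, it suits a recurring (e.g. every 15
    // minutes) job writing to a fresh output directory each run.
    loader.doBulkLoad(new Path("/path/to/hfile/output"), table);
  }
}
```

Like the driver sketch, this assumes HBase jars on the classpath and a live cluster, so it cannot run standalone.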
>> >> >> In my scenario, I have one table with 1 column family into which
>> >> >> data will be inserted every 15 minutes.
>> >> >>
>> >> >> Kindly share your experiences.
>> >> >>
>> >> >> Thanks
>> >> >> --
>> >> >> Regards
>> >> >> Shuja-ur-Rehman Baig
>> >> >> <http://pk.linkedin.com/in/shujamughal>
