All below looks reasonable (I did not do a detailed review of your code posting). Have you tried it? Did it fail? St.Ack
On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>
>> What do you need? Bulk-upload, in the scheme of things, is a well
>> documented feature. It's also one that has had some exercise and is
>> known to work well. For a 0.89 release and trunk, documentation is
>> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> The unit test you refer to below is good for figuring out how to run
>> a job. (Bulk-upload was redone for 0.89/trunk and is much improved
>> over what was available in 0.20.x.)
>
> *I need to load data into HBase using HFiles.*
>
> OK, let me tell you what I understand from all this. Basically there
> are two ways to bulk load into HBase:
>
> 1- Using command-line tools (importtsv, completebulkload)
> 2- A MapReduce job using HFileOutputFormat
>
> At the moment, I have generated the HFiles using HFileOutputFormat and
> am loading these files into HBase using the completebulkload
> command-line tool. Here is my basic code skeleton. Correct me if I am
> doing anything wrong.
>
> Configuration conf = new Configuration();
> Job job = new Job(conf, "myjob");
>
> FileInputFormat.setInputPaths(job, input);
> job.setJarByClass(ParserDriver.class);
> job.setMapperClass(MyParserMapper.class);
> job.setNumReduceTasks(1);
> job.setInputFormatClass(XmlInputFormat.class);
> job.setOutputFormatClass(HFileOutputFormat.class);
> job.setOutputKeyClass(ImmutableBytesWritable.class);
> job.setOutputValueClass(Put.class);
> job.setReducerClass(PutSortReducer.class);
>
> Path outPath = new Path(output);
> FileOutputFormat.setOutputPath(job, outPath);
> job.waitForCompletion(true);
>
> and here is the mapper skeleton:
>
> public class MyParserMapper extends
>     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>
>   @Override
>   protected void map(LongWritable key, Text value, Context context)
>       throws IOException, InterruptedException {
>     byte[] rowId = ...;   // row key derived from the input record
>     Put put = new Put(rowId);
>     put.add(...);         // family, qualifier, value
>     context.write(new ImmutableBytesWritable(rowId), put);
>   }
> }
>
> The link says:
>
> *"In order to function efficiently, HFileOutputFormat must be
> configured such that each output HFile fits within a single region. In
> order to do this, jobs use Hadoop's TotalOrderPartitioner class to
> partition the map output into disjoint ranges of the key space,
> corresponding to the key ranges of the regions in the table."*
>
> Now, according to my configuration above, where do I need to set
> *TotalOrderPartitioner*? Should I also add the following line?
>
> job.setPartitionerClass(TotalOrderPartitioner.class);
>
>> On TotalOrderPartitioner: this is a partitioner class from Hadoop.
>> The MR partitioner -- the class that dictates which reducers get what
>> map outputs -- is pluggable. The default partitioner does a hash of
>> the output key to figure out which reducer. This won't work if you
>> are looking to have your HFile output totally sorted.
>>
>> If you can't figure out what it's about, I'd suggest you check out
>> the Hadoop book, where it gets a good explication.
>
> Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>
>> On incremental upload, the doc.
>> suggests you look at the output of the LoadIncrementalHFiles
>> command. Have you done that? You run the command and it'll add in
>> whatever is ready for loading.
>
> I have just used the command-line tool for bulk upload, but have not
> yet looked at the LoadIncrementalHFiles class to do it through a
> program.
>
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]>
>> wrote:
>> > Hey Community,
>> >
>> > Well... it seems that nobody has experience with the bulk load
>> > option. I have found one class which might help in writing the code
>> > for it:
>> >
>> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >
>> > From this, you can get an idea of how to write a MapReduce job that
>> > outputs in HFile format. But there is a little confusion about
>> > these two things:
>> >
>> > 1- TotalOrderPartitioner
>> > 2- configureIncrementalLoad
>> >
>> > Does anybody have an idea of how these work and how to configure
>> > them for the job?
>> >
>> > Thanks
>> >
>> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]>
>> > wrote:
>> >
>> >> Hi
>> >>
>> >> I am trying to investigate the bulk load option as described in
>> >> the following link:
>> >>
>> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >>
>> >> Does anybody have sample code or used it before?
>> >> Can it be helpful to insert data into an existing table? In my
>> >> scenario, I have one table with 1 column family in which data will
>> >> be inserted every 15 minutes.
>> >>
>> >> Kindly share your experiences.
>> >>
>> >> Thanks
>> >> --
>> >> Regards
>> >> Shuja-ur-Rehman Baig
>> >> <http://pk.linkedin.com/in/shujamughal>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
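Pulling the answers in this thread together: below is a sketch of a driver against the 0.89-era API discussed above. It is not a definitive implementation -- `MyParserMapper` and `XmlInputFormat` are the poster's own classes, and the table name and paths are placeholder command-line arguments. The key point is that you do not set `TotalOrderPartitioner` by hand: `HFileOutputFormat.configureIncrementalLoad()` wires it up from the table's region boundaries, and `LoadIncrementalHFiles.doBulkLoad()` is the programmatic counterpart of the `completebulkload` tool.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    // args[0] = input dir, args[1] = HFile output dir,
    // args[2] = name of the (existing) target table
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[2]);

    Job job = new Job(conf, "myjob");
    job.setJarByClass(BulkLoadDriver.class);
    job.setInputFormatClass(XmlInputFormat.class);  // poster's input format
    job.setMapperClass(MyParserMapper.class);       // poster's mapper
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    Path hfileDir = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, hfileDir);

    // This one call sets the output format, the sort reducer
    // (PutSortReducer), the TotalOrderPartitioner, and a partition
    // file computed from the table's current region start keys. It
    // also sets the number of reduce tasks to match the region count,
    // so do not force setNumReduceTasks(1) or set the partitioner
    // yourself.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (!job.waitForCompletion(true)) {
      System.exit(1);
    }

    // Programmatic equivalent of the completebulkload tool: moves the
    // finished HFiles into the table's regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}
```

The command-line equivalent of the last step, as used by the poster, is `hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile-dir> <table>`. Note that because the partition file is built from the region boundaries at job-setup time, the job should be rerun (not the same output reloaded) if the table splits heavily between runs.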
