All below looks reasonable (I did not do a detailed review of your code posting). Have you tried it? Did it fail? St.Ack
On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
> On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>
>> What do you need? Bulk-upload, in the scheme of things, is a well
>> documented feature. It's also one that has had some exercise and is
>> known to work well. For a 0.89 release and trunk, documentation is
>> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> The unit test you refer to below is good for figuring out how to run
>> a job. (Bulk-upload was redone for 0.89/trunk and is much improved
>> over what was available in 0.20.x.)
>
> *I need to load data into HBase using HFiles.*
>
> OK, let me tell you what I understand from all this. Basically there
> are two ways to bulk load into HBase:
>
> 1- Using command-line tools (importtsv, completebulkload)
> 2- A MapReduce job using HFileOutputFormat
>
> At the moment, I have generated the HFiles using HFileOutputFormat and
> am loading these files into HBase using the completebulkload
> command-line tool. Here is my basic code skeleton. Correct me if I am
> doing anything wrong.
>
> Configuration conf = new Configuration();
> Job job = new Job(conf, "myjob");
>
> FileInputFormat.setInputPaths(job, input);
> job.setJarByClass(ParserDriver.class);
> job.setMapperClass(MyParserMapper.class);
> job.setNumReduceTasks(1);
> job.setInputFormatClass(XmlInputFormat.class);
> job.setOutputFormatClass(HFileOutputFormat.class);
> job.setOutputKeyClass(ImmutableBytesWritable.class);
> job.setOutputValueClass(Put.class);
> job.setReducerClass(PutSortReducer.class);
>
> Path outPath = new Path(output);
> FileOutputFormat.setOutputPath(job, outPath);
> job.waitForCompletion(true);
>
> and here is the mapper skeleton:
>
> public class MyParserMapper extends
>     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>
>   @Override
>   protected void map(LongWritable key, Text value, Context context)
>       throws IOException, InterruptedException {
>     byte[] rowId = ...;   // row key derived from the input record
>     Put put = new Put(rowId);
>     put.add(...);         // family, qualifier, value
>     context.write(new ImmutableBytesWritable(rowId), put);
>   }
> }
>
> The link says:
>
> *"In order to function efficiently, HFileOutputFormat must be
> configured such that each output HFile fits within a single region. In
> order to do this, jobs use Hadoop's TotalOrderPartitioner class to
> partition the map output into disjoint ranges of the key space,
> corresponding to the key ranges of the regions in the table."*
>
> Now, according to my configuration above, where do I need to set
> *TotalOrderPartitioner*? Should I also add the following line?
>
> job.setPartitionerClass(TotalOrderPartitioner.class);
>
>> On TotalOrderPartitioner: this is a partitioner class from Hadoop.
>> The MR partitioner -- the class that dictates which reducers get what
>> map outputs -- is pluggable. The default partitioner does a hash of
>> the output key to figure out which reducer. This won't work if you
>> are looking to have your HFile output totally sorted.
>>
>> If you can't figure out what it's about, I'd suggest you check out
>> the Hadoop book, where it gets a good explication.
>
> Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>
>> On incremental upload, the doc.
>> suggests you look at the output of the LoadIncrementalHFiles
>> command. Have you done that? You run the command and it'll add in
>> whatever is ready for loading.
>
> I have just used the command-line tool for bulk upload, but have not
> yet looked at the LoadIncrementalHFiles class to do it through a
> program.
>
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]>
>> wrote:
>> > Hey Community,
>> >
>> > Well... it seems that nobody has experience with the bulk load
>> > option. I have found one class which might help in writing the code
>> > for it:
>> >
>> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >
>> > From this, you can get an idea of how to write a MapReduce job that
>> > outputs in HFile format. But there is a little confusion about
>> > these two things:
>> >
>> > 1- TotalOrderPartitioner
>> > 2- configureIncrementalLoad
>> >
>> > Does anybody have an idea of how these work and how to configure
>> > them for the job?
>> >
>> > Thanks
>> >
>> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]>
>> > wrote:
>> >
>> >> Hi
>> >>
>> >> I am trying to investigate the bulk load option as described in
>> >> the following link:
>> >>
>> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >>
>> >> Does anybody have sample code or used it before?
>> >> Can it be helpful to insert data into an existing table? In my
>> >> scenario, I have one table with 1 column family in which data will
>> >> be inserted every 15 minutes.
>> >>
>> >> Kindly share your experiences.
>> >>
>> >> Thanks
>> >> --
>> >> Regards
>> >> Shuja-ur-Rehman Baig
>> >> <http://pk.linkedin.com/in/shujamughal>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
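Pulling the answers in this thread together: below is a sketch of a driver against the 0.89-era API discussed above. It is not a definitive implementation -- `MyParserMapper` and `XmlInputFormat` are the poster's own classes, and the table name and paths are placeholder command-line arguments. The key point is that you do not set `TotalOrderPartitioner` by hand: `HFileOutputFormat.configureIncrementalLoad()` wires it up from the table's region boundaries, and `LoadIncrementalHFiles.doBulkLoad()` is the programmatic counterpart of the `completebulkload` tool.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    // args[0] = input dir, args[1] = HFile output dir,
    // args[2] = name of the (existing) target table
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[2]);

    Job job = new Job(conf, "myjob");
    job.setJarByClass(BulkLoadDriver.class);
    job.setInputFormatClass(XmlInputFormat.class);  // poster's input format
    job.setMapperClass(MyParserMapper.class);       // poster's mapper
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    Path hfileDir = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, hfileDir);

    // This one call sets the output format, the sort reducer
    // (PutSortReducer), the TotalOrderPartitioner, and a partition
    // file computed from the table's current region start keys. It
    // also sets the number of reduce tasks to match the region count,
    // so do not force setNumReduceTasks(1) or set the partitioner
    // yourself.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (!job.waitForCompletion(true)) {
      System.exit(1);
    }

    // Programmatic equivalent of the completebulkload tool: moves the
    // finished HFiles into the table's regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}
```

The command-line equivalent of the last step, as used by the poster, is `hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile-dir> <table>`. Note that because the partition file is built from the region boundaries at job-setup time, the job should be rerun (not the same output reloaded) if the table splits heavily between runs.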
