oh! I think you have not read the full post. The essay has 3 paragraphs :)
The two questions were:

1. Should I also add the following line?
   job.setPartitionerClass(TotalOrderPartitioner.class);
2. Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?

On Thu, Nov 11, 2010 at 12:49 AM, Stack <[email protected]> wrote:
> Which two questions (you wrote an essay that looked like one big
> question -- smile).
> St.Ack
>
> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
> > yeah, I tried it and it did not fail. Can you answer the other 2
> > questions as well?
> >
> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
> >
> >> All below looks reasonable (I did not do a detailed review of your code
> >> posting). Have you tried it? Did it fail?
> >> St.Ack
> >>
> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
> >> >
> >> >> What do you need? Bulk-upload, in the scheme of things, is a well
> >> >> documented feature. It's also one that has had some exercise and is
> >> >> known to work well. For a 0.89 release and trunk, documentation is
> >> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> >> >> The unit test you refer to below is good for figuring out how to run
> >> >> a job (bulk-upload was redone for 0.89/trunk and is much improved
> >> >> over what was available in 0.20.x).
> >> >
> >> > I need to load data into HBase using HFiles.
> >> >
> >> > OK, let me tell you what I understand from all these things. Basically
> >> > there are two ways to bulk load into HBase:
> >> >
> >> > 1. Using command line tools (importtsv, completebulkload)
> >> > 2. A MapReduce job using HFileOutputFormat
> >> >
> >> > At the moment, I have generated the HFiles using HFileOutputFormat and
> >> > am loading these files into HBase using the completebulkload command
> >> > line tool. Here is my basic code skeleton. Correct me if I am doing
> >> > anything wrong.
> >> >
> >> > Configuration conf = new Configuration();
> >> > Job job = new Job(conf, "myjob");
> >> >
> >> > FileInputFormat.setInputPaths(job, input);
> >> > job.setJarByClass(ParserDriver.class);
> >> > job.setMapperClass(MyParserMapper.class);
> >> > job.setNumReduceTasks(1);
> >> > job.setInputFormatClass(XmlInputFormat.class);
> >> > job.setOutputFormatClass(HFileOutputFormat.class);
> >> > job.setOutputKeyClass(ImmutableBytesWritable.class);
> >> > job.setOutputValueClass(Put.class);
> >> > job.setReducerClass(PutSortReducer.class);
> >> >
> >> > Path outPath = new Path(output);
> >> > FileOutputFormat.setOutputPath(job, outPath);
> >> > job.waitForCompletion(true);
> >> >
> >> > And here is the mapper skeleton:
> >> >
> >> > public class MyParserMapper extends
> >> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >> >   protected void map(LongWritable key, Text value, Context context)
> >> >       throws IOException, InterruptedException {
> >> >     ImmutableBytesWritable rowId = ...;
> >> >     Put put = new Put(rowId.get());
> >> >     put.add(...);
> >> >     context.write(rowId, put);
> >> >   }
> >> > }
> >> >
> >> > The link says:
> >> > "In order to function efficiently, HFileOutputFormat must be configured
> >> > such that each output HFile fits within a single region. In order to do
> >> > this, jobs use Hadoop's TotalOrderPartitioner class to partition the
> >> > map output into disjoint ranges of the key space, corresponding to the
> >> > key ranges of the regions in the table."
> >> >
> >> > Now, according to my configuration above, where do I need to set the
> >> > TotalOrderPartitioner? Should I also add the following line?
> >> >
> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >> >
> >> >> On TotalOrderPartitioner: this is a partitioner class from Hadoop. The
> >> >> MR partitioner -- the class that dictates which reducers get what map
> >> >> outputs -- is pluggable. The default partitioner does a hash of the
> >> >> output key to figure out which reducer. This won't work if you are
> >> >> looking to have your HFile output totally sorted.
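[For later readers of this thread: in 0.89/trunk you do not normally call setPartitionerClass yourself. HFileOutputFormat.configureIncrementalLoad inspects the table's region boundaries and installs the TotalOrderPartitioner, the sort reducer, and the output format for you. A minimal driver sketch against the 0.89-era API; the table name "mytable", the input/output paths, and the reuse of MyParserMapper above are placeholder assumptions, not from the thread:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyParserMapper.class);   // the mapper skeleton above
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Reads the region start keys of the target table and wires up the
    // TotalOrderPartitioner (with a partition file covering those keys),
    // PutSortReducer, and HFileOutputFormat, so each output HFile falls
    // within a single region. This replaces the manual
    // setReducerClass/setOutputFormatClass/setPartitionerClass calls.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

[This sketch needs the Hadoop and HBase jars on the classpath and a running cluster, so treat it as an outline rather than a tested program.]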
> >> >> If you can't figure out what it's about, I'd suggest you check out
> >> >> the Hadoop book where it gets a good explication.
> >> >
> >> > Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >> >
> >> >> On incremental upload, the doc suggests you look at the output of the
> >> >> LoadIncrementalHFiles command. Have you done that? You run the
> >> >> command and it'll add in whatever is ready for loading.
> >> >
> >> > I have just used the command line tool for bulk upload, but I have not
> >> > yet looked at the LoadIncrementalHFiles class to do it through a
> >> > program.
> >> >
> >> >> St.Ack
> >> >>
> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
> >> >> > Hey Community,
> >> >> >
> >> >> > Well... it seems that nobody has experience with the bulk load
> >> >> > option. I have found one class which might help to write the code
> >> >> > for it:
> >> >> >
> >> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >> >
> >> >> > From this, you can get the idea of how to write a MapReduce job
> >> >> > that outputs in HFile format. But there is a little confusion about
> >> >> > these two things:
> >> >> >
> >> >> > 1. TotalOrderPartitioner
> >> >> > 2. configureIncrementalLoad
> >> >> >
> >> >> > Does anybody have an idea about how these work and how to configure
> >> >> > them for the job?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> I am trying to investigate the bulk load option as described in
> >> >> >> the following link:
> >> >> >>
> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >> >>
> >> >> >> Does anybody have sample code or have used it before?
> >> >> >> Can it be helpful for inserting data into an existing table? In my
> >> >> >> scenario, I have one table with 1 column family into which data
> >> >> >> will be inserted every 15 minutes.
> >> >> >>
> >> >> >> Kindly share your experiences.
> >> >> >>
> >> >> >> Thanks
> >> >> >> --
> >> >> >> Regards
> >> >> >> Shuja-ur-Rehman Baig
> >> >> >> <http://pk.linkedin.com/in/shujamughal>

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
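[For later readers: the programmatic equivalent of the completebulkload tool mentioned in this thread is the LoadIncrementalHFiles class. Since it loads into an existing table, it fits the every-15-minutes incremental scenario described above. A minimal sketch against the 0.89-era API; the table name "mytable" and the HFile directory argument are placeholder assumptions:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class Loader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Takes the directory of HFiles written by the MapReduce job
    // (args[0]) and moves each file into the region that owns its key
    // range -- the same thing the completebulkload command does from
    // the shell. The table must already exist.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(args[0]), new HTable(conf, "mytable"));
  }
}
```

[As with the driver, this needs the HBase jars and a live cluster to run; it is an outline of the API, not a tested program.]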
