Yeah, I tried it and it did not fail. Can you answer the other two questions as well?
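Also, on the TotalOrderPartitioner question in the thread below: as far as I understand the 0.89/trunk API, you do not set it yourself. HFileOutputFormat.configureIncrementalLoad(job, table) inspects the table's region boundaries and configures TotalOrderPartitioner, PutSortReducer, and the number of reduce tasks for you. A minimal driver sketch along those lines (untested here; the table name "mytable", the argument paths, and the reuse of XmlInputFormat/MyParserMapper from the skeleton below are my assumptions):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setInputFormatClass(XmlInputFormat.class);
    job.setMapperClass(MyParserMapper.class);
    // The map output types HFileOutputFormat expects:
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // This one call reads the target table's region start keys and wires
    // up TotalOrderPartitioner, PutSortReducer, and one reducer per
    // region, so no explicit job.setPartitionerClass(...) is needed.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note the explicit setNumReduceTasks(1) from the skeleton below would be overwritten here: the reducer count has to match the table's region count for each HFile to fit in a single region.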
On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
> All below looks reasonable (I did not do a detailed review of your code
> posting). Have you tried it? Did it fail?
> St.Ack
>
> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
> >
> >> What do you need? Bulk-upload, in the scheme of things, is a well
> >> documented feature. It's also one that has had some exercise and is
> >> known to work well. For a 0.89 release and trunk, documentation is
> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
> >> The unit test you refer to below is good for figuring out how to run
> >> a job (bulk-upload was redone for 0.89/trunk and is much improved
> >> over what was available in 0.20.x).
> >>
> >
> > *I need to load data into HBase using HFiles.*
> >
> > OK, let me tell you what I understand from all this. Basically, there
> > are two ways to bulk load into HBase:
> >
> > 1. Using command line tools (importtsv, completebulkload)
> > 2. A MapReduce job using HFileOutputFormat
> >
> > At the moment, I have generated the HFiles using HFileOutputFormat and
> > am loading them into HBase with the completebulkload command line
> > tool. Here is my basic code skeleton. Correct me if I am doing
> > anything wrong.
> >
> > Configuration conf = new Configuration();
> > Job job = new Job(conf, "myjob");
> >
> > FileInputFormat.setInputPaths(job, input);
> > job.setJarByClass(ParserDriver.class);
> > job.setMapperClass(MyParserMapper.class);
> > job.setNumReduceTasks(1);
> > job.setInputFormatClass(XmlInputFormat.class);
> > job.setOutputFormatClass(HFileOutputFormat.class);
> > job.setOutputKeyClass(ImmutableBytesWritable.class);
> > job.setOutputValueClass(Put.class);
> > job.setReducerClass(PutSortReducer.class);
> >
> > Path outPath = new Path(output);
> > FileOutputFormat.setOutputPath(job, outPath);
> > job.waitForCompletion(true);
> >
> > And here is the mapper skeleton:
> >
> > public class MyParserMapper extends
> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >
> >   protected void map(LongWritable key, Text value, Context context)
> >       throws IOException, InterruptedException {
> >     ...
> >     Put put = new Put(rowId);
> >     put.add(...);
> >     context.write(new ImmutableBytesWritable(rowId), put);
> >   }
> > }
> >
> > The link says:
> >
> > "*In order to function efficiently, HFileOutputFormat must be
> > configured such that each output HFile fits within a single region. In
> > order to do this, jobs use Hadoop's TotalOrderPartitioner class to
> > partition the map output into disjoint ranges of the key space,
> > corresponding to the key ranges of the regions in the table.*"
> >
> > Now, according to my configuration above, where do I need to set
> > *TotalOrderPartitioner*? Should I also add the following line?
> >
> > job.setPartitionerClass(TotalOrderPartitioner.class);
> >
> >> On TotalOrderPartitioner: this is a partitioner class from Hadoop.
> >> The MR partitioner -- the class that dictates which reducers get what
> >> map outputs -- is pluggable. The default partitioner does a hash of
> >> the output key to figure out which reducer. This won't work if you
> >> are looking to have your HFile output totally sorted.
> >>
> >> If you can't figure out what it's about, I'd suggest you check out
> >> the Hadoop book where it gets a good explication.
> >>
> > Which book?
> > OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
> >
> >> On incremental upload, the doc suggests you look at the output of the
> >> LoadIncrementalHFiles command. Have you done that? You run the
> >> command and it'll add in whatever is ready for loading.
> >>
> >
> > I have just used the command line tool for bulk upload, but have not
> > yet used the LoadIncrementalHFiles class to do it through a program.
> >
> >> St.Ack
> >>
> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
> >> > Hey Community,
> >> >
> >> > Well... it seems that nobody has experience with the bulk load
> >> > option. I have found one class which might help in writing the code
> >> > for it:
> >> >
> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> >> >
> >> > From this, you can get an idea of how to write a MapReduce job that
> >> > outputs in HFile format. But there is a little confusion about
> >> > these two things:
> >> >
> >> > 1. TotalOrderPartitioner
> >> > 2. configureIncrementalLoad
> >> >
> >> > Does anybody have an idea of how these work and how to configure
> >> > them for the job?
> >> >
> >> > Thanks
> >> >
> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> I am trying to investigate the bulk load option as described in
> >> >> the following link:
> >> >>
> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> >> >>
> >> >> Does anybody have sample code, or have you used it before?
> >> >> Can it be helpful for inserting data into an existing table? In my
> >> >> scenario, I have one table with one column family in which data
> >> >> will be inserted every 15 minutes.
> >> >>
> >> >> Kindly share your experiences.
> >> >>
> >> >> Thanks
> >> >> --
> >> >> Regards
> >> >> Shuja-ur-Rehman Baig
> >> >> <http://pk.linkedin.com/in/shujamughal>

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
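P.S. On the LoadIncrementalHFiles question above: as far as I can tell, the completebulkload tool is a wrapper around that class, so the same loading step can be driven from code once the MapReduce job finishes. A rough sketch against the 0.89 API (untested here; the output path and the table name "mytable" are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    // Moves each HFile under the job's output directory into the region
    // that covers its key range -- the same thing the command line tool
    // does from the shell.
    loader.doBulkLoad(new Path("/path/to/hfile/output"),
                      new HTable(conf, "mytable"));
  }
}
```

Running this every 15 minutes after each HFile-producing job should fit the incremental scenario described above, since the load only moves whatever HFiles are ready in the output directory.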
