Which two questions? (You wrote an essay that looked like one big question -- smile.) St.Ack
On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
> yeah, I tried it and it did not fail. Can you answer the other 2 questions
> as well?
>
> On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
>
>> All below looks reasonable (I did not do a detailed review of your code
>> posting). Have you tried it? Did it fail?
>> St.Ack
>>
>> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
>> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>> >
>> >> What do you need? Bulk-upload, in the scheme of things, is a well
>> >> documented feature. It's also one that has had some exercise and is
>> >> known to work well. For a 0.89 release and trunk, documentation is
>> >> here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html.
>> >> The unit test you refer to below is good for figuring out how to run
>> >> a job. (Bulk-upload was redone for 0.89/trunk and is much improved
>> >> over what was available in 0.20.x.)
>> >
>> > *I need to load data into HBase using HFiles.*
>> >
>> > OK, let me tell you what I understand from all this. Basically there
>> > are two ways to bulk load into HBase:
>> >
>> > 1- Using command-line tools (importtsv, completebulkload)
>> > 2- A MapReduce job using HFileOutputFormat
>> >
>> > At the moment, I have generated the HFiles using HFileOutputFormat and
>> > am loading these files into HBase using the completebulkload
>> > command-line tool. Here is my basic code skeleton. Correct me if I am
>> > doing anything wrong.
>> >
>> > Configuration conf = new Configuration();
>> > Job job = new Job(conf, "myjob");
>> >
>> > FileInputFormat.setInputPaths(job, input);
>> > job.setJarByClass(ParserDriver.class);
>> > job.setMapperClass(MyParserMapper.class);
>> > job.setNumReduceTasks(1);
>> > job.setInputFormatClass(XmlInputFormat.class);
>> > job.setOutputFormatClass(HFileOutputFormat.class);
>> > job.setOutputKeyClass(ImmutableBytesWritable.class);
>> > job.setOutputValueClass(Put.class);
>> > job.setReducerClass(PutSortReducer.class);
>> >
>> > Path outPath = new Path(output);
>> > FileOutputFormat.setOutputPath(job, outPath);
>> > job.waitForCompletion(true);
>> >
>> > And here is the mapper skeleton:
>> >
>> > public class MyParserMapper extends
>> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >   protected void map(LongWritable key, Text value, Context context)
>> >       throws IOException, InterruptedException {
>> >     ...
>> >     Put put = new Put(rowId);
>> >     put.add(...);
>> >     context.write(new ImmutableBytesWritable(rowId), put);
>> >   }
>> > }
>> >
>> > The link says:
>> > *"In order to function efficiently, HFileOutputFormat must be
>> > configured such that each output HFile fits within a single region. In
>> > order to do this, jobs use Hadoop's TotalOrderPartitioner class to
>> > partition the map output into disjoint ranges of the key space,
>> > corresponding to the key ranges of the regions in the table."*
>> >
>> > Now, according to my configuration above, where do I need to set
>> > *TotalOrderPartitioner*? Should I also add the following line?
>> >
>> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >
>> >> On TotalOrderPartitioner: this is a partitioner class from Hadoop.
>> >> The MR partitioner -- the class that dictates which reducers get what
>> >> map outputs -- is pluggable. The default partitioner does a hash of
>> >> the output key to figure which reducer. This won't work if you are
>> >> looking to have your HFile output totally sorted.
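[Editor's note: a minimal driver sketch answering the TotalOrderPartitioner question above. On 0.89/trunk, `HFileOutputFormat.configureIncrementalLoad(Job, HTable)` is intended to wire up the partitioner, reducer, and output format for you from the table's current region boundaries, so none of those need to be set by hand. Class and method names are from the HBase `mapreduce` package as of that era; the table name "mytable" and the argument-based paths are hypothetical.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyParserMapper.class);
    // The mapper emits (row key, Put); these are the map output types.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // This one call configures TotalOrderPartitioner (with a partition
    // file built from the table's region start keys), PutSortReducer,
    // the reduce-task count (one per region), and HFileOutputFormat --
    // so setPartitionerClass/setReducerClass/setNumReduceTasks are not
    // needed in the driver.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This requires a running HBase cluster and its jars on the classpath, so it is a sketch rather than something runnable in isolation.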
>> >>
>> >> If you can't figure out what it's about, I'd suggest you check out
>> >> the Hadoop book, where it gets a good explication.
>> >
>> > Which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >
>> >> On incremental upload, the doc suggests you look at the output of the
>> >> LoadIncrementalHFiles command. Have you done that? You run the
>> >> command and it'll add in whatever is ready for loading.
>> >
>> > I have just used the command-line tool for bulk upload, but have not
>> > yet used the LoadIncrementalHFiles class to do it through a program.
>> >
>> > ------------------------------
>> >
>> >> St.Ack
>> >>
>> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
>> >> > Hey Community,
>> >> >
>> >> > Well... it seems that nobody has experience with the bulk load
>> >> > option. I have found one class which might help to write the code
>> >> > for it:
>> >> >
>> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >
>> >> > From this, you can get the idea of how to write a MapReduce job
>> >> > that outputs in HFile format. But there is a little confusion about
>> >> > these two things:
>> >> >
>> >> > 1- TotalOrderPartitioner
>> >> > 2- configureIncrementalLoad
>> >> >
>> >> > Does anybody have an idea of how these work and how to configure
>> >> > them for the job?
>> >> >
>> >> > Thanks
>> >> >
>> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
>> >> >
>> >> >> Hi
>> >> >>
>> >> >> I am trying to investigate the bulk load option as described in
>> >> >> the following link:
>> >> >>
>> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >>
>> >> >> Does anybody have sample code, or have you used it before?
>> >> >> Can it be helpful to insert data into an existing table?
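[Editor's note: a sketch of driving LoadIncrementalHFiles from code instead of the completebulkload shell command, per the exchange above. The API shown is the 0.89/trunk-era one; the output path and table name "mytable" are hypothetical placeholders.]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class IncrementalLoader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    HTable table = new HTable(conf, "mytable");
    // Moves every HFile under the MapReduce job's output directory into
    // the region that covers its key range. Because it loads whatever is
    // sitting in the directory, it suits a recurring (e.g. every 15
    // minutes) job writing to a fresh output directory each run.
    loader.doBulkLoad(new Path("/path/to/hfile/output"), table);
  }
}
```

Like the driver sketch, this assumes HBase jars on the classpath and a live cluster, so it cannot run standalone.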
>> >> >> In my scenario, I have one table with 1 column family into which
>> >> >> data will be inserted every 15 minutes.
>> >> >>
>> >> >> Kindly share your experiences.
>> >> >>
>> >> >> Thanks
>> >> >> --
>> >> >> Regards
>> >> >> Shuja-ur-Rehman Baig
>> >> >> <http://pk.linkedin.com/in/shujamughal>
