Fix your classpath: add the Google Guava library, which provides the missing com.google.common.base.Preconditions class. See http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description for more on classpath.
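[Editor's note: a minimal sketch of what "fix your classpath" can look like in practice. The jar location and job arguments below are assumptions for illustration; use whatever Guava jar ships under your HBase lib/ directory, and note that -libjars only works if the driver goes through ToolRunner/GenericOptionsParser.]

```shell
# Locate the Guava jar bundled with HBase (path and name are assumptions;
# adjust to your installation).
GUAVA_JAR=$(ls /usr/lib/hbase/lib/guava-*.jar 2>/dev/null | head -1)

# Put it on the client-side classpath so that job submission
# (which calls configureIncrementalLoad, which needs Preconditions)
# can resolve the class.
export HADOOP_CLASSPATH="$GUAVA_JAR:$HADOOP_CLASSPATH"

# Ship it to the map/reduce tasks as well when launching the job.
hadoop jar myjob.jar ParserDriver -libjars "$GUAVA_JAR" <input> <output>
```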
St.Ack

On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <[email protected]> wrote:
> Hi
>
> I am trying to use the configureIncrementalLoad() function to handle the
> total order partitioning, but it throws this exception.
>
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.10.10.2:2181
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection established to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
> 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment complete on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid = 0x12c401bfdae0008, negotiated timeout = 40000
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.hta...@21e554
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
> 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
> Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
>     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
>     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
>     at ParserDriver.runJob(ParserDriver.java:162)
>     at ParserDriver.main(ParserDriver.java:109)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.ClassNotFoundException: com.google.common.base.Preconditions
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     ... 9 more
> 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008 closed
>
> Here is the code.
>
> Configuration conf = HBaseConfiguration.create();
> Job job = new Job(conf, "j");
> HTable table = new HTable(conf, "mytab");
>
> FileInputFormat.setInputPaths(job, input);
> job.setJarByClass(ParserDriver.class);
> job.setMapperClass(MyParserMapper.class);
> job.setInputFormatClass(XmlInputFormat.class);
> job.setReducerClass(PutSortReducer.class);
> Path outPath = new Path(output);
> FileOutputFormat.setOutputPath(job, outPath);
>
> job.setMapOutputValueClass(Put.class);
> job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> HFileOutputFormat.configureIncrementalLoad(job, table);
> TableMapReduceUtil.addDependencyJars(job);
> job.waitForCompletion(true);
>
> I guess there are some jar files missing. If yes, then from where do I get these?
>
> Thanks
>
> On Thu, Nov 11, 2010 at 12:57 AM, Stack <[email protected]> wrote:
>> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <[email protected]> wrote:
>>> oh! I think you have not read the full post. The essay has 3 paragraphs :)
>>>
>>> *Should I need to add the following line also
>>>
>>> job.setPartitionerClass(TotalOrderPartitioner.class);
>>
>> You need to specify other than the default partitioner, so yes, the above seems
>> necessary. (Be aware that with only one reducer, all may appear to work even
>> though your partitioner is bad... it's when you have multiple reducers that a
>> bad partitioner will show.)
>>
>>> which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>>
>> Yes. Or the 2nd edition, October 2010.
>>
>> St.Ack
>>
>>> On Thu, Nov 11, 2010 at 12:49 AM, Stack <[email protected]> wrote:
>>>> Which two questions (you wrote an essay that looked like one big
>>>> question -- smile).
>>>> St.Ack
>>>>
>>>> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
>>>>> yeah, I tried it and it did not fail. Can you answer the other 2 questions as well?
>>>>>
>>>>> On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
>>>>>> All below looks reasonable (I did not do a detailed review of your code
>>>>>> posting). Have you tried it? Did it fail?
>>>>>> St.Ack
>>>>>>
>>>>>> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
>>>>>>> On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>>>>>>>> What do you need? Bulk-upload, in the scheme of things, is a well-documented
>>>>>>>> feature. It's also one that has had some exercise and is known to work well.
>>>>>>>> For a 0.89 release and trunk, documentation is here:
>>>>>>>> http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html .
>>>>>>>> The unit test you refer to below is good for figuring out how to run a job.
>>>>>>>> (Bulk-upload was redone for 0.89/trunk and is much improved over what
>>>>>>>> was available in 0.20.x.)
>>>>>>>
>>>>>>> *I need to load data into hbase using HFiles.*
>>>>>>>
>>>>>>> Ok, let me tell you what I understand from all these things. Basically there
>>>>>>> are two ways to bulk load into hbase:
>>>>>>>
>>>>>>> 1- Using command line tools (importtsv, completebulkload)
>>>>>>> 2- A mapreduce job using HFileOutputFormat
>>>>>>>
>>>>>>> At the moment, I have generated the HFiles using HFileOutputFormat and am
>>>>>>> loading these files into hbase using the completebulkload command line tool.
>>>>>>> Here is my basic code skeleton.
>>>>>>> Correct me if I am doing anything wrong.
>>>>>>>
>>>>>>> Configuration conf = new Configuration();
>>>>>>> Job job = new Job(conf, "myjob");
>>>>>>>
>>>>>>> FileInputFormat.setInputPaths(job, input);
>>>>>>> job.setJarByClass(ParserDriver.class);
>>>>>>> job.setMapperClass(MyParserMapper.class);
>>>>>>> job.setNumReduceTasks(1);
>>>>>>> job.setInputFormatClass(XmlInputFormat.class);
>>>>>>> job.setOutputFormatClass(HFileOutputFormat.class);
>>>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>>>> job.setOutputValueClass(Put.class);
>>>>>>> job.setReducerClass(PutSortReducer.class);
>>>>>>>
>>>>>>> Path outPath = new Path(output);
>>>>>>> FileOutputFormat.setOutputPath(job, outPath);
>>>>>>> job.waitForCompletion(true);
>>>>>>>
>>>>>>> and here is the mapper skeleton
>>>>>>>
>>>>>>> public class MyParserMapper extends
>>>>>>>     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>>>>>>>   // inside map(): one Put per parsed record
>>>>>>>   while (...) {
>>>>>>>     Put put = new Put(rowId);
>>>>>>>     put.add(...);
>>>>>>>     context.write(rowId, put);
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> The link says:
>>>>>>> *"In order to function efficiently, HFileOutputFormat must be configured
>>>>>>> such that each output HFile fits within a single region. In order to do
>>>>>>> this, jobs use Hadoop's TotalOrderPartitioner class to partition the map
>>>>>>> output into disjoint ranges of the key space, corresponding to the key
>>>>>>> ranges of the regions in the table."*
>>>>>>>
>>>>>>> Now, according to my configuration above, where do I need to set
>>>>>>> *TotalOrderPartitioner*? Should I add the following line also?
>>>>>>>
>>>>>>> job.setPartitionerClass(TotalOrderPartitioner.class);
>>>>>>>
>>>>>>>> On totalorderpartition, this is a partitioner class from hadoop.
>>>>>>>> The MR partitioner -- the class that dictates which reducers get what map
>>>>>>>> outputs -- is pluggable. The default partitioner does a hash of the output
>>>>>>>> key to figure which reducer. This won't work if you are looking to have
>>>>>>>> your hfile output totally sorted.
>>>>>>>>
>>>>>>>> If you can't figure out what it's about, I'd suggest you check out the
>>>>>>>> hadoop book, where it gets a good explication.
>>>>>>>
>>>>>>> which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>>>>>>>
>>>>>>>> On incremental upload, the doc suggests you look at the output of the
>>>>>>>> LoadIncrementalHFiles command. Have you done that? You run the command
>>>>>>>> and it'll add in whatever is ready for loading.
>>>>>>>
>>>>>>> I have just used the command line tool for bulk upload, but I have not yet
>>>>>>> looked at the LoadIncrementalHFiles class to do it through a program.
>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
>>>>>>>>> Hey Community,
>>>>>>>>>
>>>>>>>>> Well... it seems that nobody has experience with the bulk load option.
>>>>>>>>> I have found one class which might help to write the code for it.
>>>>>>>>>
>>>>>>>>> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>>>>>>>>>
>>>>>>>>> From this, you can get the idea how to write a map reduce job to output
>>>>>>>>> in HFiles format.
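[Editor's note: the total-order idea Stack describes above can be sketched without Hadoop at all. The split points and keys below are invented for illustration: each key is routed to a "reducer" by comparing it against sorted split points, so partition numbering follows key order and every partition covers a disjoint, contiguous key range -- exactly what lets each output HFile fit one region.]

```shell
# Toy illustration (plain bash, no Hadoop): route keys to partitions
# by comparing against sorted split points, as TotalOrderPartitioner does.
# Split points "h" and "p": keys < "h" -> 0, "h" <= keys < "p" -> 1, rest -> 2.
printf 'zebra\napple\nmango\nhorse\nberry\nquail\n' |
while read -r key; do
  if   [[ "$key" < "h" ]]; then part=0
  elif [[ "$key" < "p" ]]; then part=1
  else                          part=2
  fi
  echo "$part $key"
done | sort
# Sorted output shows partitions 0,1,2 holding disjoint ascending key ranges:
# 0 apple / 0 berry / 1 horse / 1 mango / 2 quail / 2 zebra
```

A hash partitioner would instead scatter adjacent keys across all three partitions, so no partition's output would map onto a single region's key range.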
>>>>>>>>> But there is a little confusion about these two things:
>>>>>>>>>
>>>>>>>>> 1- TotalOrderPartitioner
>>>>>>>>> 2- configureIncrementalLoad
>>>>>>>>>
>>>>>>>>> Does anybody have an idea about how these work and how to configure
>>>>>>>>> them for the job?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I am trying to investigate the bulk load option as described in the
>>>>>>>>>> following link.
>>>>>>>>>>
>>>>>>>>>> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>>>>>>>>>>
>>>>>>>>>> Does anybody have sample code or have used it before?
>>>>>>>>>> Can it be helpful to insert data into an existing table? In my scenario,
>>>>>>>>>> I have one table with 1 column family in which data will be inserted
>>>>>>>>>> every 15 minutes.
>>>>>>>>>>
>>>>>>>>>> Kindly share your experiences.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> --
>>>>>>>>>> Regards
>>>>>>>>>> Shuja-ur-Rehman Baig
>>>>>>>>>> <http://pk.linkedin.com/in/shujamughal>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
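[Editor's note: on the programmatic-vs-CLI question in the thread -- per the bulk-loads documentation linked above, the completebulkload tool is driven by the LoadIncrementalHFiles class, so both routes perform the same load step. A hedged sketch; the hbase jar version and the output path are placeholders, and the table name "mytab" is taken from the code in the thread:]

```shell
# After the MR job has written HFiles under the job output directory,
# hand them to the running cluster. Substitute your own jar version
# and HDFS path.
hadoop jar hbase-0.89.20100924.jar completebulkload \
    /user/root/myoutput mytab

# The same step can be invoked from Java via
# org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles,
# which is what the tool runs under the hood.
```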
