St.Ack,

Here is a new problem now. When I run the job with one file, everything goes smoothly, but when I give a set of files as input, it gets stuck and does not do anything. Here is the output:
10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.hta...@55bb93
10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
10/11/12 07:42:54 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289576574949
10/11/12 07:42:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/11/12 07:42:55 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/11/12 07:42:55 INFO compress.CodecPool: Got brand-new compressor
10/11/12 07:42:55 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
10/11/12 07:42:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/11/12 07:42:56 INFO input.FileInputFormat: Total input paths to process : 96
10/11/12 07:42:56 INFO mapred.JobClient: Running job: job_201011120442_0004
10/11/12 07:42:57 INFO mapred.JobClient:  map 0% reduce 0%

Any guess why it is not proceeding forward?

Thanks

On Fri, Nov 12, 2010 at 8:04 PM, Shuja Rehman <[email protected]> wrote:
> Thanks St.Ack
>
> It solved the problem.
>
> On Fri, Nov 12, 2010 at 7:41 PM, Stack <[email protected]> wrote:
>> Fix your classpath. Add the google library. See
>> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>> for more on classpath.
>>
>> St.Ack
>>
>> On Fri, Nov 12, 2010 at 5:07 AM, Shuja Rehman <[email protected]> wrote:
>> > Hi
>> >
>> > I am trying to use the configureIncrementalLoad() function to handle the
>> > totalOrderPartitioning, but it throws this exception.
>> >
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.10.10.2:2181
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Socket connection established to app4.hsd1.wa.comcast.net./10.10.10.2:2181, initiating session
>> > 10/11/12 05:02:21 INFO zookeeper.ClientCnxn: Session establishment complete on server app4.hsd1.wa.comcast.net./10.10.10.2:2181, sessionid = 0x12c401bfdae0008, negotiated timeout = 40000
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.hta...@21e554
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
>> > 10/11/12 05:02:21 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://app4.hsd1.wa.comcast.net./user/root/partitions_1289566941504
>> > Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
>> >     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.writePartitions(HFileOutputFormat.java:185)
>> >     at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.configureIncrementalLoad(HFileOutputFormat.java:258)
>> >     at ParserDriver.runJob(ParserDriver.java:162)
>> >     at ParserDriver.main(ParserDriver.java:109)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> > Caused by: java.lang.ClassNotFoundException: com.google.common.base.Preconditions
>> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>> >     ... 9 more
>> > 10/11/12 05:02:21 INFO zookeeper.ZooKeeper: Session: 0x12c401bfdae0008 closed
>> >
>> > Here is the code:
>> >
>> > Configuration conf = HBaseConfiguration.create();
>> > Job job = new Job(conf, "j");
>> > HTable table = new HTable(conf, "mytab");
>> >
>> > FileInputFormat.setInputPaths(job, input);
>> > job.setJarByClass(ParserDriver.class);
>> > job.setMapperClass(MyParserMapper.class);
>> > job.setInputFormatClass(XmlInputFormat.class);
>> > job.setReducerClass(PutSortReducer.class);
>> >
>> > Path outPath = new Path(output);
>> > FileOutputFormat.setOutputPath(job, outPath);
>> >
>> > job.setMapOutputValueClass(Put.class);
>> > job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>> > HFileOutputFormat.configureIncrementalLoad(job, table);
>> > TableMapReduceUtil.addDependencyJars(job);
>> > job.waitForCompletion(true);
>> >
>> > I guess there are some jar files missing. If so, where can I get them?
>> >
>> > Thanks
>> >
>> > On Thu, Nov 11, 2010 at 12:57 AM, Stack <[email protected]> wrote:
>> >> On Wed, Nov 10, 2010 at 11:53 AM, Shuja Rehman <[email protected]> wrote:
>> >> > oh! I think u have not read the full post. The essay has 3 paragraphs :)
>> >> >
>> >> > Should I need to add the following line also?
>> >> >
>> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >>
>> >> You need to specify other than the default partitioner, so yes, the above
>> >> seems necessary. (Be aware that if there is only one reducer, all may
>> >> appear to work even though your partitioner is bad; it's when you have
>> >> multiple reducers that a bad partitioner will show.)
>> >>
>> >> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >>
>> >> Yes. Or the 2nd edition, October 2010.
>> >>
>> >> St.Ack
>> >>
>> >> > On Thu, Nov 11, 2010 at 12:49 AM, Stack <[email protected]> wrote:
>> >> >> Which two questions? (You wrote an essay that looked like one big
>> >> >> question -- smile.)
>> >> >> St.Ack
>> >> >>
>> >> >> On Wed, Nov 10, 2010 at 11:44 AM, Shuja Rehman <[email protected]> wrote:
>> >> >> > yeah, I tried it and it did not fail. Can u answer the other 2
>> >> >> > questions as well?
>> >> >> >
>> >> >> > On Thu, Nov 11, 2010 at 12:15 AM, Stack <[email protected]> wrote:
>> >> >> >> All below looks reasonable (I did not do a detailed review of your
>> >> >> >> code posting). Have you tried it? Did it fail?
>> >> >> >> St.Ack
>> >> >> >>
>> >> >> >> On Wed, Nov 10, 2010 at 11:12 AM, Shuja Rehman <[email protected]> wrote:
>> >> >> >> > On Wed, Nov 10, 2010 at 9:20 PM, Stack <[email protected]> wrote:
>> >> >> >> >> What you need? Bulk-upload, in the scheme of things, is a well
>> >> >> >> >> documented feature. It's also one that has had some exercise and
>> >> >> >> >> is known to work well. For a 0.89 release and trunk, documentation
>> >> >> >> >> is here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html
>> >> >> >> >> The unit test you refer to below is good for figuring how to run
>> >> >> >> >> a job. (Bulk-upload was redone for 0.89/trunk and is much improved
>> >> >> >> >> over what was available in 0.20.x)
>> >> >> >> >
>> >> >> >> > I need to load data into hbase using HFiles.
>> >> >> >> >
>> >> >> >> > Ok, let me tell you what I understand from all these things.
>> >> >> >> > Basically there are two ways to bulk load into hbase:
>> >> >> >> >
>> >> >> >> > 1- Using command line tools (importtsv, completebulkload)
>> >> >> >> > 2- A MapReduce job using HFileOutputFormat
>> >> >> >> >
>> >> >> >> > At the moment, I have generated the HFiles using HFileOutputFormat
>> >> >> >> > and am loading these files into hbase using the completebulkload
>> >> >> >> > command line tool. Here is my basic code skeleton. Correct me if I
>> >> >> >> > am doing anything wrong.
>> >> >> >> >
>> >> >> >> > Configuration conf = new Configuration();
>> >> >> >> > Job job = new Job(conf, "myjob");
>> >> >> >> >
>> >> >> >> > FileInputFormat.setInputPaths(job, input);
>> >> >> >> > job.setJarByClass(ParserDriver.class);
>> >> >> >> > job.setMapperClass(MyParserMapper.class);
>> >> >> >> > job.setNumReduceTasks(1);
>> >> >> >> > job.setInputFormatClass(XmlInputFormat.class);
>> >> >> >> > job.setOutputFormatClass(HFileOutputFormat.class);
>> >> >> >> > job.setOutputKeyClass(ImmutableBytesWritable.class);
>> >> >> >> > job.setOutputValueClass(Put.class);
>> >> >> >> > job.setReducerClass(PutSortReducer.class);
>> >> >> >> >
>> >> >> >> > Path outPath = new Path(output);
>> >> >> >> > FileOutputFormat.setOutputPath(job, outPath);
>> >> >> >> > job.waitForCompletion(true);
>> >> >> >> >
>> >> >> >> > and here is the mapper skeleton:
>> >> >> >> >
>> >> >> >> > public class MyParserMapper extends
>> >> >> >> >     Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>> >> >> >> >     while (true) {
>> >> >> >> >         Put put = new Put(rowId);
>> >> >> >> >         put.add(...);
>> >> >> >> >         context.write(rowId, put);
>> >> >> >> >     }
>> >> >> >> > }
>> >> >> >> >
>> >> >> >> > The link says: "In order to function efficiently, HFileOutputFormat
>> >> >> >> > must be configured such that each output HFile fits within a single
>> >> >> >> > region. In order to do this, jobs use Hadoop's TotalOrderPartitioner
>> >> >> >> > class to partition the map output into disjoint ranges of the key
>> >> >> >> > space, corresponding to the key ranges of the regions in the table."
>> >> >> >> >
>> >> >> >> > Now, according to my configuration above, where do I need to set
>> >> >> >> > TotalOrderPartitioner? Should I add the following line also?
>> >> >> >> >
>> >> >> >> > job.setPartitionerClass(TotalOrderPartitioner.class);
>> >> >> >> >
>> >> >> >> >> On totalorderpartition, this is a partitioner class from hadoop.
>> >> >> >> >> The MR partitioner -- the class that dictates which reducers get
>> >> >> >> >> what map outputs -- is pluggable. The default partitioner does a
>> >> >> >> >> hash of the output key to figure which reducer. This won't work
>> >> >> >> >> if you are looking to have your hfile output totally sorted.
>> >> >> >> >
>> >> >> >> >> If you can't figure what it's about, I'd suggest you check out
>> >> >> >> >> the hadoop book where it gets a good explication.
>> >> >> >> >
>> >> >> >> > which book? OReilly.Hadoop.The.Definitive.Guide.Jun.2009?
>> >> >> >> >
>> >> >> >> >> On incremental upload, the doc. suggests you look at the output
>> >> >> >> >> for the LoadIncrementalHFiles command. Have you done that? You
>> >> >> >> >> run the command and it'll add in whatever is ready for loading.
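[Stack's description of total-order partitioning above can be illustrated with a small, self-contained sketch. To be clear, this is not Hadoop's actual TotalOrderPartitioner, which reads its split points from the partitions file that configureIncrementalLoad() writes to HDFS and operates on byte keys; it is only a toy model of the idea, with made-up string split points: each reducer receives one contiguous, disjoint slice of the key space, so each reducer's HFile can fit within a single region.]

```java
import java.util.Arrays;

// Toy model of total-order partitioning: a sorted list of split points
// divides the key space into N+1 disjoint ranges, and each key is routed
// to the reducer that owns its range via binary search. The real
// TotalOrderPartitioner does the analogous thing with the region
// boundaries written to the partitions file.
class TotalOrderSketch {

    private final String[] splitPoints; // sorted split points

    TotalOrderSketch(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    // Partition (reducer) index for a key: the number of split points
    // less than or equal to the key.
    int getPartition(String key) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", end)
        TotalOrderSketch p = new TotalOrderSketch(new String[] {"g", "p"});
        System.out.println(p.getPartition("apple"));  // prints 0
        System.out.println(p.getPartition("monkey")); // prints 1
        System.out.println(p.getPartition("zebra"));  // prints 2
    }
}
```

[With the default hash partitioner, by contrast, "apple" and "zebra" could land on the same reducer, so no reducer's output would correspond to one region's key range -- which is why, as Stack notes above, a single reducer can appear to work while multiple reducers expose a bad partitioner.]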
>> >> >> >> >
>> >> >> >> > I have just used the command line tool for bulk upload but have
>> >> >> >> > not yet looked at the LoadIncrementalHFiles class to do it through
>> >> >> >> > a program.
>> >> >> >> >
>> >> >> >> >> St.Ack
>> >> >> >> >>
>> >> >> >> >> On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
>> >> >> >> >> > Hey Community,
>> >> >> >> >> >
>> >> >> >> >> > Well... it seems that nobody has experience with the bulk load
>> >> >> >> >> > option. I have found one class which might help to write the
>> >> >> >> >> > code for it:
>> >> >> >> >> >
>> >> >> >> >> > https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>> >> >> >> >> >
>> >> >> >> >> > From this, you can get the idea of how to write a map reduce
>> >> >> >> >> > job to output in HFile format. But there is a little confusion
>> >> >> >> >> > about these two things:
>> >> >> >> >> >
>> >> >> >> >> > 1- TotalOrderPartitioner
>> >> >> >> >> > 2- configureIncrementalLoad
>> >> >> >> >> >
>> >> >> >> >> > Does anybody have an idea about these things and how to
>> >> >> >> >> > configure them for the job?
>> >> >> >> >> >
>> >> >> >> >> > Thanks
>> >> >> >> >> >
>> >> >> >> >> > On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
>> >> >> >> >> >> Hi
>> >> >> >> >> >>
>> >> >> >> >> >> I am trying to investigate the bulk load option as described
>> >> >> >> >> >> in the following link:
>> >> >> >> >> >>
>> >> >> >> >> >> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>> >> >> >> >> >>
>> >> >> >> >> >> Does anybody have sample code or have used it before?
>> >> >> >> >> >> Can it be helpful to insert data into an existing table? In
>> >> >> >> >> >> my scenario, I have one table with 1 column family in which
>> >> >> >> >> >> data will be inserted every 15 minutes.
>> >> >> >> >> >>
>> >> >> >> >> >> Kindly share your experiences.
>> >> >> >> >> >>
>> >> >> >> >> >> Thanks
>> >> >> >> >> >> --
>> >> >> >> >> >> Regards
>> >> >> >> >> >> Shuja-ur-Rehman Baig
>> >> >> >> >> >> <http://pk.linkedin.com/in/shujamughal>

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
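[Pulling together the two command-line pieces that came up in this thread -- the classpath fix Stack points to for the Preconditions NoClassDefFoundError, and the completebulkload tool from the bulk-loads doc -- a minimal sketch follows. The paths, the jar version, the HDFS output directory, and the table name are placeholders/assumptions; check your own installation.]

```shell
# Sketch only -- paths and version numbers below are assumptions; adjust
# to your installation.
export HBASE_HOME=/usr/lib/hbase

# Classpath fix for the NoClassDefFoundError above: make HBase and its
# dependency jars (including the Guava jar that provides
# com.google.common.base.Preconditions) visible to `hadoop jar`. If your
# bin/hbase has no `classpath` subcommand, add the jars from $HBASE_HOME
# and $HBASE_HOME/lib explicitly instead.
export HADOOP_CLASSPATH=$(${HBASE_HOME}/bin/hbase classpath)

# After the MapReduce job has written its HFiles, load them into the
# table with the completebulkload tool described in the bulk-loads doc.
hadoop jar ${HBASE_HOME}/hbase-0.89.20100924.jar completebulkload \
    /user/root/output mytab
```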
