I find it is often faster to skip the reduce phase when updating rows in HBase (a trick I picked up from Ryan). Essentially, you read a row from HBase, do your processing, and write the row back to HBase. The only time you want a reduce phase is when you need some aggregation, or when there is output you want to skip (e.g., you have a Zipfian distribution and want to ignore the low-count occurrences).
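A minimal sketch of that map-only pattern, with hypothetical names (a table "myTable" with a column family "cf"): the mapper emits a Put keyed on the same row, TableOutputFormat applies it to the table, and setNumReduceTasks(0) drops the shuffle and reduce entirely.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyUpdate {

    // Reads each row, does the per-row processing, and writes a Put straight
    // back to the table: no shuffle and no reduce phase.
    static class UpdateMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            // ... per-row processing goes here; this just flags the row ...
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("processed"), Bytes.toBytes(true));
            context.write(row, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "MapOnlyUpdate");
        job.setJarByClass(MapOnlyUpdate.class);
        TableMapReduceUtil.initTableMapperJob("myTable", new Scan(),
                UpdateMapper.class, ImmutableBytesWritable.class, Put.class, job);
        // A null reducer class still wires up TableOutputFormat for the job
        // without registering a reducer; then drop reduce tasks entirely.
        TableMapReduceUtil.initTableReducerJob("myTable", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}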
Dave

-----Original Message-----
From: Taylor, Ronald C [mailto:[email protected]]
Sent: Friday, September 17, 2010 4:19 PM
To: '[email protected]'
Cc: Taylor, Ronald C
Subject: hadoop-hbase failure - could use some help, a class is apparently not being found by Hadoop

Hi folks,

Got a problem in basic Hadoop-HBase communication. My small test program ProteinCounter1.java, shown in full below, reports this error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)

The full invocation and error msgs are shown at the bottom. We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster.

Hadoop and HBase each appear to work fine separately. That is, I've created programs that run MapReduce on files, and programs that import data into HBase tables and manipulate them, and both types of programs have gone quite smoothly. Now I want to combine the two: use MapReduce on data drawn from an HBase table, with results placed back into an HBase table. But my test program for that, as you can see from the error msg, is not working.

Apparently the org.apache.hadoop.hbase.mapreduce.TableOutputFormat class is not found. However, I have added these paths, including the relevant HBase *.jar, to HADOOP_CLASSPATH, so the missing class should have been found:

export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeeper-3.3.1.jar

This change was made in the ../hadoop/conf/hadoop-env.sh file. I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar, and org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class is indeed present in that HBase *.jar file. Also, I have restarted both HBase and Hadoop after making this change.
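(Aside: HADOOP_CLASSPATH set in hadoop-env.sh extends the classpath of the JVMs that source that file; whether it reaches the map and reduce task JVMs depends on each worker's TaskTracker environment, so a class visible at job-submission time is not necessarily visible inside a task. Below is a hedged sketch of one alternative, shipping the HBase jar with the job itself via the distributed cache. It assumes TableMapReduceUtil.addDependencyJars, a helper present in later HBase releases that may not exist in 0.89.)

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class ShipHBaseJars {
    public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "HBaseTest_Using_ProteinCounter");
        job.setJarByClass(ShipHBaseJars.class);
        // ... initTableMapperJob / initTableReducerJob as in the program below ...
        // Put the jars containing the job's HBase dependencies into the
        // distributed cache so the task JVMs can load TableOutputFormat too.
        TableMapReduceUtil.addDependencyJars(job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

(Bundling the dependency jars under a lib/ directory inside the job jar is another route onto the task classpath.)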
I don't understand why the TableOutputFormat class is not being found. Or is the error msg misleading, and something else is going wrong? I would very much appreciate any advice people have as to what is going wrong. Need to get this working very soon.

Regards,
Ron T.

___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA 99352 USA
Office: 509-372-6568
Email: [email protected]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

contents of the "ProteinCounter1.java" file:

// to compile:
//   javac ProteinCounter1.java
//   jar cf ProteinCounterTest.jar *.class
// to run:
//   hadoop jar ProteinCounterTest.jar ProteinCounter1

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.mapreduce.*;
import org.apache.hadoop.hbase.util.*;

// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

/**
 * Counts the number of times each protein appears in the proteinTable.
 */
public class ProteinCounter1 {

    static class ProteinMapper1
            extends TableMapper<ImmutableBytesWritable, IntWritable> {

        private int numRecords = 0;
        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException {
            // retrieve the value of proteinID, which is the row key for each
            // protein in the proteinTable
            ImmutableBytesWritable proteinID_Key = new ImmutableBytesWritable(row.get());
            try {
                context.write(proteinID_Key, one);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 100) == 0) {
                context.setStatus("mapper processed " + numRecords
                        + " proteinTable records so far");
            }
        }
    }

    public static class ProteinReducer1
            extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

        public void reduce(ImmutableBytesWritable proteinID_key,
                           Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            Put put = new Put(proteinID_key.get());
            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("total"),
                    Bytes.toBytes(sum));
            System.out.println(String.format("stats : proteinID_key : %d, count : %d",
                    Bytes.toInt(proteinID_key.get()), sum));
            context.write(proteinID_key, put);
        }
    }

    public static void main(String[] args) throws Exception {

        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
        job.setJarByClass(ProteinCounter1.class);

        Scan scan = new Scan();
        String colFamilyToUse = "proteinFields";
        String fieldToUse = "Protein_Ref_ID";
        // retrieve this one column from the specified family
        scan.addColumn(Bytes.toBytes(colFamilyToUse), Bytes.toBytes(fieldToUse));
        scan.setFilter(new FirstKeyOnlyFilter());

        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
                ProteinMapper1.class, ImmutableBytesWritable.class,
                IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

session output:

[rtay...@h01 Hadoop]$ javac ProteinCounter1.java
[rtay...@h01 Hadoop]$ jar cf ProteinCounterTest.jar *.class
[rtay...@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar ProteinCounter1
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.local.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.system.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.map.tasks.maximum; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: mapred-default.xml:a attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.name.dir; Ignoring.
10/09/17 15:46:19 WARN conf.Configuration: hdfs-default.xml:a attempt to override final parameter: dfs.data.dir; Ignoring.
10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:host.name=h01.emsl.pnl.gov
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_21
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java/default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/bin/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/hadoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/../lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoop/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoop/hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/Linux-i386-32
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-194.11.1.el5
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:user.name=rtaylor
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/rtaylor
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h08:2182,h07:2182,h06:2182 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.zookeeper.zookeeperwrap...@dcb03b
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket connection to server h04/192.168.200.24:2182
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection established to h04/192.168.200.24:2182, initiating session
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment complete on server h04/192.168.200.24:2182, sessionid = 0x22b21c04c330002, negotiated timeout = 60000
10/09/17 15:46:20 INFO mapred.JobClient: Running job: job_201009171510_0004
10/09/17 15:46:21 INFO mapred.JobClient: map 0% reduce 0%
10/09/17 15:46:27 INFO mapred.JobClient: Task Id : attempt_201009171510_0004_m_000002_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
        ... 4 more
10/09/17 15:46:33 INFO mapred.JobClient: Task Id : attempt_201009171510_0004_r_000051_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
        at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
        at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
        ... 4 more

I terminated the program here via <Control><C>, since the error msgs were simply repeating.

[rtay...@h01 Hadoop]$
