Hi folks,
I've got a problem with basic Hadoop-HBase communication. My small test program,
ProteinCounter1.java - shown in full below - reports this error:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
The full invocation and error messages are shown at the bottom.
We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. Hadoop
and HBase each appear to work fine separately. That is, I've created programs
that run MapReduce on files, and programs that import data into HBase tables
and manipulate them. Both types of programs have gone quite smoothly.
Now I want to combine the two - use MapReduce programs on data drawn from an
HBase table, with results placed back into an HBase table.
But my test program for this, as you can see from the error message, is not
working. Apparently the
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
class is not found.
However, I have added the necessary paths, including the relevant HBase jar, to
HADOOP_CLASSPATH, so the missing class should have been found, as you can see:
export HADOOP_CLASSPATH=/home/hbase/hbase/conf:\
/home/hbase/hbase/hbase-0.89.20100726.jar:\
/home/rtaylor/HadoopWork/log4j-1.2.16.jar:\
/home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
This change was made in the ../hadoop/conf/hadoop-env.sh file.
I checked the contents of /home/hbase/hbase/hbase-0.89.20100726.jar, and
org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
is indeed present in that HBase jar file.
Also, I have restarted both Hbase and Hadoop after making this change.
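In case it helps anyone reproduce my check, here is a throwaway diagnostic (not part
of the job - the ClassCheck class and its output format are just my own invention)
that reports whether a given class name resolves. Running it via `hadoop ClassCheck`
makes it see the same classpath the job client uses; with no argument it defaults
to the class named in the error:

```java
// ClassCheck.java - standalone diagnostic, not part of ProteinCounter1.
// Reports whether the current JVM's classpath can resolve a class name.
public class ClassCheck {

    /** Returns true if className can be located (without initializing it). */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String target = (args.length > 0) ? args[0]
                : "org.apache.hadoop.hbase.mapreduce.TableOutputFormat";
        System.out.println(target + " : "
                + (isLoadable(target) ? "found" : "NOT FOUND"));
    }
}
```

(If this reported NOT FOUND on the client node too, that would at least localize
the problem to my classpath setup rather than to the cluster.)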
I don't understand why the TableOutputFormat class is not being found. Or is the
error message misleading, and something else is going wrong? I would very much
appreciate any advice on what might be wrong. I need to get this working very
soon.
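One guess, since the failures below come from task attempts (attempt_..._m_... and
attempt_..._r_...) on the worker nodes: as far as I understand, HADOOP_CLASSPATH in
hadoop-env.sh only affects the client JVM that submits the job, while each task JVM
builds its classpath from the job jar itself - classes at the top level plus any
jars under a lib/ directory inside it. If that is right, repackaging the job jar so
it carries the HBase jar in lib/ might fix this. A sketch of that layout, built
with java.util.jar purely for illustration (the file names are placeholders):

```java
import java.io.*;
import java.util.jar.*;

// Illustration only: builds a job jar whose lib/ directory carries
// dependency jars. The TaskTracker unpacks the job jar for each task and
// puts top-level classes plus every jar under lib/ on the task JVM's
// classpath, which the client-side HADOOP_CLASSPATH does not reach.
public class JobJarLayout {

    /** Writes classFiles at the jar root and libJars under lib/. */
    public static void buildJobJar(File out, File[] classFiles, File[] libJars)
            throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        JarOutputStream jar =
            new JarOutputStream(new FileOutputStream(out), manifest);
        try {
            for (File c : classFiles) copyEntry(jar, c.getName(), c);
            for (File l : libJars)    copyEntry(jar, "lib/" + l.getName(), l);
        } finally {
            jar.close();
        }
    }

    private static void copyEntry(JarOutputStream jar, String name, File src)
            throws IOException {
        jar.putNextEntry(new JarEntry(name));
        InputStream in = new FileInputStream(src);
        try {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) jar.write(buf, 0, n);
        } finally {
            in.close();
        }
        jar.closeEntry();
    }
}
```

The same layout can of course be produced with the jar tool by copying the
dependency jars into a lib/ subdirectory before running `jar cf`.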
Regards,
Ron T.
___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA 99352 USA
Office: 509-372-6568
Email: [email protected]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
contents of the "ProteinCounter1.java" file:
// to compile
// javac ProteinCounter1.java
// jar cf ProteinCounterTest.jar *.class
// to run
// hadoop jar ProteinCounterTest.jar ProteinCounter1
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.io.IntWritable;
import java.util.*;
import java.io.*;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.util.*;
import org.apache.hadoop.hbase.mapreduce.*;
// %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/**
* counts the number of times each protein appears in the proteinTable
*
*/
public class ProteinCounter1 {

    static class ProteinMapper1
            extends TableMapper<ImmutableBytesWritable, IntWritable> {

        private int numRecords = 0;
        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context)
                throws IOException {
            // retrieve the value of proteinID, which is the row key for each
            // protein in the proteinTable
            ImmutableBytesWritable proteinID_Key =
                new ImmutableBytesWritable(row.get());
            try {
                context.write(proteinID_Key, one);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            numRecords++;
            if ((numRecords % 100) == 0) {
                context.setStatus("mapper processed " + numRecords
                                  + " proteinTable records so far");
            }
        }
    }

    public static class ProteinReducer1
            extends TableReducer<ImmutableBytesWritable, IntWritable,
                                 ImmutableBytesWritable> {

        public void reduce(ImmutableBytesWritable proteinID_key,
                           Iterable<IntWritable> values,
                           Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            Put put = new Put(proteinID_key.get());
            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("total"),
                    Bytes.toBytes(sum));
            System.out.println(String.format("stats : proteinID_key : %d, count : %d",
                    Bytes.toInt(proteinID_key.get()), sum));
            context.write(proteinID_key, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
        job.setJarByClass(ProteinCounter1.class);

        Scan scan = new Scan();
        String colFamilyToUse = "proteinFields";
        String fieldToUse = "Protein_Ref_ID";
        // retrieve this one column from the specified family
        scan.addColumn(Bytes.toBytes(colFamilyToUse), Bytes.toBytes(fieldToUse));
        scan.setFilter(new FirstKeyOnlyFilter());

        TableMapReduceUtil.initTableMapperJob("proteinTable", scan,
                ProteinMapper1.class, ImmutableBytesWritable.class,
                IntWritable.class, job);
        TableMapReduceUtil.initTableReducerJob("testTable",
                ProteinReducer1.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
session output:
[rtay...@h01 Hadoop]$ javac ProteinCounter1.java
[rtay...@h01 Hadoop]$ jar cf ProteinCounterTest.jar *.class
[rtay...@h01 Hadoop]$ hadoop jar ProteinCounterTest.jar ProteinCounter1
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to
override final parameter: mapred.job.tracker; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to
override final parameter: mapred.local.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to
override final parameter: mapred.system.dir; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to
override final parameter: mapred.tasktracker.map.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN conf.Configuration: mapred-default.xml:a attempt to
override final parameter: mapred.tasktracker.reduce.tasks.maximum; Ignoring.
10/09/17 15:46:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
[the same mapred-default.xml WARN lines, plus matching hdfs-default.xml warnings
for dfs.name.dir and dfs.data.dir, repeat twice more here]
10/09/17 15:46:19 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:host.name=h01.emsl.pnl.gov
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_21
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun
Microsystems Inc.
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_21/jre
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/home/hadoop/hadoop/bin/../conf:/usr/java/default/lib/tools.jar:/home/hadoop/hadoop/bin/..:/home/hadoop/hadoop/bin/../hadoop-0.20.2-core.jar:/home/hadoop/hadoop/bin/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop/bin/../lib/commons-codec-1.3.jar:/home/hadoop/hadoop/bin/../lib/commons-el-1.0.jar:/home/hadoop/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop/bin/../lib/core-3.1.1.jar:/home/hadoop/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop/bin/../lib/jetty-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/jetty-util-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/junit-3.8.1.jar:/home/hadoop/hadoop/bin/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop/bin/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/hadoop/hadoop/bin/../lib/oro-2.0.8.jar:/home/hadoop/hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/hadoop/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop/bin/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/rtaylor/HadoopWork/log4j-1.2.16.jar:/home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/home/hadoop/hadoop/bin/../lib/native/Linux-i386-32
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.18-194.11.1.el5
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client environment:user.name=rtaylor
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/rtaylor
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/rtaylor/HadoopWork/Hadoop
10/09/17 15:46:19 INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=h05:2182,h04:2182,h03:2182,h02:2182,h10:2182,h09:2182,h08:2182,h07:2182,h06:2182
sessionTimeout=60000
watcher=org.apache.hadoop.hbase.zookeeper.zookeeperwrap...@dcb03b
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Opening socket connection to
server h04/192.168.200.24:2182
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Socket connection established to
h04/192.168.200.24:2182, initiating session
10/09/17 15:46:19 INFO zookeeper.ClientCnxn: Session establishment complete on
server h04/192.168.200.24:2182, sessionid = 0x22b21c04c330002, negotiated
timeout = 60000
10/09/17 15:46:20 INFO mapred.JobClient: Running job: job_201009171510_0004
10/09/17 15:46:21 INFO mapred.JobClient: map 0% reduce 0%
10/09/17 15:46:27 INFO mapred.JobClient: Task Id :
attempt_201009171510_0004_m_000002_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at
org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 4 more
10/09/17 15:46:33 INFO mapred.JobClient: Task Id :
attempt_201009171510_0004_r_000051_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at
org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:354)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.mapreduce.TableOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 4 more
I terminated the program here with Ctrl-C, since the error messages were simply
repeating.
[rtay...@h01 Hadoop]$