Hi all,
this might be a rather specific question, and I don't know whether the problem is
related to Giraph, Hadoop or HBase,
but maybe someone has an idea.
I am running an application on a cluster using:
Hadoop 2.5.1
Giraph 1.1.0-hadoop2
HBase 0.98.10.1-hadoop2
Giraph jobs run fine when I start them via the GiraphRunner using text-based
input formats. My application is a
fat jar containing the Giraph libs, but not the HBase libs (those are provided). The HBase
libs are in the HADOOP_CLASSPATH, and
MapReduce jobs using HBase as data source / sink run fine.
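To see which jar a class is actually loaded from in a particular JVM (the client, the MRAppMaster or a task), a small JDK-only probe like the one below could be called from code running in that JVM. It is only a sketch; `ClassProbe` and `locate` are hypothetical names, not part of Hadoop, Giraph or HBase.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical JDK-only helper: reports where a class would be loaded from in this JVM.
final class ClassProbe {

    private ClassProbe() {}

    /**
     * Returns the location (jar or directory URL) the named class is loaded from,
     * "not found" if this JVM's class loader cannot see it, or "bootstrap/unknown"
     * if no code source is recorded (typical for JDK classes).
     */
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassProbe.class.getClassLoader();
        }
        try {
            // initialize = false: load the class without running its static initializer
            Class<?> c = Class.forName(className, false, loader);
            CodeSource source = c.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "bootstrap/unknown" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // The class the MRAppMaster could not find
        System.out.println(locate("org.apache.hadoop.hbase.mapreduce.TableInputFormat"));
        System.out.println(locate("java.lang.String"));
    }
}
```

Loading with `initialize = false` keeps the probe itself from triggering static initializers, so it reports visibility without side effects.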
The problem occurs when I start a GiraphJob from my driver program. The
driver does the following:
1) Bulk-load text data into HBase via MapReduce
2) Run a Giraph algorithm using HBase as data source (using TableInputFormat)
The *driver runs fine in a unit test* using the MiniCluster.
When I start the driver on a cluster, step 1) runs successfully, but after the
GiraphJob is submitted, I get the following error:
2015-02-21 12:50:38,954 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-02-21 12:50:39,018 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormat
    at org.myapp.io.HBaseVertexInputFormat.<clinit>(HBaseVertexInputFormat.java:48)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
    at org.apache.giraph.conf.ClassConfOption.get(ClassConfOption.java:128)
    at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:180)
    at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:138)
    at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:376)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1485)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1482)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1415)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 20 more
2015-02-21 12:50:39,021 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
Line 48 of HBaseVertexInputFormat.java is:
    protected static final TableInputFormat BASE_FORMAT = new TableInputFormat();
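The `<clinit>` frame directly under `Class.forName` in the trace shows why the MRAppMaster itself needs HBase: when `Configuration.getClassByNameOrNull` resolves the vertex input format class, that class's static initializer runs, so `new TableInputFormat()` has to be resolvable in the AppMaster JVM and not only in the workers. The JDK-only sketch below illustrates the mechanism with a hypothetical stand-in class (`Holder`, `INITIALIZED` and `StaticInitDemo` are illustrative names, not Giraph or HBase code).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo of when a static initializer runs during reflective class loading.
final class StaticInitDemo {

    /** Set to true as soon as Holder's static initializer runs. */
    static final AtomicBoolean INITIALIZED = new AtomicBoolean(false);

    /** Stand-in for a class like HBaseVertexInputFormat with an eagerly built static field. */
    static final class Holder {
        // Executed in <clinit>; in the real class this is where
        // new TableInputFormat() needs the HBase jar on the classpath.
        static final Object BASE_FORMAT = createBaseFormat();

        private static Object createBaseFormat() {
            INITIALIZED.set(true);
            return new Object();
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = StaticInitDemo.class.getClassLoader();
        String name = Holder.class.getName(); // a class literal does not initialize Holder

        Class.forName(name, false, loader); // load only, static initializer not run
        System.out.println("initialize=false -> static init ran: " + INITIALIZED.get());

        Class.forName(name, true, loader);  // load and initialize, runs <clinit>
        System.out.println("initialize=true  -> static init ran: " + INITIALIZED.get());
    }
}
```

Run in a fresh JVM, `main` prints `false` on the first line and `true` on the second. This only illustrates the loading mechanism; it does not by itself put HBase on the AppMaster's classpath.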
The class *org/apache/hadoop/hbase/mapreduce/TableInputFormat* is contained
in *hbase-server-0.98.10.1-hadoop2.jar*, which
is in the HADOOP_CLASSPATH and, according to the NodeManager logs, gets
downloaded from staging when the application runs.
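A localized jar only helps if it is also on the classpath of the JVM that needs it. One way to check this, assuming you can get a few lines of code to run in the AppMaster (or task) container, is to print that JVM's `java.class.path` filtered by jar name. `ClasspathGrep` and `matching` below are hypothetical, JDK-only helpers.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical JDK-only helper: lists classpath entries that mention a given jar name.
final class ClasspathGrep {

    private ClasspathGrep() {}

    /** Returns the entries of a classpath string that contain needle (case-insensitive). */
    static List<String> matching(String classpath, String needle) {
        List<String> hits = new ArrayList<>();
        if (classpath == null || classpath.isEmpty()) {
            return hits;
        }
        String lowerNeedle = needle.toLowerCase(Locale.ROOT);
        // File.pathSeparator is ":" on Unix and ";" on Windows
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.toLowerCase(Locale.ROOT).contains(lowerNeedle)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        List<String> hits = matching(cp, "hbase-server");
        if (hits.isEmpty()) {
            System.out.println("hbase-server is not on java.class.path of this JVM");
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```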
The GiraphJob is initialized in the driver the following way:
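import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.job.GiraphJob;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
//...
// HBase table that TableInputFormat reads from and TableOutputFormat writes to
conf.set(TableInputFormat.INPUT_TABLE, MY_TABLE);
conf.set(TableOutputFormat.OUTPUT_TABLE, MY_TABLE);
GiraphJob job = new GiraphJob(conf, JOB_NAME);
GiraphConfiguration giraphConf = job.getConfiguration();
giraphConf.setComputationClass(MyComputation.class);
giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
// min workers, max workers, percentage of workers that must respond
giraphConf.setWorkerConfiguration(workerCount, workerCount, 100f);
job.run(verbose);
//...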
FYI, the *driver ran fine on a Hadoop 1.2.1 cluster with the HBase and Giraph libs
(hadoop1 builds) packaged in my jar*.
But packaging them should not really be necessary (at least for HBase), so there seems to be
a problem with how the jars are loaded for the GiraphJob.
Hope you guys have some ideas.
Thanks in advance.
Cheers,
Martin