Hi all,

this might be a rather specific question, and I don't know whether the problem is related to Giraph, Hadoop or HBase,
but maybe someone has an idea.

I am running an application on a cluster using:

Hadoop 2.5.1
Giraph 1.1.0-hadoop2
HBase 0.98.10.1-hadoop2

Giraph jobs run fine when I start them via the GiraphRunner using text-based input formats. My application is a fat jar containing the Giraph libs, but not the HBase libs (provided). The HBase libs are in the HADOOP_CLASSPATH, and
MapReduce jobs using HBase as data source / sink run fine.

The problem occurs when I start a GiraphJob from my Driver program. The driver does the following:
1) Bulk Load text data into HBase via MapReduce
2) Run a Giraph algorithm using HBase as data source (using TableInputFormat)

The *driver runs fine in a unit test* using the MiniCluster.

When I start the driver on a cluster, step 1) runs successfully, but after the GiraphJob is submitted, I get the following error:

2015-02-21 12:50:38,954 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-02-21 12:50:39,018 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormat
        at org.myapp.io.HBaseVertexInputFormat.<clinit>(HBaseVertexInputFormat.java:48)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
        at org.apache.giraph.conf.ClassConfOption.get(ClassConfOption.java:128)
        at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:180)
        at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:138)
        at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:376)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1485)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1482)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1415)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormat
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 20 more
2015-02-21 12:50:39,021 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

HBaseVertexInputFormat.java:48:
protected static final TableInputFormat BASE_FORMAT = new TableInputFormat();

The class *org/apache/hadoop/hbase/mapreduce/TableInputFormat* is contained
in *hbase-server-0.98.10.1-hadoop2.jar*, which
is in the HADOOP_CLASSPATH and, according to the nodemanager logs, gets
downloaded from staging when the application runs.
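
To narrow down which JVM actually sees the HBase jar, a small stand-alone check could help. The following is only a sketch: the class name `ClassLocator` and its `locate` method are hypothetical helpers (not part of Giraph, Hadoop or HBase), and it only reports what the class loader of the JVM it runs in can see, so it would have to be run with the same classpath as the failing MRAppMaster container to be conclusive.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports whether a class is visible to the
// current class loader and, if so, which jar or directory it comes from.
class ClassLocator {

    static String locate(String className) {
        try {
            // initialize=false: only look the class up, do not run its
            // static initializers (the failure above happens in <clinit>).
            Class<?> c = Class.forName(className, false,
                    ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes usually have no code source.
            return "found: " + (src == null ? "bootstrap/unknown" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace; pass another name as args[0].
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hbase.mapreduce.TableInputFormat";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If this prints "found" on the client side but the class is still missing in the application master, that would suggest the classpath of the MRAppMaster container differs from the HADOOP_CLASSPATH used by the driver.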

The GiraphJob is initialized in the driver the following way:

//...
conf.set(TableInputFormat.INPUT_TABLE, MY_TABLE);
conf.set(TableOutputFormat.OUTPUT_TABLE, MY_TABLE);

GiraphJob job = new GiraphJob(conf, JOB_NAME);
GiraphConfiguration giraphConf = job.getConfiguration();
giraphConf.setComputationClass(MyComputation.class);
giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
giraphConf.setWorkerConfiguration(workerCount, workerCount, 100f);

job.run(verbose);
//...

FYI, the *driver ran fine on a Hadoop 1.2.1 cluster with the HBase and Giraph libs
(hadoop1) packaged in my jar*.
But since packaging them should not really be necessary (at least for HBase), there seems to be
a problem loading the jars in the GiraphJob.

I hope someone has an idea.

Thanks in advance.

Cheers,
Martin






