I found the solution. For the GiraphJob it is necessary to 1) add the HBase libs to HADOOP_CLASSPATH on all machines and 2) pass the HBase libs (comma-separated) via the -libjars parameter when running the Driver (see the sketch below).
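For reference, -libjars only takes effect if the driver is started through ToolRunner, so that GenericOptionsParser can pick the option up and put the listed jars into the distributed cache of the submitted job. Here is a minimal sketch of that wiring; the class name Driver / org.myapp.Driver and the paths below are only placeholders, not my actual code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Driver extends Configured implements Tool {

      @Override
      public int run(String[] args) throws Exception {
        // getConf() already carries the -libjars entries (stored as "tmpjars"),
        // so the bulk-load job and the GiraphJob should both be built from it
        Configuration conf = getConf();
        // ... run the bulk load and the GiraphJob with this conf ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner runs GenericOptionsParser, which consumes the generic
        // options (-libjars, -D, ...) before run() is called
        System.exit(ToolRunner.run(new Configuration(), new Driver(), args));
      }
    }

Started for example as: hadoop jar my-driver.jar org.myapp.Driver -libjars /opt/hbase/lib/hbase-server-0.98.10.1-hadoop2.jar,/opt/hbase/lib/hbase-client-0.98.10.1-hadoop2.jar <args> (the paths and the exact jar list are just an example, take whatever your HBase installation needs).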
I still don't know why this is not necessary for regular MapReduce jobs, but I like this solution as I don't need to build a fat jar anymore.

Cheers,
Martin

On 21.02.2015 13:20, Martin Junghanns wrote:
> Hi all,
>
> this might be a bit of a specific question and I don't know if the problem is Giraph, Hadoop or HBase related, but maybe someone has an idea.
>
> I am running an application on a cluster using:
>
> Hadoop 2.5.1
> Giraph 1.1.0-hadoop2
> HBase 0.98.10.1-hadoop2
>
> Giraph jobs run fine when I start them via the GiraphRunner using text-based input formats. My application is a fat jar containing the Giraph libs, but not the HBase libs (provided). The HBase libs are in the HADOOP_CLASSPATH, and MapReduce jobs using HBase as data source / sink run fine.
>
> The problem occurs when I start a GiraphJob from my Driver program. The driver does the following:
> 1) Bulk load text data into HBase via MapReduce
> 2) Run a Giraph algorithm using HBase as data source (using TableInputFormat)
>
> The *driver runs fine in a unit test* using the MiniCluster.
>
> When I start the driver on a cluster, 1) runs successfully, but after the GiraphJob is submitted, I get:
>
> 2015-02-21 12:50:38,954 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
> 2015-02-21 12:50:39,018 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormat
>     at org.myapp.io.HBaseVertexInputFormat.<clinit>(HBaseVertexInputFormat.java:48)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:274)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.giraph.conf.ClassConfOption.get(ClassConfOption.java:128)
>     at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:180)
>     at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:138)
>     at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:376)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1485)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1482)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1415)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormat
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 20 more
> 2015-02-21 12:50:39,021 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
>
> HBaseVertexInputFormat.java:48:
> protected static final TableInputFormat BASE_FORMAT = new TableInputFormat();
>
> The class *org/apache/hadoop/hbase/mapreduce/TableInputFormat* is contained in *hbase-server-0.98.10.1-hadoop2.jar*, which is in the HADOOP_CLASSPATH and - according to the nodemanager logs - gets downloaded from staging when the application runs.
>
> The GiraphJob is initialized in the driver in the following way:
>
> //...
> conf.set(TableInputFormat.INPUT_TABLE, MY_TABLE);
> conf.set(TableOutputFormat.OUTPUT_TABLE, MY_TABLE);
>
> GiraphJob job = new GiraphJob(conf, JOB_NAME);
> GiraphConfiguration giraphConf = job.getConfiguration();
> giraphConf.setComputationClass(MyComputation.class);
> giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
> giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
> giraphConf.setWorkerConfiguration(workerCount, workerCount, 100f);
>
> job.run(verbose);
> //...
>
> Fyi, the *driver ran fine on a Hadoop 1.2.1 cluster with hbase and giraph libs (hadoop1) packaged in my jar*.
> But since this is not really necessary (at least for HBase), there seems to be a problem loading the jars in the GiraphJob.
>
> Hope you guys have any ideas.
>
> Thanks in advance.
>
> Cheers,
> Martin
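PS: Regarding the GiraphJob setup quoted above: as an alternative to listing the jars on the command line, the driver could also ship them programmatically before the GiraphJob is created. I have not tried this with Giraph, so treat it as a sketch; the helper class name is made up, but TableMapReduceUtil.addDependencyJars(Configuration, Class...) is the regular HBase API and writes into the same "tmpjars" setting that -libjars fills:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;

    public final class HBaseJarShipper {

      // Adds the jars that contain TableInputFormat/TableOutputFormat (i.e. hbase-server)
      // to "tmpjars", so they are sent to the distributed cache with the job,
      // just like -libjars would do. Call this on the conf before new GiraphJob(conf, ...).
      public static void shipHBaseJars(Configuration conf) throws IOException {
        TableMapReduceUtil.addDependencyJars(conf,
            TableInputFormat.class, TableOutputFormat.class);
      }
    }

Note that this only ships the jars containing the listed classes, so HBase's own dependencies still have to come from the HADOOP_CLASSPATH as described above.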
