Well, to try to close the loop on this thread: I went back to first principles,
downloaded the 0.7.0 example code, and built it against hadoop-2.0.6-alpha
using the -Dcrunch.platform=2 option. I launched the job jar (with
dependencies) and got the following error.
2014-02-26 16:32:50,468 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 268435456
2014-02-26 16:32:50,468 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860; length = 16777216
2014-02-26 16:32:50,520 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
    at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
    at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
Looking at the code, it's trying to read the crunch.tmp.dir configuration and
failing. We are on a Cray … so our HDFS layout is a little different (sorry
about that). This is our current HDFS structure:
+ hdfs dfs -ls -R /
drwxrwxrwx - jsparks supergroup 0 2014-02-26 16:32 /tmp
drwxrwx--- - jsparks supergroup 0 2014-02-26 16:32 /tmp/hadoop-yarn
drwxrwx--- - jsparks supergroup 0 2014-02-26 16:32 /tmp/hadoop-yarn/staging
drwxrwx--- - jsparks supergroup 0 2014-02-26 16:32 /tmp/hadoop-yarn/staging/history
drwxrwx--- - jsparks supergroup 0 2014-02-26 16:32 /tmp/hadoop-yarn/staging/history/done
drwxrwxrwt - jsparks supergroup 0 2014-02-26 16:32 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - jsparks supergroup 0 2014-02-26 16:32 /user
drwxr-xr-x - jsparks supergroup 0 2014-02-26 16:32 /user/jsparks
-rw-r--r-- 1 jsparks supergroup 610157 2014-02-26 16:32 /user/jsparks/HuckleberryFinn.txt
And yes, we are reading Huck Finn …
--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
From: Josh Wills <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 25, 2014 4:19 PM
To: "[email protected]" <[email protected]>
Subject: Re: Yarnchild error : crunch-0.7.0
The first error looks like a weird serialization error, as if the Crunch
version being used on the cluster was different from the one that was used to
compile the client. Is Crunch installed on the cluster, or is there another
version of Crunch in the hadoop classpath?
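One way to test the version-mismatch theory is to print the serialVersionUID that Java serialization computes for the class named in the exception, once with the client's Crunch jar on the classpath and once with the cluster's, and compare the two numbers. The sketch below uses only the JDK; the class name SerialUidCheck and the nested Pinned class are hypothetical stand-ins for illustration and are not part of Crunch.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialUidCheck {
    // Hypothetical stand-in for a serializable class with a pinned UID.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    /** Returns the UID Java serialization uses for cls, or null if cls is not serializable. */
    static Long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            return null; // lookup() returns null for non-serializable classes
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name, or fall back to the stand-in.
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : Pinned.class;
        System.out.println(cls.getName() + " serialVersionUID = " + uidOf(cls));
    }
}
```

Run it with the relevant Crunch jar on the classpath and the class name from the stack trace as the argument (quote the `$` in the shell). If the two jars print different values, the client and the cluster are not running the same build of that class.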
The second one still looks to me like the hadoop1/hadoop2 incompatibility
issue, as if the local client was compiled against the hadoop1 APIs instead of
the hadoop2 APIs on the cluster.
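A quick way to see which API shape a given classpath provides is to ask the JVM whether org.apache.hadoop.mapreduce.JobContext is a class or an interface. As far as I know it is a class in Hadoop 1.x and an interface in Hadoop 2.x, which is the kind of change that produces IncompatibleClassChangeError at runtime; treat that as an assumption to verify against your own jars. The helper name ApiShapeCheck is hypothetical, and the sketch uses only the JDK.

```java
public class ApiShapeCheck {
    /** Reports whether the named type is a "class", an "interface", or "not found" on this classpath. */
    static String describe(String className) {
        try {
            // initialize=false: only inspect the type, do not run its static initializers
            Class<?> cls = Class.forName(className, false, ApiShapeCheck.class.getClassLoader());
            return cls.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Default to the type named in the stack trace; any class name can be passed instead.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " -> " + describe(name));
    }
}
```

Running this once against the jars the job was compiled with and once against the cluster's hadoop classpath should show whether the two disagree.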
There's a 0.7.0-hadoop2 Maven artifact that should have the right API profile:
http://mvnrepository.com/artifact/org.apache.crunch/crunch-core/0.7.0-hadoop2
I know that we made an error in the 0.8.0 release w/the hadoop2 versioning, so
0.8.0-hadoop2 doesn't work, but 0.8.1-hadoop2 or 0.8.2-hadoop2 should also work.
On Tue, Feb 25, 2014 at 1:54 PM, Bill Sparks <[email protected]> wrote:
So interesting … same results.
This time I ran two versions: 1) the examples from the Crunch build and 2) a
standalone application. The result for the standalone was the same as before,
which I guess I expected. The other failure was different and a little more
confusing. I guess the question I have is: can this be caused by the JDK used
to build Crunch? We are using JDK 1.7.
Failure 1)
2014-02-25 14:59:04,252 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Could not read runtime node information
    at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:48)
    at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(AccessController.java:366)
    at javax.security.auth.Subject.doAs(Subject.java:572)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
Caused by: java.io.InvalidClassException: org.apache.crunch.types.writable.Writables$4; local class incompatible: stream classdesc serialVersionUID = 5855040850180329703, local class serialVersionUID = 4130080921736307351
Failure 2)
2014-02-25 14:59:33,926 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: org/apache/hadoop/mapreduce/JobContext.getConfiguration()Lorg/apache/hadoop/conf/Configuration;
    at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:42)
    at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(AccessController.java:366)
    at javax.security.auth.Subject.doAs(Subject.java:572)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
JDK
jsparks@jupiter:/lus/dal/jsparks/example/tmp/hdlogs.jsparks/userlogs> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
From: Josh Wills <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 25, 2014 1:54 PM
To: "[email protected]" <[email protected]>
Subject: Re: Yarnchild error : crunch-0.7.0
Yeah, try it again w/ -Dcrunch.platform=2 instead of -Dhadoop.profile=2.0
J
On Tue, Feb 25, 2014 at 11:47 AM, Bill Sparks <[email protected]> wrote:
Well, I did the following and also changed the pom.xml to reference the correct
hadoop version.
$ mvn clean install -Dhadoop.profile=2.0 -DskipTests
<hadoop.version>2.0.6-alpha</hadoop.version>
--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
From: Josh Wills <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 25, 2014 1:43 PM
To: "[email protected]" <[email protected]>
Subject: Re: Yarnchild error : crunch-0.7.0
Hrm-- that's usually related to the API changes between hadoop1 and hadoop2.
How did you build crunch, exactly? Did you use -Dcrunch.platform=2?
J
On Tue, Feb 25, 2014 at 11:37 AM, Bill Sparks <[email protected]> wrote:
Can anyone shed some light on why I would be getting the following error when
submitting a simple Crunch wordcount example? Other Hadoop MR applications
work; it just seems that Crunch is confused about some class definitions.
I'm running hadoop-2.0.6-alpha and have built Crunch to match.
Hadoop 2.0.6-alpha
Subversion Unknown -r ca4c88898f95aaab3fd85b5e9c194ffd647c2109
Compiled by jenkins on 2013-10-30T07:19Z
From source with checksum 95e88b2a9589fa69d6d5c1dbd48d4e
2014-02-25 13:23:00,049 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 268435456
2014-02-25 13:23:00,049 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860; length = 16777216
2014-02-25 13:23:00,070 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: org/apache/hadoop/mapreduce/JobContext.getConfiguration()Lorg/apache/hadoop/conf/Configuration;
    at org.apache.crunch.impl.mr.run.CrunchTaskContext.<init>(CrunchTaskContext.java:42)
    at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:37)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(AccessController.java:366)
    at javax.security.auth.Subject.doAs(Subject.java:572)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
--
Director of Data Science
Cloudera<http://www.cloudera.com>
Twitter: @josh_wills<http://twitter.com/josh_wills>