Spark error NoClassDefFoundError: org/apache/hadoop/mapred/InputSplit

2015-03-23 Thread , Roy
Hi,


I am using a CDH 5.3.2 package-based installation, managed through Cloudera Manager 5.3.2.

I am trying to run a Spark job with the following command:

PYTHONPATH=~/code/utils/ spark-submit --master yarn --executor-memory 3G \
  --num-executors 30 --driver-memory 2G --executor-cores 2 --name=analytics \
  /home/abc/code/updb/spark/UPDB3analytics.py -date 2015-03-01

but I am getting the following error:

15/03/23 11:06:49 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 7, hdp003.dev.xyz.com): java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/InputSplit
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
    at java.lang.Class.getDeclaredConstructors(Class.java:1901)
    at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
    at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplit
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 25 more

Here is the full trace:

https://gist.github.com/anonymous/3492f0ec63d7a23c47cf
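The trace shows the error being raised inside `Executor$TaskRunner.run` while the task is deserialized (`JavaSerializerInstance.deserialize`) on the worker `hdp003.dev.xyz.com`, so it is the executor JVM on that node, not the driver, that cannot load the class. When many tasks fail, a small helper can pull the host and the missing class out of the log. The sketch below is hypothetical (`missing_class` is not a Spark API) and assumes the warning has exactly the shape shown above.

```python
import re

# Matches Spark's "Lost task" warning as it appears in this log, capturing the
# executor host and the class named by the NoClassDefFoundError. The \s* parts
# tolerate the line wraps seen in the mailed copy of the log.
LOST_TASK = re.compile(
    r"Lost task \S+ in stage \S+ \(TID \d+,\s*(?P<host>[^)]+)\):\s*"
    r"java\.lang\.NoClassDefFoundError:\s*(?P<cls>\S+)"
)


def missing_class(log_text):
    """Return (executor_host, jar_entry) for the first match, or None."""
    m = LOST_TASK.search(log_text)
    if m is None:
        return None
    # In this trace the class is reported with slashes
    # (org/apache/hadoop/mapred/InputSplit); inside a jar the same class is
    # stored under that path with a ".class" suffix.
    return m.group("host").strip(), m.group("cls") + ".class"
```

For the log above this yields the host `hdp003.dev.xyz.com` and the jar entry `org/apache/hadoop/mapred/InputSplit.class`, which is the path to look for in the jars on that node.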


Re: Spark error NoClassDefFoundError: org/apache/hadoop/mapred/InputSplit

2015-03-23 Thread Ted Yu
InputSplit is in the hadoop-mapreduce-client-core jar.

Please check that the jar is in your classpath.
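One way to check this is to scan the jars on a classpath for the class file. The sketch below is a hypothetical helper (`jars_containing` is not part of Spark or Hadoop); it assumes the classpath is given as a Java-style string, expands entries ending in `*` to the `.jar` files in that directory, and only looks inside jar files, not in plain class directories.

```python
import glob
import os
import zipfile


def jars_containing(classpath, class_entry):
    """Return the jars on `classpath` whose contents include `class_entry`.

    `classpath` is a Java-style classpath string (entries separated by
    os.pathsep). An entry ending in "*" is expanded to the .jar files in
    that directory, similar to how the JVM treats classpath wildcards.
    """
    hits = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if entry.endswith("*"):
            candidates = sorted(glob.glob(entry[:-1] + "*.jar"))
        else:
            candidates = [entry]
        for path in candidates:
            # Skip directories, config dirs and anything that is not a jar file.
            if not path.endswith(".jar") or not os.path.isfile(path):
                continue
            try:
                with zipfile.ZipFile(path) as jar:
                    if class_entry in jar.namelist():
                        hits.append(path)
            except zipfile.BadZipFile:
                pass  # ignore corrupt or non-zip files
    return hits
```

On a worker node one could feed it the output of `hadoop classpath`, for example `jars_containing(subprocess.check_output(["hadoop", "classpath"], text=True).strip(), "org/apache/hadoop/mapred/InputSplit.class")`; an empty list would suggest the jar is not visible there. The classpath YARN hands to executor containers is assembled separately, so it may not match that output exactly.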

Cheers

On Mon, Mar 23, 2015 at 8:10 AM, , Roy rp...@njit.edu wrote:
