Re: ClassCastException while running a simple wordCount
You should have just tried it and let us know what your experience was! Anyway, after spending long hours on this problem I realized this is actually a classloader problem. If you use spark-submit this exception should go away. You haven't told us how you are submitting the job when you see this exception, so I am guessing that spark-submit should resolve it. If you are building an uber jar and running it directly, as I do with the sample code below, then you will run into the classloader problem:

    SparkConf sparkConf = config.buildSparkConfig();
    sparkConf.setJars(JavaSparkContext.jarOfClass(SparkDriver.class));
    JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(1000));
    Receiver receiver = new Receiver(config);
    JavaReceiverInputDStream jsonMessagesDStream = ssc.receiverStream(receiver);
    jsonMessagesDStream.count();
    ssc.start();
    ssc.awaitTermination();

So for me spark-submit works but the code above doesn't, although I am a bit surprised that it did work in the majority of cases. When I looked into the spark-submit script it was clear that it launches applications through org.apache.spark.deploy.SparkSubmit, which replaces the classloader, but I don't understand why Spark is unable to set that classloader underneath through the Spark APIs (my biggest question).

On Tue, Oct 11, 2016 at 12:28 AM, vaibhav thapliyal <vaibhav.thapliyal...@gmail.com> wrote:

> Hi,
>
> I am currently running this code from my IDE (Eclipse). I tried adding the
> scope "provided" to the dependency without any effect. Should I build this
> and submit it using the spark-submit command?
>
> Thanks
> Vaibhav
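The classloader behavior described above can be sketched in isolation. This is not Spark's actual internal mechanism, only a minimal illustration of the moving part: the context classloader is per-thread state, and a launcher (as spark-submit does via org.apache.spark.deploy.SparkSubmit) can swap it in before handing control to user code. The class and variable names here are purely illustrative:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ContextLoaderDemo {
    public static void main(String[] args) {
        // The loader that serialization and reflection typically consult
        // when an application is launched directly from an uber jar:
        ClassLoader original = Thread.currentThread().getContextClassLoader();

        // A launcher can install its own loader before invoking the
        // application's main method; code running afterwards sees it:
        ClassLoader replacement = new URLClassLoader(new URL[0], original);
        Thread.currentThread().setContextClassLoader(replacement);

        System.out.println(Thread.currentThread().getContextClassLoader() == replacement); // prints true
    }
}
```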
Re: ClassCastException while running a simple wordCount
Hi,

I am currently running this code from my IDE (Eclipse). I tried adding the scope "provided" to the dependency without any effect. Should I build this and submit it using the spark-submit command?

Thanks
Vaibhav

On 11 October 2016 at 04:36, Jakob Odersky wrote:

> Just thought of another potential issue: you should use the "provided"
> scope when depending on Spark, i.e. in your project's pom:
>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-core_2.11</artifactId>
>         <version>2.0.1</version>
>         <scope>provided</scope>
>     </dependency>
>
> On Mon, Oct 10, 2016 at 2:00 PM, Jakob Odersky wrote:
>
>> How do you submit the application? A version mismatch between the
>> launcher, driver and workers could lead to the bug you're seeing. A common
>> reason for a mismatch is if the SPARK_HOME environment variable is set.
>> This will cause the spark-submit script to use the launcher determined by
>> that environment variable, regardless of the directory from which it was
>> called.
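To make the spark-submit question concrete: the usual flow is to package the job and hand the jar to the cluster's launcher. The class name and master URL below match the thread's example, but the jar path is a placeholder of my own, not something stated in the thread:

```shell
# Package the application (with spark-core marked <scope>provided</scope>)
mvn package

# Submit against the standalone master from the thread's code
spark-submit \
  --class SparkTest \
  --master spark://192.168.10.174:7077 \
  target/sparktest-1.0.jar
```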
Re: ClassCastException while running a simple wordCount
Just thought of another potential issue: you should use the "provided" scope when depending on Spark, i.e. in your project's pom:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.1</version>
        <scope>provided</scope>
    </dependency>

On Mon, Oct 10, 2016 at 2:00 PM, Jakob Odersky wrote:

> How do you submit the application? A version mismatch between the launcher,
> driver and workers could lead to the bug you're seeing. A common reason for
> a mismatch is if the SPARK_HOME environment variable is set. This will
> cause the spark-submit script to use the launcher determined by that
> environment variable, regardless of the directory from which it was called.
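As a sanity check that the word-count logic in this thread is sound (and that the failure is environmental), here is the same computation in plain Java streams, runnable without any cluster. The class name and helper are mine; the sample lines mirror the thread's input file:

```java
import java.util.Arrays;

public class WordCountLocal {
    // Equivalent of the thread's flatMap + mapToPair + reduceByKey,
    // collapsed to a single count for one word, run entirely in-process.
    static long countWord(String[] lines, String word) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word::equals)
                .count();
    }

    public static void main(String[] args) {
        String[] lines = {"Hello Spark", "Hi Spark", "Spark is running"};
        System.out.println(countWord(lines, "Spark")); // prints 3
    }
}
```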
Re: ClassCastException while running a simple wordCount
How do you submit the application? A version mismatch between the launcher, driver and workers could lead to the bug you're seeing. A common reason for a mismatch is if the SPARK_HOME environment variable is set. This will cause the spark-submit script to use the launcher determined by that environment variable, regardless of the directory from which it was called.

On Mon, Oct 10, 2016 at 3:42 AM, kant kodali wrote:

> +1 Woohoo, I have the same problem. I have been trying hard to fix this.
>
> On Mon, Oct 10, 2016 3:23 AM, vaibhav thapliyal <vaibhav.thapliyal...@gmail.com> wrote:
>
>> Hi,
>> If I change the parameter inside setMaster() to "local", the program
>> runs. Is there something wrong with the cluster installation?
>>
>> I used the spark-2.0.1-bin-hadoop2.7.tgz package to install on my cluster
>> with default configuration.
>>
>> Thanks
>> Vaibhav
>>
>> On 10 Oct 2016 12:49, "vaibhav thapliyal" <vaibhav.thapliyal...@gmail.com> wrote:
>>
>> Here is the code that I am using:
>>
>>     import java.util.Arrays;
>>     import java.util.Iterator;
>>
>>     import org.apache.spark.SparkConf;
>>     import org.apache.spark.api.java.JavaPairRDD;
>>     import org.apache.spark.api.java.JavaRDD;
>>     import org.apache.spark.api.java.JavaSparkContext;
>>     import org.apache.spark.api.java.function.FlatMapFunction;
>>     import org.apache.spark.api.java.function.Function2;
>>     import org.apache.spark.api.java.function.PairFunction;
>>
>>     import scala.Tuple2;
>>
>>     public class SparkTest {
>>
>>         public static void main(String[] args) {
>>             SparkConf conf = new SparkConf()
>>                     .setMaster("spark://192.168.10.174:7077")
>>                     .setAppName("TestSpark");
>>             JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>             JavaRDD<String> textFile = sc.textFile("sampleFile.txt");
>>             JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
>>                 public Iterator<String> call(String s) {
>>                     return Arrays.asList(s.split(" ")).iterator();
>>                 }
>>             });
>>             JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
>>                 public Tuple2<String, Integer> call(String s) {
>>                     return new Tuple2<String, Integer>(s, 1);
>>                 }
>>             });
>>             JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
>>                 public Integer call(Integer a, Integer b) {
>>                     return a + b;
>>                 }
>>             });
>>             counts.saveAsTextFile("outputFile.txt");
>>         }
>>     }
>>
>> The content of the input file:
>>
>>     Hello Spark
>>     Hi Spark
>>     Spark is running
>>
>> I am using the spark 2.0.1 dependency from Maven.
>>
>> Thanks
>> Vaibhav
>>
>> On 10 October 2016 at 12:37, Sudhanshu Janghel <sudhanshu.jang...@cloudwick.com> wrote:
>>
>> Seems like a straightforward error: it's trying to cast something to a
>> list which is not a list or cannot be cast. Are you using the standard
>> example code? Can you send the input and code?
>>
>> On Oct 10, 2016 9:05 AM, "vaibhav thapliyal" <vaibhav.thapliyal...@gmail.com> wrote:
>>
>> Dear All,
>>
>> I am getting a ClassCastException when using the Java API to run the
>> wordcount example from the docs.
>>
>> Here is the log that I got:
>>
>>     16/10/10 11:52:12 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 4)
>>     java.lang.ClassCastException: cannot assign instance of
>>     scala.collection.immutable.List$SerializationProxy to field
>>     org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type
>>     scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>>         at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>         at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>     16/10/10 11:52:12 ERROR Executor:
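A quick way to check for the SPARK_HOME mismatch described above is to inspect what the shell would actually use before submitting; this is only a sketch, and its output depends entirely on your environment:

```shell
# If SPARK_HOME is set, the spark-submit script delegates to the
# installation it points at, regardless of which spark-submit you invoked.
echo "SPARK_HOME=${SPARK_HOME:-<unset>}"
command -v spark-submit || echo "spark-submit not on PATH"
```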