Re: ClassCastException while running a simple wordCount

2016-10-11 Thread kant kodali
You should have just tried it and let us know what your experience was!
Anyway, after spending long hours on this problem I realized this is actually
a classloader problem.

If you use spark-submit this exception should go away, but you haven't told
us how you are submitting the job when you end up seeing this exception, so I
am taking a guess and saying spark-submit should be able to resolve it.

If you are just building an uber jar and running that jar directly, like me,
with sample code like the one below, then you will run into the classloader
problem:


SparkConf sparkConf = config.buildSparkConfig();
// Ship the jar containing the driver class to the cluster
sparkConf.setJars(JavaSparkContext.jarOfClass(SparkDriver.class));

// 1000 ms batch interval
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(1000));

// Custom receiver that produces a stream of JSON messages
Receiver receiver = new Receiver(config);
JavaReceiverInputDStream jsonMessagesDStream = ssc.receiverStream(receiver);
jsonMessagesDStream.count();

ssc.start();
ssc.awaitTermination();


So for me spark-submit works but the code above doesn't, although I am a bit
surprised that the code above did work for me in the majority of cases. When
I looked into the spark-submit script it was clear that spark-submit actually
launches the application through org.apache.spark.deploy.SparkSubmit, which
replaces the classloader, but I don't understand why Spark is unable to set
that classloader underneath through the Spark APIs. (My biggest question.)
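
For what it's worth, one programmatic route that still goes through the
org.apache.spark.deploy.SparkSubmit entry point is the
org.apache.spark.launcher.SparkLauncher API; I haven't verified that it
avoids the classloader issue. A rough sketch, with the jar path and main
class as placeholders for whatever your build produces:

import org.apache.spark.launcher.SparkLauncher;

public class SubmitFromCode {
    public static void main(String[] args) throws Exception {
        // Placeholder jar path, main class and master URL -- adjust to your setup.
        Process spark = new SparkLauncher()
                .setAppResource("/path/to/uber.jar")
                .setMainClass("SparkDriver")
                .setMaster("spark://192.168.10.174:7077")
                .launch();
        spark.waitFor(); // blocks until the spawned spark-submit process exits
    }
}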


On Tue, Oct 11, 2016 at 12:28 AM, vaibhav thapliyal <
vaibhav.thapliyal...@gmail.com> wrote:

> Hi,
>
> I am currently running this code from my IDE (Eclipse). I tried adding the
> scope "provided" to the dependency without any effect. Should I build this
> and submit it using the spark-submit command?
>
> Thanks
> Vaibhav
>
> On 11 October 2016 at 04:36, Jakob Odersky  wrote:
>
>> Just thought of another potential issue: you should use the "provided"
>> scope when depending on Spark, i.e. in your project's pom:
>>
>> <dependency>
>>   <groupId>org.apache.spark</groupId>
>>   <artifactId>spark-core_2.11</artifactId>
>>   <version>2.0.1</version>
>>   <scope>provided</scope>
>> </dependency>
>>
>> On Mon, Oct 10, 2016 at 2:00 PM, Jakob Odersky  wrote:
>>
>>> How do you submit the application? A version mismatch between the
>>> launcher, driver and workers could lead to the bug you're seeing. A common
>>> reason for a mismatch is if the SPARK_HOME environment variable is set.
>>> This will cause the spark-submit script to use the launcher determined by
>>> that environment variable, regardless of the directory from which it was
>>> called.
>>>
>>> On Mon, Oct 10, 2016 at 3:42 AM, kant kodali  wrote:
>>>
 +1 Wooho I have the same problem. I have been trying hard to fix this.



 On Mon, Oct 10, 2016 3:23 AM, vaibhav thapliyal
 vaibhav.thapliyal...@gmail.com wrote:

> Hi,
> If I change the parameter inside setMaster() to "local", the
> program runs. Is there something wrong with the cluster installation?
>
> I used the spark-2.0.1-bin-hadoop2.7.tgz package to install on my
> cluster with default configuration.
>
> Thanks
> Vaibhav
>
> On 10 Oct 2016 12:49, "vaibhav thapliyal" <
> vaibhav.thapliyal...@gmail.com> wrote:
>
> Here is the code that I am using:
>
> public class SparkTest {
>
>     public static void main(String[] args) {
>
>         SparkConf conf = new SparkConf()
>                 .setMaster("spark://192.168.10.174:7077")
>                 .setAppName("TestSpark");
>         JavaSparkContext sc = new JavaSparkContext(conf);
>
>         JavaRDD<String> textFile = sc.textFile("sampleFile.txt");
>
>         // Split each line into words
>         JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
>             public Iterator<String> call(String s) {
>                 return Arrays.asList(s.split(" ")).iterator();
>             }
>         });
>
>         // Pair each word with a count of one
>         JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
>             public Tuple2<String, Integer> call(String s) {
>                 return new Tuple2<String, Integer>(s, 1);
>             }
>         });
>
>         // Sum the counts for each word
>         JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
>             public Integer call(Integer a, Integer b) {
>                 return a + b;
>             }
>         });
>
>         counts.saveAsTextFile("outputFile.txt");
>     }
> }
>
> The content of the input file:
> Hello Spark
> Hi Spark
> Spark is running
>
>
> I am using the spark 2.0.1 dependency from maven.
>
> Thanks
> Vaibhav
>
> On 10 October 2016 at 12:37, Sudhanshu Janghel <
> sudhanshu.jang...@cloudwick.com> wrote:
>
> Seems like a straightforward error: it's trying to cast something as a
> list which is not a list or cannot be cast. Are you using standard
> example code? Can you send the input and code?

Re: ClassCastException while running a simple wordCount

2016-10-11 Thread vaibhav thapliyal
Hi,

I am currently running this code from my IDE (Eclipse). I tried adding the
scope "provided" to the dependency without any effect. Should I build this
and submit it using the spark-submit command?

Thanks
Vaibhav

On 11 October 2016 at 04:36, Jakob Odersky  wrote:

> Just thought of another potential issue: you should use the "provided"
> scope when depending on Spark, i.e. in your project's pom:
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.11</artifactId>
>   <version>2.0.1</version>
>   <scope>provided</scope>
> </dependency>
>
> On Mon, Oct 10, 2016 at 2:00 PM, Jakob Odersky  wrote:
>
>> How do you submit the application? A version mismatch between the
>> launcher, driver and workers could lead to the bug you're seeing. A common
>> reason for a mismatch is if the SPARK_HOME environment variable is set.
>> This will cause the spark-submit script to use the launcher determined by
>> that environment variable, regardless of the directory from which it was
>> called.
>>
>> On Mon, Oct 10, 2016 at 3:42 AM, kant kodali  wrote:
>>
>>> +1 Wooho I have the same problem. I have been trying hard to fix this.
>>>
>>>
>>>
>>> On Mon, Oct 10, 2016 3:23 AM, vaibhav thapliyal
>>> vaibhav.thapliyal...@gmail.com wrote:
>>>
 Hi,
 If I change the parameter inside setMaster() to "local", the
 program runs. Is there something wrong with the cluster installation?

 I used the spark-2.0.1-bin-hadoop2.7.tgz package to install on my
 cluster with default configuration.

 Thanks
 Vaibhav

 On 10 Oct 2016 12:49, "vaibhav thapliyal" <
 vaibhav.thapliyal...@gmail.com> wrote:

 Here is the code that I am using:

 public class SparkTest {

     public static void main(String[] args) {

         SparkConf conf = new SparkConf()
                 .setMaster("spark://192.168.10.174:7077")
                 .setAppName("TestSpark");
         JavaSparkContext sc = new JavaSparkContext(conf);

         JavaRDD<String> textFile = sc.textFile("sampleFile.txt");

         // Split each line into words
         JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
             public Iterator<String> call(String s) {
                 return Arrays.asList(s.split(" ")).iterator();
             }
         });

         // Pair each word with a count of one
         JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
             public Tuple2<String, Integer> call(String s) {
                 return new Tuple2<String, Integer>(s, 1);
             }
         });

         // Sum the counts for each word
         JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
             public Integer call(Integer a, Integer b) {
                 return a + b;
             }
         });

         counts.saveAsTextFile("outputFile.txt");
     }
 }

 The content of the input file:
 Hello Spark
 Hi Spark
 Spark is running


 I am using the spark 2.0.1 dependency from maven.

 Thanks
 Vaibhav

 On 10 October 2016 at 12:37, Sudhanshu Janghel <
 sudhanshu.jang...@cloudwick.com> wrote:

 Seems like a straightforward error: it's trying to cast something as a
 list which is not a list or cannot be cast. Are you using standard
 example code? Can you send the input and code?

 On Oct 10, 2016 9:05 AM, "vaibhav thapliyal" <
 vaibhav.thapliyal...@gmail.com> wrote:

 Dear All,

 I am getting a ClassCastException when using the Java API to run
 the wordcount example from the docs.

 Here is the log that I got:

 16/10/10 11:52:12 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 
 4)
 java.lang.ClassCastException: cannot assign instance of 
 scala.collection.immutable.List$SerializationProxy to field 
 org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type 
 scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at 
 java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
at 
 java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 

Re: ClassCastException while running a simple wordCount

2016-10-10 Thread Jakob Odersky
Just thought of another potential issue: you should use the "provided"
scope when depending on Spark, i.e. in your project's pom:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.1</version>
  <scope>provided</scope>
</dependency>

On Mon, Oct 10, 2016 at 2:00 PM, Jakob Odersky  wrote:

> How do you submit the application? A version mismatch between the launcher,
> driver and workers could lead to the bug you're seeing. A common reason for
> a mismatch is if the SPARK_HOME environment variable is set. This will
> cause the spark-submit script to use the launcher determined by that
> environment variable, regardless of the directory from which it was called.
>
> On Mon, Oct 10, 2016 at 3:42 AM, kant kodali  wrote:
>
>> +1 Wooho I have the same problem. I have been trying hard to fix this.
>>
>>
>>
>> On Mon, Oct 10, 2016 3:23 AM, vaibhav thapliyal
>> vaibhav.thapliyal...@gmail.com wrote:
>>
>>> Hi,
>>> If I change the parameter inside setMaster() to "local", the
>>> program runs. Is there something wrong with the cluster installation?
>>>
>>> I used the spark-2.0.1-bin-hadoop2.7.tgz package to install on my
>>> cluster with default configuration.
>>>
>>> Thanks
>>> Vaibhav
>>>
>>> On 10 Oct 2016 12:49, "vaibhav thapliyal" <
>>> vaibhav.thapliyal...@gmail.com> wrote:
>>>
>>> Here is the code that I am using:
>>>
>>> public class SparkTest {
>>>
>>>     public static void main(String[] args) {
>>>
>>>         SparkConf conf = new SparkConf()
>>>                 .setMaster("spark://192.168.10.174:7077")
>>>                 .setAppName("TestSpark");
>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>
>>>         JavaRDD<String> textFile = sc.textFile("sampleFile.txt");
>>>
>>>         // Split each line into words
>>>         JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
>>>             public Iterator<String> call(String s) {
>>>                 return Arrays.asList(s.split(" ")).iterator();
>>>             }
>>>         });
>>>
>>>         // Pair each word with a count of one
>>>         JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
>>>             public Tuple2<String, Integer> call(String s) {
>>>                 return new Tuple2<String, Integer>(s, 1);
>>>             }
>>>         });
>>>
>>>         // Sum the counts for each word
>>>         JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
>>>             public Integer call(Integer a, Integer b) {
>>>                 return a + b;
>>>             }
>>>         });
>>>
>>>         counts.saveAsTextFile("outputFile.txt");
>>>     }
>>> }
>>>
>>> The content of the input file:
>>> Hello Spark
>>> Hi Spark
>>> Spark is running
>>>
>>>
>>> I am using the spark 2.0.1 dependency from maven.
>>>
>>> Thanks
>>> Vaibhav
>>>
>>> On 10 October 2016 at 12:37, Sudhanshu Janghel <
>>> sudhanshu.jang...@cloudwick.com> wrote:
>>>
>>> Seems like a straightforward error: it's trying to cast something as a
>>> list which is not a list or cannot be cast. Are you using standard
>>> example code? Can you send the input and code?
>>>
>>> On Oct 10, 2016 9:05 AM, "vaibhav thapliyal" <
>>> vaibhav.thapliyal...@gmail.com> wrote:
>>>
>>> Dear All,
>>>
>>> I am getting a ClassCastException when using the Java API to run
>>> the wordcount example from the docs.
>>>
>>> Here is the log that I got:
>>>
>>> 16/10/10 11:52:12 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 4)
>>> java.lang.ClassCastException: cannot assign instance of 
>>> scala.collection.immutable.List$SerializationProxy to field 
>>> org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type 
>>> scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>>> at 
>>> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>> at 
>>> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>> at 
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at 
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at 
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>> at 
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at 
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>> at 
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>> at 
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
>>> at 
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>>>  

Re: ClassCastException while running a simple wordCount

2016-10-10 Thread Jakob Odersky
How do you submit the application? A version mismatch between the launcher,
driver and workers could lead to the bug you're seeing. A common reason for
a mismatch is if the SPARK_HOME environment variable is set. This will
cause the spark-submit script to use the launcher determined by that
environment variable, regardless of the directory from which it was called.
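
A quick way to confirm which installation will be picked up (just a
suggestion on my part, not something verified against your setup) is to
print the variable from the JVM you are launching from, for example:

public class CheckSparkHome {
    public static void main(String[] args) {
        // Prints the Spark installation that SPARK_HOME points at, or <not set>.
        String sparkHome = System.getenv("SPARK_HOME");
        System.out.println("SPARK_HOME = " + (sparkHome == null ? "<not set>" : sparkHome));
    }
}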

On Mon, Oct 10, 2016 at 3:42 AM, kant kodali  wrote:

> +1 Wooho I have the same problem. I have been trying hard to fix this.
>
>
>
> On Mon, Oct 10, 2016 3:23 AM, vaibhav thapliyal
> vaibhav.thapliyal...@gmail.com wrote:
>
>> Hi,
>> If I change the parameter inside setMaster() to "local", the program
>> runs. Is there something wrong with the cluster installation?
>>
>> I used the spark-2.0.1-bin-hadoop2.7.tgz package to install on my cluster
>> with default configuration.
>>
>> Thanks
>> Vaibhav
>>
>> On 10 Oct 2016 12:49, "vaibhav thapliyal" <vaibhav.thapliyal...@gmail.com>
>> wrote:
>>
>> Here is the code that I am using:
>>
>> public class SparkTest {
>>
>>     public static void main(String[] args) {
>>
>>         SparkConf conf = new SparkConf()
>>                 .setMaster("spark://192.168.10.174:7077")
>>                 .setAppName("TestSpark");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>         JavaRDD<String> textFile = sc.textFile("sampleFile.txt");
>>
>>         // Split each line into words
>>         JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
>>             public Iterator<String> call(String s) {
>>                 return Arrays.asList(s.split(" ")).iterator();
>>             }
>>         });
>>
>>         // Pair each word with a count of one
>>         JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
>>             public Tuple2<String, Integer> call(String s) {
>>                 return new Tuple2<String, Integer>(s, 1);
>>             }
>>         });
>>
>>         // Sum the counts for each word
>>         JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
>>             public Integer call(Integer a, Integer b) {
>>                 return a + b;
>>             }
>>         });
>>
>>         counts.saveAsTextFile("outputFile.txt");
>>     }
>> }
>>
>> The content of the input file:
>> Hello Spark
>> Hi Spark
>> Spark is running
>>
>>
>> I am using the spark 2.0.1 dependency from maven.
>>
>> Thanks
>> Vaibhav
>>
>> On 10 October 2016 at 12:37, Sudhanshu Janghel <
>> sudhanshu.jang...@cloudwick.com> wrote:
>>
>> Seems like a straightforward error: it's trying to cast something as a
>> list which is not a list or cannot be cast. Are you using standard
>> example code? Can you send the input and code?
>>
>> On Oct 10, 2016 9:05 AM, "vaibhav thapliyal" <
>> vaibhav.thapliyal...@gmail.com> wrote:
>>
>> Dear All,
>>
>> I am getting a ClassCastException when using the Java API to run
>> the wordcount example from the docs.
>>
>> Here is the log that I got:
>>
>> 16/10/10 11:52:12 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 4)
>> java.lang.ClassCastException: cannot assign instance of 
>> scala.collection.immutable.List$SerializationProxy to field 
>> org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type 
>> scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
>>  at 
>> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>  at 
>> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>  at 
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>  at 
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>  at 
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>  at 
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>  at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
>>  at 
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>>  at org.apache.spark.scheduler.Task.run(Task.scala:86)
>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>  at java.lang.Thread.run(Thread.java:745)
>> 16/10/10 11:52:12 ERROR Executor: