Don't know if this'll solve it, but if you're on Spark 1.1, the Cassandra
Connector version 1.1.0 final fixed the Guava backward-compatibility issue.
Maybe taking out the guava exclusions might help?
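For example, the connector line from the build below could then drop its
exclude, something like (just a sketch, not tested):

  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
    withSources() withJavadoc(),

and the same for the other guava excludes.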
Date: Mon, 1 Dec 2014 10:48:25 +0100
Subject: Kryo exception for CassandraSQLRow
From: shahab.mok...@gmail.com
To: user@spark.apache.org
I am using the Cassandra-Spark connector to pull data from Cassandra, process
it, and write it back to Cassandra.
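The pattern is roughly the following (keyspace, table, and column names here
are placeholders, not my real schema):

  import com.datastax.spark.connector._

  // read a table into an RDD of CassandraRow
  val rows = sc.cassandraTable("my_keyspace", "my_table")

  // process the rows
  val processed = rows.map(r => (r.getString("key"), r.getInt("value") + 1))

  // write the result back to another table
  processed.saveToCassandra("my_keyspace", "my_out_table", SomeColumns("key", "value"))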
Now I am getting the following exception, apparently from Kryo serialisation.
Does anyone know what the reason is and how this can be solved?
I also tried to register org.apache.spark.sql.cassandra.CassandraSQLRow via
kryo.register, but even this did not solve the problem and the exception
remains.
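What I tried looks roughly like this (the registrator class and package name
are mine; the rest is the standard Spark Kryo hook):

  import com.esotericsoftware.kryo.Kryo
  import org.apache.spark.serializer.KryoRegistrator

  class MyRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo) {
      // register the class Kryo says it cannot find
      kryo.register(classOf[org.apache.spark.sql.cassandra.CassandraSQLRow])
    }
  }

  // and in the SparkConf:
  conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")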
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 7, ip-X-Y-Z):
com.esotericsoftware.kryo.KryoException: Unable to find class:
org.apache.spark.sql.cassandra.CassandraSQLRow
Serialization trace:
_2 (org.apache.spark.util.MutablePair)
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1171)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1218)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I am using Spark 1.1.0 with the Cassandra-Spark connector 1.1.0; here is the
build:
"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
  exclude("com.google.guava", "guava"),
"com.google.guava" % "guava" % "16.0" % "provided",
"com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
  exclude("com.google.guava", "guava") withSources() withJavadoc(),
"org.apache.cassandra" % "cassandra-all" % "2.1.1"
  exclude("com.google.guava", "guava"),
"org.apache.cassandra" % "cassandra-thrift" % "2.1.1"
  exclude("com.google.guava", "guava"),
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.2"
  exclude("com.google.guava", "guava"),
"org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.hadoop", "hadoop-core"),
"org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava"),
"org.apache.spark" %% "spark-catalyst" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),
"org.apache.spark" %% "spark-sql" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),
"org.apache.spark" %% "spark-hive" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),
"org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
best,
/Shahab