Re: Kyro deserialisation error

2014-09-12 Thread ayandas84
Hi,

I am also facing the same problem. Has anyone found a solution yet?

It just returns a garbled string of characters.

Please help..


Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Exception while deserializing and fetching task:
com.esotericsoftware.kryo.KryoException: Unable to find class: 

 [roughly two dozen lines of unprintable binary data appear here in place of the class name]
Serialization trace:




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kyro-deserialisation-error-tp6798p14071.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kyro deserialisation error

2014-07-24 Thread Guillaume Pitel

Hi,

We've got the same problem here (it happens randomly):

Unable to find class: [several lines of binary garbage follow in place of the class name]


Clearly a stream corruption problem.

We've been running fine (afaik) on 1.0.0 for two weeks, switched to 1.0.1 
this Monday, and since then this kind of problem has been occurring randomly.



Guillaume Pitel

Not sure if this helps, but it does seem to be part of a name in a
Wikipedia article, and Wikipedia is the data set. So something is
reading this class name from the data.

http://en.wikipedia.org/wiki/Carl_Fridtjof_Rode

On Thu, Jul 17, 2014 at 9:40 AM, Tathagata Das
 wrote:

Seems like there is some sort of stream corruption, causing Kryo read to
read a weird class name from the stream (the name "arl Fridtjof Rode" in the
exception cannot be a class!).
Not sure how to debug this.

@Patrick: Any idea?



--
eXenSa


*Guillaume PITEL, Président*
+33(0)626 222 431

eXenSa S.A.S. 
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705



Re: Kyro deserialisation error

2014-07-17 Thread Hao Wang
Hi all,

Yes, it's the name of a Wikipedia article. I am running the WikipediaPageRank
example from Spark's Bagel package.
I am wondering whether this is related to the Kryo buffer size.

The PageRank job can finish successfully; sometimes it does not, because this
kind of Kryo exception happens too many times and exceeds maxTaskFailures.

I see this *Kryo exception: unable to find class* in my successful runs,
too.
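
If buffer size is the suspicion, one quick experiment is to raise it well above the largest record being serialized. A minimal sketch, assuming the Spark 1.x property names (the values are illustrative, not a recommendation):

```scala
import org.apache.spark.SparkConf

// Sketch only: raise the Kryo serializer buffer (Spark 1.x property name).
// 24 MB was the value used in this thread; try something comfortably larger.
val sparkConf = new SparkConf()
  .setAppName("WikipediaPageRank")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", "64")
```

If the error persists regardless of buffer size, that points back at stream corruption rather than an undersized buffer.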


Regards,
Wang Hao(王灏)

CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com


On Thu, Jul 17, 2014 at 4:44 PM, Sean Owen  wrote:

> Not sure if this helps, but it does seem to be part of a name in a
> Wikipedia article, and Wikipedia is the data set. So something is
> reading this class name from the data.
>
> http://en.wikipedia.org/wiki/Carl_Fridtjof_Rode
>
> On Thu, Jul 17, 2014 at 9:40 AM, Tathagata Das
>  wrote:
> > Seems like there is some sort of stream corruption, causing Kryo read to
> > read a weird class name from the stream (the name "arl Fridtjof Rode" in
> the
> > exception cannot be a class!).
> > Not sure how to debug this.
> >
> > @Patrick: Any idea?
>


Re: Kyro deserialisation error

2014-07-17 Thread Sean Owen
Not sure if this helps, but it does seem to be part of a name in a
Wikipedia article, and Wikipedia is the data set. So something is
reading this class name from the data.

http://en.wikipedia.org/wiki/Carl_Fridtjof_Rode
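
One way to picture how article text could land where a class name belongs: Kryo reads the name as a string out of the stream, so if the reader's offset slips by even one byte, ordinary data bytes get interpreted as the name. A toy sketch of that effect (not Kryo's actual wire format; `readName` is a made-up helper):

```scala
// Toy illustration: a reader that starts one byte late turns page-title
// bytes into a bogus "class name", which is how a fragment like
// "arl Fridtjof Rode" can surface in the exception message.
val payload = "Carl Fridtjof Rode".getBytes("UTF-8")

def readName(bytes: Array[Byte], offset: Int): String =
  new String(bytes, offset, bytes.length - offset, "UTF-8")

println(readName(payload, 0)) // Carl Fridtjof Rode
println(readName(payload, 1)) // arl Fridtjof Rode  <- the "class" in the trace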

On Thu, Jul 17, 2014 at 9:40 AM, Tathagata Das
 wrote:
> Seems like there is some sort of stream corruption, causing Kryo read to
> read a weird class name from the stream (the name "arl Fridtjof Rode" in the
> exception cannot be a class!).
> Not sure how to debug this.
>
> @Patrick: Any idea?


Re: Kyro deserialisation error

2014-07-17 Thread Tathagata Das
Seems like there is some sort of stream corruption, causing Kryo read to
read a weird class name from the stream (the name "arl Fridtjof Rode" in
the exception cannot be a class!).
Not sure how to debug this.

@Patrick: Any idea?



On Wed, Jul 16, 2014 at 10:14 PM, Hao Wang  wrote:

> I am not sure. Not every task will fail at this Kyro exception. In most
> time, the cluster could successfully finish the WikipediaPageRank.
> How could I debug this exception?
>
> Thanks
>
> Regards,
> Wang Hao(王灏)
>
> CloudTeam | School of Software Engineering
> Shanghai Jiao Tong University
> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
> Email:wh.s...@gmail.com
>
>
> On Thu, Jul 17, 2014 at 2:58 AM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> Is the class that is not found in the wikipediapagerank jar?
>>
>> TD
>>
>>
>> On Wed, Jul 16, 2014 at 12:32 AM, Hao Wang  wrote:
>>
>>> Thanks for your reply. The SparkContext is configured as below:
>>>
>>> sparkConf.setAppName("WikipediaPageRank")
>>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>>> sparkConf.set("spark.kryo.registrator", classOf[PRKryoRegistrator].getName)
>>> val inputFile = args(0)
>>> val threshold = args(1).toDouble
>>> val numPartitions = args(2).toInt
>>> val usePartitioner = args(3).toBoolean
>>>
>>> sparkConf.setAppName("WikipediaPageRank")
>>> sparkConf.set("spark.executor.memory", "60g")
>>> sparkConf.set("spark.cores.max", "48")
>>> sparkConf.set("spark.kryoserializer.buffer.mb", "24")
>>> val sc = new SparkContext(sparkConf)
>>> sc.addJar("~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar")
>>>
>>> And I use spark-submit to run the application:
>>>
>>> ./bin/spark-submit --master spark://sing12:7077 --total-executor-cores 40 \
>>>   --executor-memory 40g --class org.apache.spark.examples.bagel.WikipediaPageRank \
>>>   ~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar \
>>>   hdfs://192.168.1.12:9000/freebase-26G 1 200 True
>>>
>>> Regards,
>>> Wang Hao(王灏)
>>>
>>> CloudTeam | School of Software Engineering
>>> Shanghai Jiao Tong University
>>> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
>>> Email:wh.s...@gmail.com
>>>
>>>
>>> On Wed, Jul 16, 2014 at 1:41 PM, Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
 Are you using classes from external libraries that have not been added
 to the sparkContext, using sparkcontext.addJar()?

 TD


 On Tue, Jul 15, 2014 at 8:36 PM, Hao Wang  wrote:

> I am running the WikipediaPageRank in Spark example and share the same
> problem with you:
>
> 4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
> 14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
> aborting job
> 14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
> Bagel.scala:251
> Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
> Cancelling stage 6
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 6.0:450 failed 4 times, most recent failure: Exception failure in TID
> 1330 on host sing11: com.esotericsoftware.kryo.KryoException: Unable to
> find class: arl Fridtjof Rode
>
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>
> com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
>
> com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
>
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
>
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
>
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairR

Re: Kyro deserialisation error

2014-07-16 Thread Hao Wang
I am not sure. Not every task fails with this Kryo exception; most of the
time, the cluster finishes the WikipediaPageRank successfully.
How can I debug this exception?

Thanks

Regards,
Wang Hao(王灏)

CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com


On Thu, Jul 17, 2014 at 2:58 AM, Tathagata Das 
wrote:

> Is the class that is not found in the wikipediapagerank jar?
>
> TD
>
>
> On Wed, Jul 16, 2014 at 12:32 AM, Hao Wang  wrote:
>
>> Thanks for your reply. The SparkContext is configured as below:
>>
>> sparkConf.setAppName("WikipediaPageRank")
>> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>> sparkConf.set("spark.kryo.registrator", classOf[PRKryoRegistrator].getName)
>> val inputFile = args(0)
>> val threshold = args(1).toDouble
>> val numPartitions = args(2).toInt
>> val usePartitioner = args(3).toBoolean
>>
>> sparkConf.setAppName("WikipediaPageRank")
>> sparkConf.set("spark.executor.memory", "60g")
>> sparkConf.set("spark.cores.max", "48")
>> sparkConf.set("spark.kryoserializer.buffer.mb", "24")
>> val sc = new SparkContext(sparkConf)
>> sc.addJar("~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar")
>>
>> And I use spark-submit to run the application:
>>
>> ./bin/spark-submit --master spark://sing12:7077 --total-executor-cores 40 \
>>   --executor-memory 40g --class org.apache.spark.examples.bagel.WikipediaPageRank \
>>   ~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar \
>>   hdfs://192.168.1.12:9000/freebase-26G 1 200 True
>>
>> Regards,
>> Wang Hao(王灏)
>>
>> CloudTeam | School of Software Engineering
>> Shanghai Jiao Tong University
>> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
>> Email:wh.s...@gmail.com
>>
>>
>> On Wed, Jul 16, 2014 at 1:41 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> Are you using classes from external libraries that have not been added
>>> to the sparkContext, using sparkcontext.addJar()?
>>>
>>> TD
>>>
>>>
>>> On Tue, Jul 15, 2014 at 8:36 PM, Hao Wang  wrote:
>>>
 I am running the WikipediaPageRank in Spark example and share the same
 problem with you:

 4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
 14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
 aborting job
 14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
 Bagel.scala:251
 Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
 Cancelling stage 6
 org.apache.spark.SparkException: Job aborted due to stage failure: Task
 6.0:450 failed 4 times, most recent failure: Exception failure in TID 1330
 on host sing11: com.esotericsoftware.kryo.KryoException: Unable to find
 class: arl Fridtjof Rode

 com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)

 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
 com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)

 com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)

 com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)

 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)

 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)

 org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)

 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)

 org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)

 org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)

 org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
 org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)

 Anyone cloud help?

 Regards,
 Wang Hao(王灏)

 CloudTeam | School of Software Engineering
 Shanghai Jiao Tong University
 Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
 Email:wh.s...@gmail.com


 On Tue, Jun 3, 2014 at 8:02 PM, Denes  wrote:

> I tried to use Kryo as a serialiser in Spark Streaming, did everything
> according to the guide posted on the spark website, i.e. added the
> following
> lines:
>
> conf.set("spark.se

Re: Kyro deserialisation error

2014-07-16 Thread Tathagata Das
Is the class that is not found in the wikipediapagerank jar?

TD


On Wed, Jul 16, 2014 at 12:32 AM, Hao Wang  wrote:

> Thanks for your reply. The SparkContext is configured as below:
>
> sparkConf.setAppName("WikipediaPageRank")
> sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> sparkConf.set("spark.kryo.registrator", classOf[PRKryoRegistrator].getName)
> val inputFile = args(0)
> val threshold = args(1).toDouble
> val numPartitions = args(2).toInt
> val usePartitioner = args(3).toBoolean
>
> sparkConf.setAppName("WikipediaPageRank")
> sparkConf.set("spark.executor.memory", "60g")
> sparkConf.set("spark.cores.max", "48")
> sparkConf.set("spark.kryoserializer.buffer.mb", "24")
> val sc = new SparkContext(sparkConf)
> sc.addJar("~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar")
>
> And I use spark-submit to run the application:
>
> ./bin/spark-submit --master spark://sing12:7077 --total-executor-cores 40 \
>   --executor-memory 40g --class org.apache.spark.examples.bagel.WikipediaPageRank \
>   ~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar \
>   hdfs://192.168.1.12:9000/freebase-26G 1 200 True
>
> Regards,
> Wang Hao(王灏)
>
> CloudTeam | School of Software Engineering
> Shanghai Jiao Tong University
> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
> Email:wh.s...@gmail.com
>
>
> On Wed, Jul 16, 2014 at 1:41 PM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> Are you using classes from external libraries that have not been added to
>> the sparkContext, using sparkcontext.addJar()?
>>
>> TD
>>
>>
>> On Tue, Jul 15, 2014 at 8:36 PM, Hao Wang  wrote:
>>
>>> I am running the WikipediaPageRank in Spark example and share the same
>>> problem with you:
>>>
>>> 4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
>>> 14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
>>> aborting job
>>> 14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
>>> Bagel.scala:251
>>> Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
>>> Cancelling stage 6
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>> 6.0:450 failed 4 times, most recent failure: Exception failure in TID 1330
>>> on host sing11: com.esotericsoftware.kryo.KryoException: Unable to find
>>> class: arl Fridtjof Rode
>>>
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>>
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>>>
>>> com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
>>>
>>> com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
>>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>>
>>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
>>>
>>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
>>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>>>
>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>
>>> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>>>
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>>
>>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>>> org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>>>
>>> Anyone cloud help?
>>>
>>> Regards,
>>> Wang Hao(王灏)
>>>
>>> CloudTeam | School of Software Engineering
>>> Shanghai Jiao Tong University
>>> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
>>> Email:wh.s...@gmail.com
>>>
>>>
>>> On Tue, Jun 3, 2014 at 8:02 PM, Denes  wrote:
>>>
 I tried to use Kryo as a serialiser in Spark Streaming, did everything
 according to the guide posted on the spark website, i.e. added the
 following
 lines:

 conf.set("spark.serializer",
 "org.apache.spark.serializer.KryoSerializer");
 conf.set("spark.kryo.registrator", "MyKryoRegistrator");

 I also added the necessary classes to the MyKryoRegistrator.

 However I get the following strange error, can someone help me out
 where to
 look for a solution?

 14/06/03 09:00:49 ERROR scheduler.JobScheduler: Error running job
 streaming
 job 140177880 ms.0
 org.apache.spark.SparkException: Job aborted due to stage failure:
 Exception
 while deserializing and fetching task:
 com.esotericsoftware.kryo.KryoException: Unable to find class: J
 Serialization trace:

Re: Kyro deserialisation error

2014-07-16 Thread Hao Wang
Thanks for your reply. The SparkContext is configured as below:

 sparkConf.setAppName("WikipediaPageRank")
sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrator",  classOf[PRKryoRegistrator].getName)
val inputFile = args(0)
val threshold = args(1).toDouble
val numPartitions = args(2).toInt
val usePartitioner = args(3).toBoolean

sparkConf.setAppName("WikipediaPageRank")
sparkConf.set("spark.executor.memory", "60g")
sparkConf.set("spark.cores.max", "48")
sparkConf.set("spark.kryoserializer.buffer.mb", "24")
val sc = new SparkContext(sparkConf)

sc.addJar("~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar")

And I use spark-submit to run the application:
./bin/spark-submit --master spark://sing12:7077
--total-executor-cores 40 --executor-memory 40g --class
org.apache.spark.examples.bagel.WikipediaPageRank
~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar
hdfs://192.168.1.12:9000/freebase-26G 1 200 True
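
For context, a registrator like the PRKryoRegistrator referenced above might look roughly like the sketch below (its actual source is not shown in this thread, and the registered classes are assumptions based on the Bagel PageRank example). Any class that crosses the wire must be resolvable on the executors, which is why a corrupted stream surfaces as "Unable to find class":

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical sketch of a Spark 1.x KryoRegistrator: register the vertex
// and message types the Bagel job serializes between stages.
class PRKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[org.apache.spark.examples.bagel.PRVertex])
    kryo.register(classOf[org.apache.spark.examples.bagel.PRMessage])
  }
}
```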


Regards,
Wang Hao(王灏)

CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com


On Wed, Jul 16, 2014 at 1:41 PM, Tathagata Das 
wrote:

> Are you using classes from external libraries that have not been added to
> the sparkContext, using sparkcontext.addJar()?
>
> TD
>
>
> On Tue, Jul 15, 2014 at 8:36 PM, Hao Wang  wrote:
>
>> I am running the WikipediaPageRank in Spark example and share the same
>> problem with you:
>>
>> 4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
>> 14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
>> aborting job
>> 14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
>> Bagel.scala:251
>> Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
>> Cancelling stage 6
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 6.0:450 failed 4 times, most recent failure: Exception failure in TID 1330
>> on host sing11: com.esotericsoftware.kryo.KryoException: Unable to find
>> class: arl Fridtjof Rode
>>
>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>
>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
>> com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
>> com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>
>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
>>
>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>>
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>
>> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>>
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>
>> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>> org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>>
>> Anyone cloud help?
>>
>> Regards,
>> Wang Hao(王灏)
>>
>> CloudTeam | School of Software Engineering
>> Shanghai Jiao Tong University
>> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
>> Email:wh.s...@gmail.com
>>
>>
>> On Tue, Jun 3, 2014 at 8:02 PM, Denes  wrote:
>>
>>> I tried to use Kryo as a serialiser in Spark Streaming, did everything
>>> according to the guide posted on the spark website, i.e. added the
>>> following
>>> lines:
>>>
>>> conf.set("spark.serializer",
>>> "org.apache.spark.serializer.KryoSerializer");
>>> conf.set("spark.kryo.registrator", "MyKryoRegistrator");
>>>
>>> I also added the necessary classes to the MyKryoRegistrator.
>>>
>>> However I get the following strange error, can someone help me out where
>>> to
>>> look for a solution?
>>>
>>> 14/06/03 09:00:49 ERROR scheduler.JobScheduler: Error running job
>>> streaming
>>> job 140177880 ms.0
>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Exception
>>> while deserializing and fetching task:
>>> com.esotericsoftware.kryo.KryoException: Unable to find class: J
>>> Serialization trace:
>>> id (org.apache.spark.storage.GetBlock)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
>>> at
>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>>> at
>>>
>>> org.apache.spark.sche

Re: Kyro deserialisation error

2014-07-15 Thread Tathagata Das
Are you using classes from external libraries that have not been added to
the sparkContext, using sparkcontext.addJar()?

TD


On Tue, Jul 15, 2014 at 8:36 PM, Hao Wang  wrote:

> I am running the WikipediaPageRank in Spark example and share the same
> problem with you:
>
> 4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
> 14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
> aborting job
> 14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
> Bagel.scala:251
> Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
> Cancelling stage 6
> org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 6.0:450 failed 4 times, most recent failure: Exception failure in TID 1330
> on host sing11: com.esotericsoftware.kryo.KryoException: Unable to find
> class: arl Fridtjof Rode
>
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
> com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
>
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
> org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
>
> Anyone cloud help?
>
> Regards,
> Wang Hao(王灏)
>
> CloudTeam | School of Software Engineering
> Shanghai Jiao Tong University
> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
> Email:wh.s...@gmail.com
>
>
> On Tue, Jun 3, 2014 at 8:02 PM, Denes  wrote:
>
>> I tried to use Kryo as a serialiser in Spark Streaming, did everything
>> according to the guide posted on the spark website, i.e. added the
>> following
>> lines:
>>
>> conf.set("spark.serializer",
>> "org.apache.spark.serializer.KryoSerializer");
>> conf.set("spark.kryo.registrator", "MyKryoRegistrator");
>>
>> I also added the necessary classes to the MyKryoRegistrator.
>>
>> However I get the following strange error, can someone help me out where
>> to
>> look for a solution?
>>
>> 14/06/03 09:00:49 ERROR scheduler.JobScheduler: Error running job
>> streaming
>> job 140177880 ms.0
>> org.apache.spark.SparkException: Job aborted due to stage failure:
>> Exception
>> while deserializing and fetching task:
>> com.esotericsoftware.kryo.KryoException: Unable to find class: J
>> Serialization trace:
>> id (org.apache.spark.storage.GetBlock)
>> at
>> org.apache.spark.scheduler.DAGScheduler.org
>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
>> at
>>
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>> at scala.Option.foreach(Option.scala:236)
>> at
>>
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
>> at
>>
>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at
>>
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>>
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQ

Re: Kyro deserialisation error

2014-07-15 Thread Hao Wang
I am running the WikipediaPageRank example from Spark and am hitting the same
problem as you:

4/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
aborting job
14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
Bagel.scala:251
Exception in thread "main" 14/07/16 11:31:06 INFO TaskSchedulerImpl:
Cancelling stage 6
org.apache.spark.SparkException: Job aborted due to stage failure: Task
6.0:450 failed 4 times, most recent failure: Exception failure in TID 1330
on host sing11: com.esotericsoftware.kryo.KryoException: Unable to find
class: arl Fridtjof Rode

com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)

com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
com.twitter.chill.TraversableSerializer.read(Traversable.scala:21)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)

org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)

org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)

org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)

org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)

org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)

Could anyone help?

Regards,
Wang Hao(王灏)

CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com


On Tue, Jun 3, 2014 at 8:02 PM, Denes  wrote:

> I tried to use Kryo as a serialiser in Spark Streaming, did everything
> according to the guide posted on the spark website, i.e. added the
> following
> lines:
>
> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
> conf.set("spark.kryo.registrator", "MyKryoRegistrator");
>
> I also added the necessary classes to the MyKryoRegistrator.
>
> However I get the following strange error, can someone help me out where to
> look for a solution?
>
> 14/06/03 09:00:49 ERROR scheduler.JobScheduler: Error running job streaming
> job 140177880 ms.0
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Exception
> while deserializing and fetching task:
> com.esotericsoftware.kryo.KryoException: Unable to find class: J
> Serialization trace:
> id (org.apache.spark.storage.GetBlock)
> at
> org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
> at
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
> at
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
> at scala.Option.foreach(Option.scala:236)
> at
>
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
> at
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at
>
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kyro-deserialis