Re: Kryo serialization failed: Buffer overflow : Broadcast Join

2018-02-02 Thread Pralabh Kumar
I am using spark 2.1.0

On Fri, Feb 2, 2018 at 5:08 PM, Pralabh Kumar 
wrote:

> Hi
>
> I am performing a broadcast join where my small table is 1 GB. I am
> getting the following error:
>
> org.apache.spark.SparkException: Kryo serialization failed: Buffer
> overflow. Available: 0, required: 28869232. To avoid this, increase
> spark.kryoserializer.buffer.max value
>
>
>
> I increased the value to
>
> spark.conf.set("spark.kryoserializer.buffer.max","2g")
>
>
> But I am still getting the error.
>
> Please help
>
> Thx
>

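A minimal sketch of where this setting generally has to live: the Kryo buffer limits are read when the SparkContext and its executors start, so setting them with spark.conf.set on an already-running session typically has no effect, and, if I recall correctly, the setting is capped below 2048m, so "2g" itself may be rejected at startup. The value below is illustrative:

  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer.max", "1g")  // must be in place before the context starts

  val spark = SparkSession.builder().config(conf).getOrCreate()

  // equivalently: spark-submit --conf spark.kryoserializer.buffer.max=1g ...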

Re: Kryo not registered class

2017-11-20 Thread Vadim Semenov
Try:

Class.forName("[Lorg.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation;")

On Sun, Nov 19, 2017 at 3:24 PM, Angel Francisco Orta <
angel.francisco.o...@gmail.com> wrote:

> Hello, I'm on Spark 2.1.0 with Scala and I'm registering all classes
> with Kryo, and I have a problem registering this class:
>
> org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation[]
>
> I can't register with classOf[Array[Class.forName("org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation").type]]
>
>
> I have tried as well creating a java class like register and registering
> the class as org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation[].class;
>
> Any clue is appreciated.
>
> Thanks.
>
>

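A sketch of how that Class.forName lookup can be fed to Kryo registration; the "[L...;" string is the JVM binary name of the array class, which is the only way to name it here since the element type is a nested Spark-internal class:

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .registerKryoClasses(Array(
      // binary name of SerializableBlockLocation[]; note the trailing ';'
      Class.forName("[Lorg.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$SerializableFileStatus$SerializableBlockLocation;")
    ))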

Re: Kryo On Spark 1.6.0

2017-01-14 Thread Yan Facai
For Scala, you could fix it by using:
conf.registerKryoClasses(Array(Class.forName("scala.collection.mutable.WrappedArray$ofRef")))


By the way,
if the class is an array of a Java primitive type, say byte[], then use:
Class.forName("[B")

if it is an array of another class, say scala.collection.mutable.WrappedArray$ofRef,
then use:
Class.forName("[Lscala.collection.mutable.WrappedArray$ofRef;")

ref:
https://docs.oracle.com/javase/8/docs/api/java/lang/Class.html#getName--
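
A quick, purely illustrative way to check the exact names getName reports (and hence what has to be registered):

  val elem = Class.forName("scala.collection.mutable.WrappedArray$ofRef")
  val arr  = java.lang.reflect.Array.newInstance(elem, 0).getClass
  println(elem.getName)                        // scala.collection.mutable.WrappedArray$ofRef
  println(arr.getName)                         // [Lscala.collection.mutable.WrappedArray$ofRef;
  println(new Array[Byte](0).getClass.getName) // [B

Either of the resulting Class objects can then be passed to conf.registerKryoClasses.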





On Tue, Jan 10, 2017 at 11:11 PM, Yang Cao  wrote:

> If you don’t mind, could you please share the Scala solution with me? I tried
> to use Kryo but it seemed not to work at all. I hope to get a practical
> example. Thanks.
>
> On 10 January 2017, at 19:10, Enrico DUrso  wrote:
>
> Hi,
>
> I am trying to use Kryo on Spark 1.6.0.
> I am able to register my own classes and it works, but when I set
> “spark.kryo.registrationRequired” to true, I get an error about a Scala
> class:
> “Class is not registered: scala.collection.mutable.WrappedArray$ofRef”.
>
> Has any of you already solved this issue in Java? I found the code to
> solve it in Scala, but I am unable to register this class in Java.
>
> Cheers,
>
> enrico
>
>
>
>


RE: Kryo On Spark 1.6.0 [Solution in this email]

2017-01-11 Thread Enrico DUrso
Yes, sure,

you can find it here: 
http://stackoverflow.com/questions/34736587/kryo-serializer-causing-exception-on-underlying-scala-class-wrappedarray
I hope it works; I did not try it, as I am using Java.
To be precise, I found the solution for my problem.
To sum up, I had problems registering the following class in Java:
“scala.collection.mutable.WrappedArray$ofRef”
The tip is:
Class a = Class.forName("scala.collection.mutable.WrappedArray$ofRef");
and then put a in the array of classes you are passing to the method
registerKryoClasses().
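
For the Scala side, a minimal sketch of the same registration (everything beyond the class name from this thread is illustrative):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrationRequired", "true")
    .registerKryoClasses(Array(
      Class.forName("scala.collection.mutable.WrappedArray$ofRef")
      // ...plus your own application classes
    ))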

From: Yang Cao [mailto:cybea...@gmail.com]
Sent: 10 January 2017 15:12
To: Enrico DUrso
Cc: user@spark.apache.org
Subject: Re: Kryo On Spark 1.6.0

If you don’t mind, could you please share the Scala solution with me? I tried to 
use Kryo but it seemed not to work at all. I hope to get a practical example. Thanks.
On 10 January 2017, at 19:10, Enrico DUrso 
<enrico.du...@everis.com<mailto:enrico.du...@everis.com>> wrote:

Hi,

I am trying to use Kryo on Spark 1.6.0.
I am able to register my own classes and it works, but when I set 
“spark.kryo.registrationRequired” to true, I get an error about a Scala class:
“Class is not registered: scala.collection.mutable.WrappedArray$ofRef”.

Has any of you already solved this issue in Java? I found the code to solve it 
in Scala, but I am unable to register this class in Java.

Cheers,

enrico









Re: Kryo On Spark 1.6.0

2017-01-10 Thread Yang Cao
If you don’t mind, could you please share the Scala solution with me? I tried to 
use Kryo but it seemed not to work at all. I hope to get a practical example. Thanks.
> On 10 January 2017, at 19:10, Enrico DUrso  wrote:
> 
> Hi,
> 
> I am trying to use Kryo on Spark 1.6.0.
> I am able to register my own classes and it works, but when I set 
> “spark.kryo.registrationRequired” to true, I get an error about a Scala 
> class:
> “Class is not registered: scala.collection.mutable.WrappedArray$ofRef”.
> 
> Has any of you already solved this issue in Java? I found the code to solve 
> it in Scala, but I am unable to register this class in Java.
> 
> Cheers,
> 
> enrico
> 
> 



RE: Kryo On Spark 1.6.0

2017-01-10 Thread Enrico DUrso
Hi,

I agree with you, Richard.
The point is that it looks like some classes which are used internally by Spark 
are not registered (for instance, the one I mentioned in the previous email is 
not something I am using directly).
For those classes the serialization performance will be poor, given how Spark 
works.
How can I register all those classes?

cheers,

From: Richard Startin [mailto:richardstar...@outlook.com]
Sent: 10 January 2017 11:18
To: Enrico DUrso; user@spark.apache.org
Subject: Re: Kryo On Spark 1.6.0


Hi Enrico,



Only set spark.kryo.registrationRequired if you want to forbid any classes you 
have not explicitly registered - see 
http://spark.apache.org/docs/latest/configuration.html.

To enable kryo, you just need 
spark.serializer=org.apache.spark.serializer.KryoSerializer. There is some info 
here - http://spark.apache.org/docs/latest/tuning.html

Cheers,
Richard



https://richardstartin.com/


From: Enrico DUrso <enrico.du...@everis.com<mailto:enrico.du...@everis.com>>
Sent: 10 January 2017 11:10
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Kryo On Spark 1.6.0


Hi,

I am trying to use Kryo on Spark 1.6.0.
I am able to register my own classes and it works, but when I set 
"spark.kryo.registrationRequired" to true, I get an error about a Scala class:
"Class is not registered: scala.collection.mutable.WrappedArray$ofRef".

Has any of you already solved this issue in Java? I found the code to solve it 
in Scala, but I am unable to register this class in Java.

Cheers,

enrico








Re: Kryo On Spark 1.6.0

2017-01-10 Thread Richard Startin
Hi Enrico,


Only set spark.kryo.registrationRequired if you want to forbid any classes you 
have not explicitly registered - see 
http://spark.apache.org/docs/latest/configuration.html.


To enable kryo, you just need 
spark.serializer=org.apache.spark.serializer.KryoSerializer. There is some info 
here - http://spark.apache.org/docs/latest/tuning.html

Cheers,
Richard



https://richardstartin.com/
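
A minimal sketch of the distinction drawn above: enabling Kryo only needs the serializer setting, while registrationRequired is an optional, stricter mode that fails on any unregistered class.

  import org.apache.spark.SparkConf

  // Kryo enabled; registration is optional unless
  // spark.kryo.registrationRequired is explicitly set to true.
  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")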



From: Enrico DUrso 
Sent: 10 January 2017 11:10
To: user@spark.apache.org
Subject: Kryo On Spark 1.6.0


Hi,

I am trying to use Kryo on Spark 1.6.0.
I am able to register my own classes and it works, but when I set 
"spark.kryo.registrationRequired" to true, I get an error about a Scala class:
"Class is not registered: scala.collection.mutable.WrappedArray$ofRef".

Has any of you already solved this issue in Java? I found the code to solve it 
in Scala, but I am unable to register this class in Java.

Cheers,

enrico





Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread swetha kasireddy
sampleMap is populated from inside a method that is getting called from
updateStateByKey

On Thu, Jun 23, 2016 at 1:13 PM, Ted Yu  wrote:

> Can you illustrate how sampleMap is populated ?
>
> Thanks
>
> On Thu, Jun 23, 2016 at 12:34 PM, SRK  wrote:
>
>> Hi,
>>
>> I keep getting the following error in my Spark Streaming every now and
>> then
>> after the  job runs for say around 10 hours. I have those 2 classes
>> registered in kryo as shown below.  sampleMap is a field in SampleSession
>> as shown below. Any suggestion as to how to avoid this would be of great
>> help!!
>>
>> public class SampleSession implements Serializable, Cloneable{
>>private Map sampleMap;
>> }
>>
>>  sparkConf.registerKryoClasses(Array( classOf[SampleSession],
>> classOf[Sample]))
>>
>>
>>
>> com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException:
>> com.test.Sample cannot be cast to java.lang.String
>> Serialization trace:
>> sampleMap (com.test.SampleSession)
>> at
>>
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
>> at
>>
>> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
>> at
>> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>> at
>> com.twitter.chill.Tuple6Serializer.write(TupleSerializers.scala:96)
>> at
>> com.twitter.chill.Tuple6Serializer.write(TupleSerializers.scala:93)
>> at
>> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>> at
>> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37)
>> at
>> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
>> at
>> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
>> at
>>
>> org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:158)
>> at
>>
>> org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153)
>> at
>>
>> org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1190)
>> at
>>
>> org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1199)
>> at
>> org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:132)
>> at
>> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:793)
>> at
>> org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:669)
>> at
>> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
>> at
>> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>> at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Caused by: java.lang.ClassCastException: com.test.Sample cannot be cast to
>> java.lang.String
>> at
>>
>> com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
>> at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
>> at
>>
>> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:82)
>> at
>>
>> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17)
>> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
>> at
>>
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
>> ... 37 more
>>
>>
>>
>> --

Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread Ted Yu
Can you illustrate how sampleMap is populated ?

Thanks

On Thu, Jun 23, 2016 at 12:34 PM, SRK  wrote:

> Hi,
>
> I keep getting the following error in my Spark Streaming every now and then
> after the  job runs for say around 10 hours. I have those 2 classes
> registered in kryo as shown below.  sampleMap is a field in SampleSession
> as shown below. Any suggestion as to how to avoid this would be of great
> help!!
>
> public class SampleSession implements Serializable, Cloneable{
>private Map sampleMap;
> }
>
>  sparkConf.registerKryoClasses(Array( classOf[SampleSession],
> classOf[Sample]))
>
>
>
> com.esotericsoftware.kryo.KryoException: java.lang.ClassCastException:
> com.test.Sample cannot be cast to java.lang.String
> Serialization trace:
> sampleMap (com.test.SampleSession)
> at
>
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
> at
>
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> at
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> at
> com.twitter.chill.Tuple6Serializer.write(TupleSerializers.scala:96)
> at
> com.twitter.chill.Tuple6Serializer.write(TupleSerializers.scala:93)
> at
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> at
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:37)
> at
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
> at
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> at
>
> org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:158)
> at
>
> org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:153)
> at
>
> org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1190)
> at
>
> org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1199)
> at
> org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:132)
> at
> org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:793)
> at
> org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:669)
> at
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:175)
> at
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.ClassCastException: com.test.Sample cannot be cast to
> java.lang.String
> at
>
> com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
> at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
> at
>
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:82)
> at
>
> com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:17)
> at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
> at
>
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
> ... 37 more
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-ClassCastException-during-Serialization-deserialization-in-Spark-Streaming-tp27219.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> 

Re: kryo

2016-05-12 Thread Ted Yu
This should be related:

https://github.com/JodaOrg/joda-time/issues/307

Do you have more of the stack trace ?

Cheers

On Thu, May 12, 2016 at 12:39 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:

> Thanks,
>
> I used that.
>
> Now I seem to have the following problem:
>
> java.lang.NullPointerException
>
> at
> org.joda.time.tz.CachedDateTimeZone.getInfo(CachedDateTimeZone.java:143)
>
> at
> org.joda.time.tz.CachedDateTimeZone.getOffset(CachedDateTimeZone.java:103)
>
> at
> org.joda.time.DateTimeZone.convertUTCToLocal(DateTimeZone.java:925)
>
>
>
>
>
> Any ideas?
>
>
>
> Thanks
>
>
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* May-11-16 5:32 PM
> *To:* Younes Naguib
> *Cc:* user@spark.apache.org
> *Subject:* Re: kryo
>
>
>
> Have you seen this thread ?
>
>
>
>
> http://search-hadoop.com/m/q3RTtpO0qI3cp06/JodaDateTimeSerializer+spark=Re+NPE+when+using+Joda+DateTime
>
>
>
> On Wed, May 11, 2016 at 2:18 PM, Younes Naguib <
> younes.nag...@tritondigital.com> wrote:
>
> Hi all,
>
> I'm trying to use spark.serializer.
> I set it in spark-defaults.conf, but I started getting issues with
> datetimes.
>
> As I understand, I need to disable it.
> Is there any way to keep using Kryo?
>
> It seems I can use JodaDateTimeSerializer for datetimes, I am just not sure
> how to set it and register it in spark-defaults.conf.
>
> Thanks,
>
> *Younes Naguib* <younes.nag...@streamtheworld.com>
>
>
>

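A minimal sketch of registering a Joda serializer through a custom registrator, assuming the de.javakaffee kryo-serializers artifact is on the classpath (the class and property names below are the usual ones, but treat the details as assumptions rather than a verified recipe):

  import com.esotericsoftware.kryo.Kryo
  import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer
  import org.apache.spark.serializer.KryoRegistrator
  import org.joda.time.DateTime

  class JodaRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      // serialize DateTime (including its time zone) with a dedicated serializer
      // instead of Kryo's field-by-field default
      kryo.register(classOf[DateTime], new JodaDateTimeSerializer())
    }
  }

  // then, in spark-defaults.conf (illustrative):
  //   spark.serializer        org.apache.spark.serializer.KryoSerializer
  //   spark.kryo.registrator  your.package.JodaRegistrator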

RE: kryo

2016-05-12 Thread Younes Naguib
Thanks,
I used that.
Now I seem to have the following problem:
java.lang.NullPointerException
at 
org.joda.time.tz.CachedDateTimeZone.getInfo(CachedDateTimeZone.java:143)
at 
org.joda.time.tz.CachedDateTimeZone.getOffset(CachedDateTimeZone.java:103)
at org.joda.time.DateTimeZone.convertUTCToLocal(DateTimeZone.java:925)


Any ideas?

Thanks


From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: May-11-16 5:32 PM
To: Younes Naguib
Cc: user@spark.apache.org
Subject: Re: kryo

Have you seen this thread ?

http://search-hadoop.com/m/q3RTtpO0qI3cp06/JodaDateTimeSerializer+spark=Re+NPE+when+using+Joda+DateTime

On Wed, May 11, 2016 at 2:18 PM, Younes Naguib 
<younes.nag...@tritondigital.com<mailto:younes.nag...@tritondigital.com>> wrote:
Hi all,

I'm trying to use spark.serializer.
I set it in spark-defaults.conf, but I started getting issues with datetimes.

As I understand, I need to disable it.
Is there any way to keep using Kryo?

It seems I can use JodaDateTimeSerializer for datetimes, I am just not sure how to 
set it and register it in spark-defaults.conf.

Thanks,
Younes Naguib <mailto:younes.nag...@streamtheworld.com>



Re: kryo

2016-05-11 Thread Ted Yu
Have you seen this thread ?

http://search-hadoop.com/m/q3RTtpO0qI3cp06/JodaDateTimeSerializer+spark=Re+NPE+when+using+Joda+DateTime

On Wed, May 11, 2016 at 2:18 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:

> Hi all,
>
> I'm trying to use spark.serializer.
> I set it in spark-defaults.conf, but I started getting issues with
> datetimes.
>
> As I understand, I need to disable it.
> Is there any way to keep using Kryo?
>
> It seems I can use JodaDateTimeSerializer for datetimes, I am just not sure
> how to set it and register it in spark-defaults.conf.
>
> Thanks,
>
> *Younes Naguib* 
>


Re: Kryo serialization mismatch in spark sql windowing function

2016-04-06 Thread Soam Acharya
Hi Josh,

Appreciate the response! Also, Steve - we meet again :) At any rate, here's
the output (a lot of it anyway) of running spark-sql with the verbose
option so that you can get a sense of the settings and the classpath. Does
anything stand out?

Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.port.maxRetries=999
Adding default property: spark.broadcast.port=45200
Adding default property:
spark.executor.extraJavaOptions=-XX:+PrintReferenceGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-Djava.library.path=/opt/hadoop/lib/native/
Adding default property:
spark.history.fs.logDirectory=hdfs:///logs/spark-history
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.ui.port=45100
Adding default property: spark.driver.port=45055
Adding default property: spark.executor.port=45250
Adding default property: spark.logConf=true
Adding default property: spark.replClassServer.port=45070
Adding default property: spark.blockManager.port=45300
Adding default property: spark.fileserver.port=45090
Adding default property: spark.history.retainedApplications=999
Adding default property: spark.eventLog.dir=hdfs:///logs/spark-history
Adding default property: spark.history.ui.port=18080
Adding default property: spark.shuffle.consolidateFiles=true
Parsed arguments:
  master  yarn
  deployMode  client
  executorMemory  1G
  executorCores   2
  totalExecutorCores  null
  propertiesFile  /opt/spark/conf/spark-defaults.conf
  driverMemory1G
  driverCores null
  driverExtraClassPath
 

Re: Kryo serialization mismatch in spark sql windowing function

2016-04-06 Thread Josh Rosen
Spark is compiled against a custom fork of Hive 1.2.1 which added shading
of Protobuf and removed shading of Kryo. I think what's happening
here is that stock Hive 1.2.1 is taking precedence, so the Kryo instance
that it's returning is an instance of the shaded/relocated Hive version rather
than the unshaded, stock Kryo that Spark is expecting here.

I just so happen to have a patch which reintroduces the shading of Kryo
(motivated by other factors): https://github.com/apache/spark/pull/12215;
there's a chance that a backport of this patch might fix this problem.

However, I'm a bit curious about how your classpath is set up and why stock
1.2.1's shaded Kryo is being used here.

/cc +Marcelo Vanzin  and +Steve Loughran
, who may know more.

On Wed, Apr 6, 2016 at 6:08 PM Soam Acharya  wrote:

> Hi folks,
>
> I have a build of Spark 1.6.1 on which spark sql seems to be functional
> outside of windowing functions. For example, I can create a simple external
> table via Hive:
>
> CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd
> string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/test/ps';
>
> Ensure that the table is pointing to some valid data, set up spark sql to
> point to the Hive metastore (we're running Hive 1.2.1) and run a basic test:
>
> spark-sql> select * from PSTable;
> 7239pts/0   00:24:31java
> 9993pts/9   00:00:00ps
> 9994pts/9   00:00:00tail
> 9995pts/9   00:00:00sed
> 9996pts/9   00:00:00sed
>
> But when I try to run a windowing function which I know runs on Hive, I get:
>
> spark-sql> select a.pid ,a.time, a.cmd, min(a.time) over (partition by
> a.cmd order by a.time ) from PSTable a;
> org.apache.spark.SparkException: Task not serializable
> at
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
> at
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
> at
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
> :
> :
> Caused by: java.lang.ClassCastException:
> org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to
> com.esotericsoftware.kryo.Kryo
> at
> org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
> at
> org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
> at
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
> at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
> at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>
> Any thoughts or ideas would be appreciated!
>
> Regards,
>
> Soam
>


Re: Kryo serializer Exception during serialization: java.io.IOException: java.lang.IllegalArgumentException:

2016-01-08 Thread Shixiong(Ryan) Zhu
Could you disable `spark.kryo.registrationRequired`? Some classes may not
be registered but they work well with Kryo's default serializer.

On Fri, Jan 8, 2016 at 8:58 AM, Ted Yu  wrote:

> bq. try adding scala.collection.mutable.WrappedArray
>
> But the hint said registering 
> scala.collection.mutable.WrappedArray$ofRef.class
> , right ?
>
> On Fri, Jan 8, 2016 at 8:52 AM, jiml  wrote:
>
>> (point of post is to see if anyone has ideas about errors at end of post)
>>
>> In addition, the real way to test if it's working is to force
>> serialization:
>>
>> In Java:
>>
>> Create array of all your classes:
>> // for kyro serializer it wants to register all classes that need to be
>> serialized
>> Class[] kryoClassArray = new Class[]{DropResult.class,
>> DropEvaluation.class,
>> PrintHetSharing.class};
>>
>> in the builder for your SparkConf (or in conf/spark-defaults.sh)
>> .set("spark.serializer",
>> "org.apache.spark.serializer.KryoSerializer")
>> //require registration of all classes with Kyro
>> .set("spark.kryo.registrationRequired", "true")
>> // don't forget to register ALL classes or will get error
>> .registerKryoClasses(kryoClassArray);
>>
>> Then you will start to get neat errors like the one I am working on:
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due
>> to stage failure: Failed to serialize task 0, not attempting to retry it.
>> Exception during serialization: java.io.IOException:
>> java.lang.IllegalArgumentException: Class is not registered:
>> scala.collection.mutable.WrappedArray$ofRef
>> Note: To register this class use:
>> kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);
>>
>> I did try adding scala.collection.mutable.WrappedArray to the Class array
>> up
>> top but no luck. Thanks
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/kryos-serializer-tp16454p25921.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>


Re: Kryo serializer Exception during serialization: java.io.IOException: java.lang.IllegalArgumentException:

2016-01-08 Thread jiml
(point of post is to see if anyone has ideas about errors at end of post)

In addition, the real way to test if it's working is to force serialization:

In Java:

Create array of all your classes:
// for kyro serializer it wants to register all classes that need to be
serialized
Class[] kryoClassArray = new Class[]{DropResult.class, DropEvaluation.class,
PrintHetSharing.class};

in the builder for your SparkConf (or in conf/spark-defaults.sh)
.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer")
//require registration of all classes with Kyro
.set("spark.kryo.registrationRequired", "true")
// don't forget to register ALL classes or will get error
.registerKryoClasses(kryoClassArray);

Then you will start to get neat errors like the one I am working on:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Failed to serialize task 0, not attempting to retry it.
Exception during serialization: java.io.IOException:
java.lang.IllegalArgumentException: Class is not registered:
scala.collection.mutable.WrappedArray$ofRef
Note: To register this class use:
kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);

I did try adding scala.collection.mutable.WrappedArray to the Class array up
top but no luck. Thanks





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/kryos-serializer-tp16454p25921.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

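A small sketch of the "force serialization" idea above in Scala, assuming an existing RDD named myRdd: persisting in serialized form and running an action pushes the records through the configured serializer, so unregistered classes fail fast.

  import org.apache.spark.storage.StorageLevel

  myRdd.persist(StorageLevel.MEMORY_ONLY_SER)
  myRdd.count()  // triggers serialization; registration errors surface here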


Re: Kryo serializer Exception during serialization: java.io.IOException: java.lang.IllegalArgumentException:

2016-01-08 Thread Ted Yu
bq. try adding scala.collection.mutable.WrappedArray

But the hint said registering scala.collection.mutable.WrappedArray$ofRef.class
, right ?

On Fri, Jan 8, 2016 at 8:52 AM, jiml  wrote:

> (point of post is to see if anyone has ideas about errors at end of post)
>
> In addition, the real way to test if it's working is to force
> serialization:
>
> In Java:
>
> Create array of all your classes:
> // for kyro serializer it wants to register all classes that need to be
> serialized
> Class[] kryoClassArray = new Class[]{DropResult.class,
> DropEvaluation.class,
> PrintHetSharing.class};
>
> in the builder for your SparkConf (or in conf/spark-defaults.sh)
> .set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")
> //require registration of all classes with Kyro
> .set("spark.kryo.registrationRequired", "true")
> // don't forget to register ALL classes or will get error
> .registerKryoClasses(kryoClassArray);
>
> Then you will start to get neat errors like the one I am working on:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Failed to serialize task 0, not attempting to retry it.
> Exception during serialization: java.io.IOException:
> java.lang.IllegalArgumentException: Class is not registered:
> scala.collection.mutable.WrappedArray$ofRef
> Note: To register this class use:
> kryo.register(scala.collection.mutable.WrappedArray$ofRef.class);
>
> I did try adding scala.collection.mutable.WrappedArray to the Class array
> up
> top but no luck. Thanks
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/kryos-serializer-tp16454p25921.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Kryo serialization fails when using SparkSQL and HiveContext

2015-12-14 Thread Michael Armbrust
You'll need to either turn off required registration
(spark.kryo.registrationRequired) or create a custom registrator
(spark.kryo.registrator):

http://spark.apache.org/docs/latest/configuration.html#compression-and-serialization
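
A minimal sketch of the registrator route, using the class name from the error message below; the registrator class name here is a placeholder and has to be passed to Spark fully qualified:

  import com.esotericsoftware.kryo.Kryo
  import org.apache.spark.serializer.KryoRegistrator

  class SqlRowRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      // register the Catalyst row class named in the exception
      kryo.register(Class.forName("org.apache.spark.sql.catalyst.expressions.GenericRow"))
    }
  }

  // sparkConf.set("spark.kryo.registrator", "your.package.SqlRowRegistrator")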

On Mon, Dec 14, 2015 at 2:17 AM, Linh M. Tran 
wrote:

> Hi everyone,
> I'm using HiveContext and SparkSQL to query a Hive table and doing a join
> operation on it.
> After changing the default serializer to Kryo with
> spark.kryo.registrationRequired = true, the Spark application failed with
> the following error:
>
> java.lang.IllegalArgumentException: Class is not registered:
> org.apache.spark.sql.catalyst.expressions.GenericRow
> Note: To register this class use:
> kryo.register(org.apache.spark.sql.catalyst.expressions.GenericRow.class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> at
> com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> at
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> at
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:36)
> at
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:33)
> at
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> at
> org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:124)
> at
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:204)
> at
> org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:375)
> at
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
> at
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I'm using Spark 1.3.1 (HDP 2.3.0) and submitting Spark application to Yarn
> in cluster mode.
> Any help is appreciated.
> --
> Linh M. Tran
>


Re: Kryo Serialization in Spark

2015-12-10 Thread manasdebashiskar
Are you sure you are using Kryo serialization? 
You are getting a Java serialization error.
Are you setting up your SparkContext with Kryo serialization enabled?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-Serialization-in-Spark-tp25628p25678.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kryo Serializer on Worker doesn't work by default.

2015-07-08 Thread Eugene Morozov
What I don't seem to get is how my code ends up on the Worker nodes.

My understanding was that the jar file which I use to start the job should 
automatically be copied to the Worker nodes and added to the classpath. That seems 
not to be the case. But if my jar is not copied to the Worker nodes, then how is my 
code able to run on the Workers? I know that the Driver serializes my functions and 
sends them to the Workers, but most of my functions use some other classes 
(from the jar). It seems that along with the function code, the Driver has to be able 
to copy other classes, too. Is that so?

That might actually explain why the KryoRegistrator is not being found on the Worker - 
there are no functions which use it directly, so it is never copied to the Workers.

Could you please explain how code ends up on a Worker, or give me a hint as to 
where I can find it in the sources?

On 08 Jul 2015, at 17:40, Eugene Morozov fathers...@list.ru wrote:

 Hello.
 
 I have an issue with CustomKryoRegistrator, which causes ClassNotFound on 
 Worker. 
 The issue is resolved if call SparkConf.setJar with path to the same jar I 
 run.
 
 It is a workaround, but it requires specifying the same jar file twice: the 
 first time to actually run the job, and the second time in the properties 
 file, which looks weird, and it is unclear to me why I should do that. 
 
 What is the reason for it? I thought the jar file has to be copied to all 
 Worker nodes (or else it’s not possible to run the job on Workers). Can 
 anyone shed some light on this?
 
 Thanks
 --
 Eugene Morozov
 fathers...@list.ru
 
 
 
 

Eugene Morozov
fathers...@list.ru

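A sketch of the workaround being described, with placeholder paths and class names: point the conf at the application jar explicitly so the registrator class reaches every executor's classpath (submitting the same jar as the primary artifact of spark-submit, or via --jars, achieves the same thing):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "com.example.CustomKryoRegistrator") // placeholder
    .setJars(Seq("/path/to/my-app.jar"))                                // placeholder path

  val sc = new SparkContext(conf)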





Re: Kryo fails to serialise output

2015-07-03 Thread Will Briggs
Kryo serialization is used internally by Spark for spilling or shuffling 
intermediate results, not for writing out an RDD as an action. Look at Sandy 
Ryza's examples for some hints on how to do this: 
https://github.com/sryza/simplesparkavroapp

Regards,
Will

On July 3, 2015, at 2:45 AM, Dominik Hübner cont...@dhuebner.com wrote:

I have a rather simple avro schema to serialize Tweets (message, username, 
timestamp).
Kryo and twitter chill are used to do so.

For my dev environment the Spark context is configured as below

val conf: SparkConf = new SparkConf()
conf.setAppName("kryo_test")
conf.setMaster("local[4]")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator")

Serialization is setup with

override def registerClasses(kryo: Kryo): Unit = {
kryo.register(classOf[Tweet], 
AvroSerializer.SpecificRecordBinarySerializer[Tweet])
}

(This method gets called)


Using this configuration to persist some object fails with 
java.io.NotSerializableException: co.feeb.avro.Tweet 
(which seems to be ok as this class is not Serializable)

I used the following code:

val ctx: SparkContext = new SparkContext(conf)
val tweets: RDD[Tweet] = ctx.parallelize(List(
  new Tweet("a", "b", 1L),
  new Tweet("c", "d", 2L),
  new Tweet("e", "f", 3L)
  )
)

tweets.saveAsObjectFile("file:///tmp/spark")

Using saveAsTextFile works, but persisted files are not binary but JSON

cat /tmp/spark/part-0
{"username": "a", "text": "b", "timestamp": 1}
{"username": "c", "text": "d", "timestamp": 2}
{"username": "e", "text": "f", "timestamp": 3}

Is this intended behaviour, a configuration issue, avro serialisation not 
working in local mode or something else?





-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kryo serialization of classes in additional jars

2015-06-26 Thread patcharee

Hi,

I am having this problem on Spark 1.4. Do you have any idea how to 
solve it? I tried to use spark.executor.extraClassPath, but it did not help.


BR,
Patcharee

On 04. mai 2015 23:47, Imran Rashid wrote:
Oh, this seems like a real pain.  You should file a jira, I didn't see 
an open issue -- if nothing else just to document the issue.


As you've noted, the problem is that the serializer is created 
immediately in the executors, right when the SparkEnv is created, but 
the other jars aren't downloaded until later.  I think you could work around 
this with some combination of pushing the jars to the cluster manually, and 
then using spark.executor.extraClassPath.


On Wed, Apr 29, 2015 at 6:42 PM, Akshat Aranya aara...@gmail.com 
mailto:aara...@gmail.com wrote:


Hi,

Is it possible to register kryo serialization for classes
contained in jars that are added with spark.jars?  In my
experiment it doesn't seem to work, likely because the class
registration happens before the jar is shipped to the executor and
added to the classloader.  Here's the general idea of what I want
to do:

    val sparkConf = new SparkConf(true)
      .set("spark.jars", "foo.jar")
      .setAppName("foo")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    // register classes contained in foo.jar
    sparkConf.registerKryoClasses(Array(
      classOf[com.foo.Foo],
      classOf[com.foo.Bar]))

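A rough sketch of the workaround mentioned above; the path is a placeholder, and the jar must already be present at that location on every node (pushed there out of band) so the classes are visible when the executor's KryoSerializer is constructed:

  import org.apache.spark.SparkConf

  val sparkConf = new SparkConf(true)
    .setAppName("foo")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // jar pre-distributed to this path on all workers
    .set("spark.executor.extraClassPath", "/opt/jars/foo.jar")
    .registerKryoClasses(Array(
      classOf[com.foo.Foo],
      classOf[com.foo.Bar]))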





Re: Kryo serialization of classes in additional jars

2015-05-13 Thread Akshat Aranya
I cherry-picked this commit into my local 1.2 branch.  It fixed the problem
with setting spark.serializer, but I get a similar problem with
spark.closure.serializer

org.apache.spark.SparkException: Failed to register classes with Kryo
  at
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:100)
  at
org.apache.spark.serializer.KryoSerializerInstance.init(KryoSerializer.scala:152)
  at
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:114)
  at
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:73)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
  at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
  at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
  at
org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
  at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: com.foo.Foo
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:270)
  at
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$2.apply(KryoSerializer.scala:93)
  at
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$2.apply(KryoSerializer.scala:93)
  at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
  ... 21 more


On Mon, May 4, 2015 at 5:43 PM, Akshat Aranya aara...@gmail.com wrote:

 Actually, after some digging, I did find a JIRA for it: SPARK-5470.
 The fix for this has gone into master, but it isn't in 1.2.

 On Mon, May 4, 2015 at 2:47 PM, Imran Rashid iras...@cloudera.com wrote:
  Oh, this seems like a real pain.  You should file a jira, I didn't see an
  open issue -- if nothing else just to document the issue.
 
  As you've noted, the problem is that the serializer is created immediately
  in the executors, right when the SparkEnv is created, but the other jars
  aren't downloaded until later.  I think you could work around this with some
  combination of pushing the jars to the cluster manually, and then using
  spark.executor.extraClassPath
 
  On Wed, Apr 29, 2015 at 6:42 PM, Akshat Aranya aara...@gmail.com
 wrote:
 
  Hi,
 
  Is it possible to register kryo serialization for classes contained in
  jars that are added with spark.jars?  In my experiment it doesn't
 seem to
  work, likely because the class registration happens before the jar is
  shipped to the executor and added to the classloader.  Here's the
 general
  idea of what I want to do:
 
  val sparkConf = new SparkConf(true)
    .set("spark.jars", "foo.jar")
    .setAppName("foo")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
 
   // register classes contained in foo.jar
   sparkConf.registerKryoClasses(Array(
     classOf[com.foo.Foo],
     classOf[com.foo.Bar]))
 
 



Re: Kryo serialization of classes in additional jars

2015-05-04 Thread Akshat Aranya
Actually, after some digging, I did find a JIRA for it: SPARK-5470.
The fix for this has gone into master, but it isn't in 1.2.

On Mon, May 4, 2015 at 2:47 PM, Imran Rashid iras...@cloudera.com wrote:
 Oh, this seems like a real pain.  You should file a jira, I didn't see an
 open issue -- if nothing else just to document the issue.

 As you've noted, the problem is that the serializer is created immediately
 in the executors, right when the SparkEnv is created, but the other jars
 aren't downloaded until later.  I think you could work around this with some combination
 of pushing the jars to the cluster manually, and then using
 spark.executor.extraClassPath

 On Wed, Apr 29, 2015 at 6:42 PM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 Is it possible to register kryo serialization for classes contained in
 jars that are added with spark.jars?  In my experiment it doesn't seem to
 work, likely because the class registration happens before the jar is
 shipped to the executor and added to the classloader.  Here's the general
 idea of what I want to do:

 val sparkConf = new SparkConf(true)
   .set("spark.jars", "foo.jar")
   .setAppName("foo")
   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

 // register classes contained in foo.jar
 sparkConf.registerKryoClasses(Array(
   classOf[com.foo.Foo],
   classOf[com.foo.Bar]))



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: Kryo exception : Encountered unregistered class ID: 13994

2015-04-13 Thread mehdisinger
Hello,

Thank you for your answer.

I'm already registering my classes as you're suggesting...

Regards

From: tsingfu [via Apache Spark User List] 
[mailto:ml-node+s1001560n22468...@n3.nabble.com]
Sent: Monday, 13 April 2015 03:48
To: Mehdi Singer
Subject: Re: Kryo exception : Encountered unregistered class ID: 13994

Hi,
the error message mentions:
com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
13994

So I think this is an issue with Kryo. You should use 
`kryo.register(classOf[your_class_name])` in your app code.






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-exception-Encountered-unregistered-class-ID-13994-tp22437p22471.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Kryo exception : Encountered unregistered class ID: 13994

2015-04-13 Thread ๏̯͡๏
You need to do a few more things or you will eventually run into these issues:

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
  .set("spark.kryoserializer.buffer.max.mb", arguments.get("maxbuffersize").get)
  .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

-Deepak

On Mon, Apr 13, 2015 at 1:19 PM, mehdisinger mehdi.sin...@lampiris.be
wrote:

  Hello,



 Thank you for your answer.



 I’m already registering my classes as you’re suggesting…



 Regards



 *From:* tsingfu [via Apache Spark User List] [mailto:ml-node+[hidden email]
 http:///user/SendEmail.jtp?type=nodenode=22471i=0]
 *Sent:* Monday, 13 April 2015 03:48
 *To:* Mehdi Singer
 *Subject:* Re: Kryo exception : Encountered unregistered class ID: 13994



 Hi,
 the error message mentions:
 com.esotericsoftware.kryo.KryoException: Encountered unregistered class
 ID: 13994

 So I think this is an issue with Kryo. You should use
 `kryo.register(classOf[your_class_name])` in your app code.


 --
 View this message in context: RE: Kryo exception : Encountered
 unregistered class ID: 13994
 http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-exception-Encountered-unregistered-class-ID-13994-tp22437p22471.html

 Sent from the Apache Spark User List mailing list archive
 http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com.




-- 
Deepak


Re: Kryo exception : Encountered unregistered class ID: 13994

2015-04-09 Thread Ted Yu
Is there custom class involved in your application ?

I assume you have called sparkConf.registerKryoClasses() for such class(es).

Cheers

On Thu, Apr 9, 2015 at 7:15 AM, mehdisinger mehdi.sin...@lampiris.be
wrote:

 Hi,

 I'm facing an issue when I try to run my Spark application. I keep getting
 the following exception:

 15/04/09 15:14:07 ERROR Executor: Exception in task 5.0 in stage 1.0 (TID
 5)
 com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID:
 13994
 Serialization trace:
 ord (org.apache.spark.util.BoundedPriorityQueue)
 at

 com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
 at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
 at

 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
 at

 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at

 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
 at

 org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:236)
 at

 org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:169)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 at

 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 at

 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 I'm not sure where this exception occurs exactly...
 Does anyone know about this issue?

 I'm running Spark version 1.1.0.
 My Master and workers are running on different machines (cluster mode), all
 with the exact same architecture/configuration

 Can anyone help?

 Regards



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-exception-Encountered-unregistered-class-ID-13994-tp22437.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Kryo NPE with Array

2014-12-02 Thread Simone Franzini
I finally solved this issue. The problem was that:
1. I defined a case class with a Buffer[MyType] field.
2. I instantiated the class with the field set to the value given by an
implicit conversion from a Java list, which is supposedly a Buffer.
3. However, the underlying type of that field was instead
scala.collection.convert.Wrappers.JListWrapper, as noted in the exception
above. This type was not registered with Kryo and so that's why I got the
exception.

Registering the type did not solve the problem. However, an additional call
to .toBuffer did solve the problem, since the Buffer class is registered
through the Chill AllScalaRegistrar which is called by the Spark Kryo
serializer.

I thought I'd document this in case somebody else is running into a similar
issue.
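
For anyone hitting the same thing, here is a minimal sketch of the fix described above.
MyType and MyCaseClass are placeholder names; the point is the explicit .toBuffer on the
converted Java list:

import scala.collection.JavaConverters._
import scala.collection.mutable.Buffer

case class MyType(value: Int)                      // stand-in element type
case class MyCaseClass(items: Buffer[MyType])      // case class with a Buffer field

val javaList: java.util.List[MyType] = java.util.Arrays.asList(MyType(1), MyType(2))

// The implicit/asScala conversion yields a view whose runtime class is
// scala.collection.convert.Wrappers$JListWrapper, which is what Kryo complained about:
// val bad = MyCaseClass(javaList.asScala)

// Copying into a real Scala Buffer (registered via Chill's AllScalaRegistrar) avoids that:
val good = MyCaseClass(javaList.asScala.toBuffer)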

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Wed, Nov 26, 2014 at 7:40 PM, Simone Franzini captainfr...@gmail.com
wrote:

 I guess I already have the answer of what I have to do here, which is to
 configure the kryo object with the strategy as above.
 Now the question becomes: how can I pass this custom kryo configuration to
 the spark kryo serializer / kryo registrator?
 I've had a look at the code but I am still fairly new to Scala and I can't
 see how I would do this. In the worst case, could I override the newKryo
 method and put my configuration there? It appears to me that method is the
 one where the kryo instance is created.

 Simone Franzini, PhD

 http://www.linkedin.com/in/simonefranzini

 On Tue, Nov 25, 2014 at 2:38 PM, Simone Franzini captainfr...@gmail.com
 wrote:

 I am running into the following NullPointerException:

 com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
 Serialization trace:
 underlying (scala.collection.convert.Wrappers$JListWrapper)
 myArrayField (MyCaseClass)
 at
 com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)

 I have been running into similar issues when using avro classes, that I
 was able to resolve by registering them with a Kryo serializer that uses
 chill-avro. However, in this case the field is in a case class and it seems
 that registering the class does not help.

 I found this stack overflow that seems to be relevant:

 http://stackoverflow.com/questions/23962796/kryo-readobject-cause-nullpointerexception-with-arraylist
 I have this line of code translated to Scala, that supposedly solves the
 issue:

 val kryo = new Kryo()
 kryo.getInstantiatorStrategy().asInstanceOf[Kryo.DefaultInstantiatorStrategy].setFallbackInstantiatorStrategy(new
 StdInstantiatorStrategy())

 However, I am not sure where this line should be placed to take effect.

 I already have the following, should it go somewhere in here?
 class MyRegistrator extends KryoRegistrator {
 override def registerClasses(kryo: Kryo) {
 kryo.register(...)
 }
 }


 Simone Franzini, PhD

 http://www.linkedin.com/in/simonefranzini





RE: Kryo exception for CassandraSQLRow

2014-12-01 Thread Ashic Mahtab
Don't know if this'll solve it, but if you're on Spark 1.1, the Cassandra
Connector version 1.1.0 final fixed the guava back-compat issue. Maybe taking
the guava exclusions out might help?
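
For reference, a rough sketch of what that dependency line could look like (assuming
sbt and the 1.1.0 final connector; adjust to your build):

// With the 1.1.0 final connector the guava back-compat problem is fixed, so the
// exclude("com.google.guava", "guava") clauses may no longer be needed:
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"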

Date: Mon, 1 Dec 2014 10:48:25 +0100
Subject: Kryo exception for CassandraSQLRow
From: shahab.mok...@gmail.com
To: user@spark.apache.org

I am using Cassandra-Spark connector to pull data from Cassandra, process it 
and write it back to Cassandra.
 Now I am  getting the following exception, and apparently it is Kryo 
serialisation. Does anyone what is the reason and how this can be solved?
I also tried to register org.apache.spark.sql.cassandra.CassandraSQLRow in  
kryo.register , but even this did not solve the problem and exception remains.
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 7, ip-X-Y-Z):
com.esotericsoftware.kryo.KryoException: Unable to find class:
org.apache.spark.sql.cassandra.CassandraSQLRow
Serialization trace:
_2 (org.apache.spark.util.MutablePair)
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)

com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)

com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)

org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)   
 
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1171)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)   
 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)  
  scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1218)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)  
  
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)  
  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
   java.lang.Thread.run(Thread.java:745)


I am using Spark 1.1.0 with cassandra-spark connector 1.1.0, here is the
build:

"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
  exclude("com.google.guava", "guava"),

"com.google.guava" % "guava" % "16.0" % "provided",

"com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
  exclude("com.google.guava", "guava") withSources() withJavadoc(),

"org.apache.cassandra" % "cassandra-all" % "2.1.1"
  exclude("com.google.guava", "guava"),

"org.apache.cassandra" % "cassandra-thrift" % "2.1.1"
  exclude("com.google.guava", "guava"),

"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.2"
  exclude("com.google.guava", "guava"),

"org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.hadoop", "hadoop-core"),

"org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava"),

"org.apache.spark" %% "spark-catalyst" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),

"org.apache.spark" %% "spark-sql" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),

"org.apache.spark" %% "spark-hive" % "1.1.0" % "provided"
  exclude("com.google.guava", "guava") exclude("org.apache.spark", "spark-core"),

"org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",

best,
/Shahab

Re: Kryo NPE with Array

2014-11-26 Thread Simone Franzini
I guess I already have the answer of what I have to do here, which is to
configure the kryo object with the strategy as above.
Now the question becomes: how can I pass this custom kryo configuration to
the spark kryo serializer / kryo registrator?
I've had a look at the code but I am still fairly new to Scala and I can't
see how I would do this. In the worst case, could I override the newKryo
method and put my configuration there? It appears to me that method is the
one where the kryo instance is created.

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Tue, Nov 25, 2014 at 2:38 PM, Simone Franzini captainfr...@gmail.com
wrote:

 I am running into the following NullPointerException:

 com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
 Serialization trace:
 underlying (scala.collection.convert.Wrappers$JListWrapper)
 myArrayField (MyCaseClass)
 at
 com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)

 I have been running into similar issues when using avro classes, that I
 was able to resolve by registering them with a Kryo serializer that uses
 chill-avro. However, in this case the field is in a case class and it seems
 that registering the class does not help.

 I found this stack overflow that seems to be relevant:

 http://stackoverflow.com/questions/23962796/kryo-readobject-cause-nullpointerexception-with-arraylist
 I have this line of code translated to Scala, that supposedly solves the
 issue:

 val kryo = new Kryo()
 kryo.getInstantiatorStrategy().asInstanceOf[Kryo.DefaultInstantiatorStrategy].setFallbackInstantiatorStrategy(new
 StdInstantiatorStrategy())

 However, I am not sure where this line should be placed to take effect.

 I already have the following, should it go somewhere in here?
 class MyRegistrator extends KryoRegistrator {
 override def registerClasses(kryo: Kryo) {
 kryo.register(...)
 }
 }


 Simone Franzini, PhD

 http://www.linkedin.com/in/simonefranzini



Re: Kryo UnsupportedOperationException

2014-09-25 Thread Ian O'Connell
I would guess the field serializer is having issues being able to
reconstruct the class again; it's pretty much best effort.

Is this an intermediate type?

On Thu, Sep 25, 2014 at 2:12 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 We're running into an error (below) when trying to read spilled shuffle
 data back in.

 Has anybody encountered this before / is anybody familiar with what causes
 these Kryo UnsupportedOperationExceptions?

 any guidance appreciated,
 Sandy

 ---
 com.esotericsoftware.kryo.KryoException
 (com.esotericsoftware.kryo.KryoException:
 java.lang.UnsupportedOperationException Serialization trace: omitted
 variable name (omitted class name) omitted variable name (omitted
 class name))


 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)


 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)

 com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)


 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)


 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)

 com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)

 com.twitter.chill.TraversableSerializer.read(Traversable.scala:44)
 ...




Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-19 Thread mohan.gadm
Thanks for the info, Frank.
Twitter's chill-avro serializer looks great.
But how does Spark identify it as a serializer, as it's not extending
KryoSerializer?
(Sorry, Scala is an alien language for me.)



-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14649.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-19 Thread Frank Austin Nothaft
Hi Mohan,

It’s a bit convoluted to follow in their source, but they essentially typedef 
KSerializer as being a KryoSerializer, and then their serializers all extend 
KSerializer. Spark should identify them properly as Kryo Serializers, but I 
haven’t tried it myself.
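
If it helps, a rough, untested sketch of wiring one of those serializers into a Spark
KryoRegistrator (MyAvroRecord is a placeholder for an Avro-generated SpecificRecord class;
I am assuming chill-avro's AvroSerializer helpers here):

import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer
import org.apache.spark.serializer.KryoRegistrator

class MyAvroRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // SpecificRecordSerializer extends chill's KSerializer, which is itself a Kryo
    // Serializer, so kryo.register accepts it directly. MyAvroRecord is hypothetical.
    kryo.register(classOf[MyAvroRecord], AvroSerializer.SpecificRecordSerializer[MyAvroRecord])
  }
}

You would then point spark.kryo.registrator at that class as usual.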

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 19, 2014, at 12:03 AM, mohan.gadm mohan.g...@gmail.com wrote:

 Thanks for the info, Frank.
 Twitter's chill-avro serializer looks great.
 But how does Spark identify it as a serializer, as it's not extending
 KryoSerializer?
 (Sorry, Scala is an alien language for me.)
 
 
 
 -
 Thanks & Regards,
 Mohan
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14649.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 



Re: Kryo fails with avro having Arrays and unions, but succeeds with simple avro.

2014-09-18 Thread mohan.gadm
Hi Frank, thanks for the info, that's great. But I'm not saying the Avro serializer
is failing; Kryo is failing.
I'm using the Kryo serializer and registering the Avro-generated classes with Kryo:
sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator",
"com.globallogic.goliath.platform.PlatformKryoRegistrator");

But how was it able to perform the output operation when the message is simple,
yet not when the message is complex? (Please note there are no Avro schema changes;
just the data is changed.)
Providing you more info below.

avro schema:
=
record KeyValueObject {
  union{boolean, int, long, float, double, bytes, string} name;
  union {boolean, int, long, float, double, bytes, string,
    array<union{boolean, int, long, float, double, bytes, string,
    KeyValueObject}>, KeyValueObject} value;
}
record Datum {
  union {boolean, int, long, float, double, bytes, string,
    array<union{boolean, int, long, float, double, bytes, string,
    KeyValueObject}>, KeyValueObject} value;
}
record ResourceMessage {
  string version;
  string sequence;
  string resourceGUID;
  string GWID;
  string GWTimestamp;
  union {Datum, array<Datum>} data;
}

simple message is as below:
===
{"version": "01", "sequence": "1", "resourceGUID": "001", "GWID": "002",
"GWTimestamp": "1409823150737", "data": {"value": 30}}

complex message is as below:
===
{"version": "01", "sequence": "1", "resource": "sensor-001",
"controller": "002", "controllerTimestamp": "1411038710358", "data":
{"value": [{"name": "Temperature", "value": 30}, {"name": "Speed",
"value": 60}, {"name": "Location", "value": ["+401213.1", "-0750015.1"]},
{"name": "Timestamp", "value": "2014-09-09T08:15:25-05:00"}]}}


Both messages fit into the schema.

Actually the message is coming from Kafka as Avro binary.
In Spark the message is converted to Avro objects (ResourceMessage) using
decoders (this is also working).
I am able to perform some mappings and to convert the stream of ResourceMessage
into a stream of Flume Events.

Now the events need to be pushed to a Flume source. For this I need to collect
the RDD and then send it to the Flume client.

End to end this worked fine with the simple message; the problem is with the complex message.




-
Thanks & Regards,
Mohan
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-fails-with-avro-having-Arrays-and-unions-but-succeeds-with-simple-avro-tp14549p14565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kryo Issue on Spark 1.0.1, Mesos 0.18.2

2014-07-25 Thread Gary Malouf
Maybe this is me misunderstanding the Spark system property behavior, but
I'm not clear why the class being loaded ends up having '/' rather than '.'
in its fully qualified name.  When I tested this out locally, the '/' was
preventing the class from being loaded.
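
To illustrate (a trivial standalone check, not from our codebase):

// Class.forName wants the dot-separated binary name; the slash-separated form
// from the error above can never resolve, regardless of the classpath.
Class.forName("com.mediacrossing.verrazano.kryo.MxDataRegistrator")   // ok if the jar is on the classpath
Class.forName("com/mediacrossing/verrazano/kryo/MxDataRegistrator")   // always ClassNotFoundException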


On Fri, Jul 25, 2014 at 2:27 PM, Gary Malouf malouf.g...@gmail.com wrote:

 After upgrading to Spark 1.0.1 from 0.9.1 everything seemed to be going
 well.  Looking at the Mesos slave logs, I noticed:

 ERROR KryoSerializer: Failed to run spark.kryo.registrator
 java.lang.ClassNotFoundException:
 com/mediacrossing/verrazano/kryo/MxDataRegistrator

 My spark-env.sh has the following when I run the Spark Shell:

 export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so

 export MASTER=mesos://zk://n-01:2181,n-02:2181,n-03:2181/masters

 export ADD_JARS=/opt/spark/mx-lib/verrazano-assembly.jar


 # -XX:+UseCompressedOops must be disabled to use more than 32GB RAM

  SPARK_JAVA_OPTS="-Xss2m -XX:+UseCompressedOops
  -Dspark.local.dir=/opt/mesos-tmp -Dspark.executor.memory=4g
  -Dspark.serializer=org.apache.spark.serializer.KryoSerializer
  -Dspark.kryo.registrator=com.mediacrossing.verrazano.kryo.MxDataRegistrator
  -Dspark.kryoserializer.buffer.mb=16 -Dspark.akka.askTimeout=30"


 I was able to verify that our custom jar was being copied to each worker,
 but for some reason it is not finding my registrator class.  Is anyone else
 struggling with Kryo on 1.0.x branch?



RE: Kryo is slower, and the size saving is minimal

2014-07-09 Thread innowireless TaeYun Kim
Thank you for your response.

Maybe that applies to my case.
In my test case, the types of almost all of the data are either primitive
types, joda DateTime, or String.
But I'm somewhat disappointed with the speed.
At least it should not be slower than the Java default serializer...

-Original Message-
From: wxhsdp [mailto:wxh...@gmail.com] 
Sent: Wednesday, July 09, 2014 5:47 PM
To: u...@spark.incubator.apache.org
Subject: Re: Kryo is slower, and the size saving is minimal

I'm not familiar with Kryo and my opinion may not be right. In my case,
Kryo only saves about 5% of the original size when dealing with primitive
types such as arrays. I'm not sure whether that is the common case.



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-is-slower-and-the-s
ize-saving-is-minimal-tp9131p9160.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Kryo serialization does not compress

2014-03-07 Thread pradeeps8
Hi Patrick,

Thanks for your reply.

I am guessing even an array type will be registered automatically. Is this
correct?
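
For reference, explicit registration of the array types would look roughly like this
(a hedged sketch; MyRegistrator is a placeholder registrator class wired up via
spark.kryo.registrator):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // Register the array forms used by the cached RDD explicitly,
    // rather than relying on automatic registration.
    kryo.register(classOf[Array[Long]])
    kryo.register(classOf[Array[Float]])
    kryo.register(classOf[Array[String]])
  }
}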

Thanks,
Pradeep



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-serialization-does-not-compress-tp2042p2400.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Kryo serialization does not compress

2014-03-06 Thread pradeeps8
We are trying to use kryo serialization, but with kryo serialization ON the
memory consumption does not change. We have tried this on multiple sets of
data.
We have also checked the logs of Kryo serialization and have confirmed that
Kryo is being used.

Can somebody please help us with this?

The script used is given below. 
SCRIPT

import scala.collection.JavaConversions.asScalaBuffer
import scala.collection.JavaConversions.mapAsScalaMap
import scala.collection.JavaConverters.asScalaBufferConverter
import scala.collection.mutable.Buffer
import scala.Array
import scala.math.Ordering.Implicits._

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.RangePartitioner
import org.apache.spark.HashPartitioner

// For Kryo logging
import com.esotericsoftware.minlog.Log
import com.esotericsoftware.minlog.Log._
Log.set(LEVEL_TRACE);

val query = "select array(level_1, level_2, level_3, level_4, level_5, " +
  "level_6, level_7, level_8, level_9, level_10, level_11, level_12, level_13, " +
  "level_14, level_15, level_16, level_17, level_18, level_19, level_20, " +
  "level_21, level_22, level_23, level_24, level_25) as unitids, class, cuts, " +
  "type, data from table1 p join table2 b on (p.UnitId = b.unit_id) " +
  "where runid = 912 and b.snapshotid = 220 and p.UnitId = b.unit_id"

val rows: RDD[((Buffer[Any], String, Buffer[Any]), (String, scala.collection.mutable.Buffer[Any]))] =
  sc.sql2rdd(query).map(row =>
    ((row.getList("unitids").asInstanceOf[java.util.List[Any]].asScala,
      row.getString("class"),
      row.getList("cuts").asInstanceOf[java.util.List[Any]].asScala),
     (row.getString("type"),
      row.getList("data").asInstanceOf[java.util.List[Any]].asScala)))

var rows2Array: RDD[((Buffer[Any], String, Buffer[Any]), (String, Array[Float]))] =
  rows.map(row => (row._1,
    (row._2._1, (row._2._2.map(y => y match {
      case floatWritable: org.apache.hadoop.io.FloatWritable => floatWritable.get
      case lazyFloat: org.apache.hadoop.hive.serde2.`lazy`.LazyFloat => lazyFloat.getWritableObject().get
      case _ => println("unknown data type " + y + " : "); 0
    })).toArray)))

var allArrays: RDD[((Array[Long], String, Buffer[Any]), (String, Array[Float]))] =
  rows2Array.map(row =>
    ((row._1._1.map(x => x match {
      case longWritable: org.apache.hadoop.io.LongWritable => longWritable.get
      case lazyLong: org.apache.hadoop.hive.serde2.`lazy`.LazyLong => lazyLong.getWritableObject().get
      case _ => println("unknown data type " + x + " : "); 0
    }).toArray, row._1._2, row._1._3), row._2))

var dataRdd: RDD[((Array[Long], String, Array[String]), (String, Array[Float]))] =
  allArrays.map(row => ((row._1._1, row._1._2, row._1._3.map(x => x match {
    case str: String => str
    case _ => println("unknown data type " + x + " : "); new String()
  }).toArray), row._2))

dataRdd = dataRdd.partitionBy(new HashPartitioner(64)).persist(StorageLevel.MEMORY_ONLY_SER)

dataRdd.count()
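
For context, the Spark-side Kryo settings themselves are not in this script; they would
be applied on the conf before the context is created, roughly along these lines (the
registrator class name here is just a placeholder):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.MyRegistrator")  // placeholder registrator class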





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-serialization-does-not-compress-tp2042p2347.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.