[jira] [Commented] (SPARK-26980) Kryo deserialization not working with KryoSerializable class

2019-03-01 Thread Alexis Sarda-Espinosa (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782082#comment-16782082
 ] 

Alexis Sarda-Espinosa commented on SPARK-26980:
---

Kryo works fine if used directly ([example 
here|https://github.com/asardaes/hello-spark-kotlin/blob/master/src/test/kotlin/hello/spark/kotlin/MainTest.kt#L21]),
 it just breaks when the data goes through Spark.

> Kryo deserialization not working with KryoSerializable class
> 
>
> Key: SPARK-26980
> URL: https://issues.apache.org/jira/browse/SPARK-26980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Local Spark v2.4.0
> Kotlin v1.3.21
>Reporter: Alexis Sarda-Espinosa
>Priority: Minor
>  Labels: kryo, serialization
>
> I'm trying to create an {{Aggregator}} that uses a custom container that 
> should be serialized with {{Kryo:}} 
> {code:java}
> class StringSet(other: Collection) : HashSet(other), 
> KryoSerializable {
> companion object {
> @JvmStatic
> private val serialVersionUID = 1L
> }
> constructor() : this(Collections.emptyList())
> override fun write(kryo: Kryo, output: Output) {
> output.writeInt(this.size)
> for (string in this) {
> output.writeString(string)
> }
> }
> override fun read(kryo: Kryo, input: Input) {
> val size = input.readInt()
> repeat(size) { this.add(input.readString()) }
> }
> }
> {code}
> However, if I look at the corresponding value in the {{Row}} after 
> aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. 
> Interestingly, the first byte in that array seems to be some sort of noise, 
> and I can deserialize by doing something like this: 
> {code:java}
> val b = row.getAs(2)
> val input = Input(b.copyOfRange(1, b.size)) // extra byte?
> val set = Kryo().readObject(input, StringSet::class.java)
> {code}
> Used configuration: 
> {code:java}
> SparkConf()
> .setAppName("Hello Spark with Kotlin")
> .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> .set("spark.kryo.registrationRequired", "true")
> .registerKryoClasses(arrayOf(StringSet::class.java))
> {code}
> [Sample repo with all the 
> code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26980) Kryo deserialization not working with KryoSerializable class

2019-02-23 Thread Alexis Sarda-Espinosa (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Sarda-Espinosa updated SPARK-26980:
--
Description: 
I'm trying to create an {{Aggregator}} that uses a custom container that should 
be serialized with {{Kryo:}} 
{code:java}
class StringSet(other: Collection) : HashSet(other), 
KryoSerializable {
companion object {
@JvmStatic
private val serialVersionUID = 1L
}

constructor() : this(Collections.emptyList())

override fun write(kryo: Kryo, output: Output) {
output.writeInt(this.size)
for (string in this) {
output.writeString(string)
}
}

override fun read(kryo: Kryo, input: Input) {
val size = input.readInt()
repeat(size) { this.add(input.readString()) }
}
}
{code}
However, if I look at the corresponding value in the {{Row}} after aggregation 
(for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, 
the first byte in that array seems to be some sort of noise, and I can 
deserialize by doing something like this: 
{code:java}
val b = row.getAs(2)
val input = Input(b.copyOfRange(1, b.size)) // extra byte?
val set = Kryo().readObject(input, StringSet::class.java)
{code}
Used configuration: 
{code:java}
SparkConf()
.setAppName("Hello Spark with Kotlin")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrationRequired", "true")
.registerKryoClasses(arrayOf(StringSet::class.java))
{code}
[Sample repo with all the 
code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c].

 

  was:
 

I'm trying to create an {{Aggregator}} that uses a custom container that should 
be serialized with {{Kryo}}:

 
{code:java}
class StringSet(other: Collection) : HashSet(other), 
KryoSerializable {
companion object {
@JvmStatic
private val serialVersionUID = 1L
}

constructor() : this(Collections.emptyList())

override fun write(kryo: Kryo, output: Output) {
output.writeInt(this.size)
for (string in this) {
output.writeString(string)
}
}

override fun read(kryo: Kryo, input: Input) {
val size = input.readInt()
repeat(size) { this.add(input.readString()) }
}
}
{code}
However, if I look at the corresponding value in the {{Row}} after aggregation 
(for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, 
the first byte in that array seems to be some sort of noise, and I can 
deserialize by doing something like this:

 

 
{code:java}
val b = row.getAs(2)
val input = Input(b.copyOfRange(1, b.size)) // extra byte?
val set = Kryo().readObject(input, StringSet::class.java)
{code}
Used configuration:

 

 
{code:java}
SparkConf()
.setAppName("Hello Spark with Kotlin")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrationRequired", "true")
.registerKryoClasses(arrayOf(StringSet::class.java))
{code}
[Sample repo with all the 
code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c].

 


> Kryo deserialization not working with KryoSerializable class
> 
>
> Key: SPARK-26980
> URL: https://issues.apache.org/jira/browse/SPARK-26980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Local Spark v2.4.0
> Kotlin v1.3.21
>Reporter: Alexis Sarda-Espinosa
>Priority: Minor
>  Labels: kryo, serialization
>
> I'm trying to create an {{Aggregator}} that uses a custom container that 
> should be serialized with {{Kryo:}} 
> {code:java}
> class StringSet(other: Collection) : HashSet(other), 
> KryoSerializable {
> companion object {
> @JvmStatic
> private val serialVersionUID = 1L
> }
> constructor() : this(Collections.emptyList())
> override fun write(kryo: Kryo, output: Output) {
> output.writeInt(this.size)
> for (string in this) {
> output.writeString(string)
> }
> }
> override fun read(kryo: Kryo, input: Input) {
> val size = input.readInt()
> repeat(size) { this.add(input.readString()) }
> }
> }
> {code}
> However, if I look at the corresponding value in the {{Row}} after 
> aggregation (for example by using {{collectAsList()}}), I see a {{byte[]}}. 
> Interestingly, the first byte in that array seems to be some sort of noise, 
> and I can deserialize by doing something like this: 
> {code:java}
> val b = row.getAs(2)
> val input = Input(b.copyOfRange(1, b.size)) // extra byte?
> val set = Kryo().readObject(input, StringSet::class.java)
> {code}
> Used configuration: 
> {code:java}
> 

[jira] [Created] (SPARK-26980) Kryo deserialization not working with KryoSerializable class

2019-02-23 Thread Alexis Sarda-Espinosa (JIRA)
Alexis Sarda-Espinosa created SPARK-26980:
-

 Summary: Kryo deserialization not working with KryoSerializable 
class
 Key: SPARK-26980
 URL: https://issues.apache.org/jira/browse/SPARK-26980
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
 Environment: Local Spark v2.4.0

Kotlin v1.3.21
Reporter: Alexis Sarda-Espinosa


 

I'm trying to create an {{Aggregator}} that uses a custom container that should 
be serialized with {{Kryo}}:

 
{code:java}
class StringSet(other: Collection) : HashSet(other), 
KryoSerializable {
companion object {
@JvmStatic
private val serialVersionUID = 1L
}

constructor() : this(Collections.emptyList())

override fun write(kryo: Kryo, output: Output) {
output.writeInt(this.size)
for (string in this) {
output.writeString(string)
}
}

override fun read(kryo: Kryo, input: Input) {
val size = input.readInt()
repeat(size) { this.add(input.readString()) }
}
}
{code}
However, if I look at the corresponding value in the {{Row}} after aggregation 
(for example by using {{collectAsList()}}), I see a {{byte[]}}. Interestingly, 
the first byte in that array seems to be some sort of noise, and I can 
deserialize by doing something like this:

 

 
{code:java}
val b = row.getAs(2)
val input = Input(b.copyOfRange(1, b.size)) // extra byte?
val set = Kryo().readObject(input, StringSet::class.java)
{code}
Used configuration:

 

 
{code:java}
SparkConf()
.setAppName("Hello Spark with Kotlin")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryo.registrationRequired", "true")
.registerKryoClasses(arrayOf(StringSet::class.java))
{code}
[Sample repo with all the 
code|https://github.com/asardaes/hello-spark-kotlin/tree/8e8a54fd81f0412507318149841c69bb17d8572c].

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org