[ https://issues.apache.org/jira/browse/SPARK-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554797#comment-14554797 ]

Akshat Aranya edited comment on SPARK-7708 at 5/21/15 6:22 PM:
---------------------------------------------------------------

I was able to get this working with a couple of fixes:

1. Implementing serialization methods for Kryo in SerializableBuffer.  An
alternative is to register SerializableBuffer with JavaSerialization in Kryo,
but that defeats the purpose of using Kryo in the first place.
2. The second fix is a bit hokey: tasks within one executor process are
deserialized from a shared broadcast variable, and Kryo deserialization
modifies the input buffer, so it isn't thread-safe
(https://code.google.com/p/kryo/issues/detail?id=128).  I worked around this
by copying the broadcast buffer to a local buffer before deserializing.
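The first fix amounts to giving SerializableBuffer explicit write/read methods (Kryo's KryoSerializable interface) that emit a length prefix followed by the raw bytes, since Kryo's default field serialization can't handle the wrapped ByteBuffer. A minimal sketch of that scheme, using plain java.io streams in place of Kryo's Output/Input so it stands alone (BufferWrapper is a made-up stand-in, not Spark's actual class):

```java
import java.io.*;
import java.nio.ByteBuffer;

// Hypothetical stand-in for Spark's SerializableBuffer: a ByteBuffer wrapper
// that writes a length prefix followed by the raw bytes, mirroring what a
// KryoSerializable write()/read() pair would do.
public class BufferWrapper {
    ByteBuffer buffer;

    BufferWrapper(ByteBuffer buffer) { this.buffer = buffer; }

    // Analogous to KryoSerializable.write(Kryo, Output).
    void write(DataOutputStream out) throws IOException {
        ByteBuffer dup = buffer.duplicate();   // don't disturb the shared position
        byte[] bytes = new byte[dup.remaining()];
        dup.get(bytes);
        out.writeInt(bytes.length);            // length prefix
        out.write(bytes);                      // then the payload
    }

    // Analogous to KryoSerializable.read(Kryo, Input).
    void read(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        buffer = ByteBuffer.wrap(bytes);
    }
}
```

With Kryo itself, the same bodies would read/write through kryo's Output/Input instead of the stream classes above.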

These fixes are against 1.2, so I'll see whether I can port them to master and
write a test for them.
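The workaround in the second fix can be sketched as a per-thread copy of the shared broadcast buffer: each task thread hands its own copy to the deserializer, so Kryo's mutation of the input buffer can't race other threads (class and method names below are illustrative, not Spark's):

```java
import java.nio.ByteBuffer;

public class BroadcastCopy {
    // Copy the shared broadcast bytes into a buffer private to the calling
    // thread, so the deserializer can consume/mutate it without racing the
    // other task threads that read the same broadcast variable.
    public static ByteBuffer localCopy(ByteBuffer shared) {
        ByteBuffer dup = shared.duplicate();   // independent position, same contents
        ByteBuffer copy = ByteBuffer.allocate(dup.remaining());
        copy.put(dup);
        copy.flip();                           // ready for reading from position 0
        return copy;
    }
}
```

duplicate() alone gives an independent position but still shares the backing bytes, which is why a full copy is needed when the deserializer also modifies contents.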



> Incorrect task serialization with Kryo closure serializer
> ---------------------------------------------------------
>
>                 Key: SPARK-7708
>                 URL: https://issues.apache.org/jira/browse/SPARK-7708
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.2
>            Reporter: Akshat Aranya
>
> I've been investigating the use of Kryo for closure serialization with Spark 
> 1.2, and it seems like I've hit upon a bug:
> When a task is serialized before scheduling, the following log message is 
> generated:
> [info] o.a.s.s.TaskSetManager - Starting task 124.1 in stage 0.0 (TID 342, 
> <host>, PROCESS_LOCAL, 302 bytes)
> This message comes from TaskSetManager, which serializes the task using the 
> closure serializer.  Before the message is sent out, the TaskDescription 
> (which includes the original serialized task as a byte array) is serialized 
> again into a byte array with the closure serializer.  I added a log message 
> for this in CoarseGrainedSchedulerBackend, which produces the following output:
> [info] o.a.s.s.c.CoarseGrainedSchedulerBackend - 124.1 size=132
> The serialized size of the TaskDescription (132 bytes) turns out to be 
> _smaller_ than the serialized task it contains (302 bytes). This implies that 
> TaskDescription.buffer is not being serialized correctly.
> On the executor side, the deserialization produces a null value for 
> TaskDescription.buffer.
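The symptom described above (a container that serializes smaller than its payload, and a null buffer after deserialization) can be reproduced in miniature with plain Java serialization: a field the serializer skips, here marked transient, simply comes back null. This is only an analogy for the Kryo behavior, and TaskLike is a made-up name:

```java
import java.io.*;

// Miniature analogue of the reported bug: the serializer ignores the payload
// field (here via transient), so the enclosing object's wire form is smaller
// than the payload it was supposed to carry, and the field deserializes to null.
public class TaskLike implements Serializable {
    transient byte[] payload;   // skipped by default Java serialization

    TaskLike(byte[] payload) { this.payload = payload; }

    static byte[] serialize(TaskLike t) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(t);
        }
        return bos.toByteArray();
    }

    static TaskLike deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (TaskLike) ois.readObject();
        }
    }
}
```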



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
