[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-30 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90306255
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff ---

Any class in 
[primitiveAndPrimitiveArrayClassTags](https://github.com/apache/spark/blob/3f03c90a807872d47588f3c3920769b8978033bf/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L46)
 can trigger this issue. That's why you can also use the existing TestReceiver, 
which is a `Receiver[Int]`.
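
Below is a minimal, self-contained sketch of that eligibility rule (illustrative only, not Spark's source; the set lists just a few representative class tags) showing why both the `Array[Byte]` and `Int` element types go down the auto-pick path:

```scala
import scala.reflect.ClassTag

// Sketch only, not Spark's source: a stand-in for the canUseKryo check backed
// by the primitiveAndPrimitiveArrayClassTags set linked above.
object KryoEligibilitySketch {
  private val kryoEligible: Set[ClassTag[_]] = Set(
    ClassTag.Int, ClassTag.Long, ClassTag.Double,      // other primitives omitted
    ClassTag(classOf[Array[Byte]]), ClassTag(classOf[Array[Int]]),
    ClassTag(classOf[String]))

  def canUseKryo(ct: ClassTag[_]): Boolean = kryoEligible.contains(ct)

  def main(args: Array[String]): Unit = {
    println(canUseKryo(ClassTag(classOf[Array[Byte]]))) // true:  FakeByteArrayReceiver's element type
    println(canUseKryo(ClassTag.Int))                   // true:  TestReceiver is Receiver[Int], so it qualifies too
    println(canUseKryo(ClassTag(classOf[AnyRef])))      // false: the erased type seen on the write path
  }
}
```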



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-30 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90191359
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff ---

BTW, the existing unit tests could cover the other cases besides the `Array[Byte]` type.



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-30 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90191009
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff ---

@zsxwing yes, the failure occurs when the receiver stores `Array[Byte]` data: automatic 
serializer selection picks JavaSerializer because the data type has been erased to 
`Object` on the write path. However, after the block is fetched from the remote executor, 
the input-stream data is deserialized with KryoSerializer, since the task does know the 
proper data type. That leads to **com.esotericsoftware.kryo.KryoException: 
Encountered unregistered class ID: 13994**.
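
A minimal, self-contained sketch of that disagreement (illustrative names only, not Spark's real API): the write path sees an erased element type and picks Java serialization, while the read path knows the concrete type and picks Kryo:

```scala
import scala.reflect.ClassTag

// Hypothetical sketch of the write/read mismatch described above.
object WriteReadMismatchSketch {
  sealed trait Pick
  case object UseJavaSerializer extends Pick
  case object UseKryoSerializer extends Pick

  // Auto-pick rule: Kryo only for "simple" element types such as Int or Array[Byte].
  def pickFor(ct: ClassTag[_]): Pick =
    if (ct == ClassTag.Int || ct == ClassTag(classOf[Array[Byte]])) UseKryoSerializer
    else UseJavaSerializer

  def main(args: Array[String]): Unit = {
    // Write path: the receiver-side code handles the block generically, so the
    // element type it sees is erased to AnyRef and Java serialization is picked.
    val pickedAtWrite = pickFor(ClassTag(classOf[AnyRef]))      // UseJavaSerializer

    // Read path: the consuming task knows the element type is Array[Byte],
    // so Kryo is picked for deserialization.
    val pickedAtRead = pickFor(ClassTag(classOf[Array[Byte]]))  // UseKryoSerializer

    // The disagreement is what surfaces as the KryoException quoted above.
    println(s"write: $pickedAtWrite, read: $pickedAtRead")
  }
}
```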



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-29 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90182581
  
--- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala ---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff ---

nit: Why create a new class? Is there any concern with just using TestReceiver?



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16052



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-29 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r89983105
  
--- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala ---
@@ -155,7 +155,12 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar
       outputStream: OutputStream,
       values: Iterator[T]): Unit = {
     val byteStream = new BufferedOutputStream(outputStream)
-    val ser = getSerializer(implicitly[ClassTag[T]]).newInstance()
+    val ser = blockId match {
--- End diff ---

Ah, yours is better.
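
The suggestion being referred to is not quoted in this thread. The sketch below only illustrates the general shape under discussion: compute a single `autoPick` flag from the block id and let `getSerializer(ct, autoPick)` decide, instead of matching on `blockId` inline. The `StreamBlockId` condition and all of the types here are assumptions for illustration, not code from the pull request:

```scala
import scala.reflect.ClassTag

// Sketch only: tiny stand-ins for BlockId/StreamBlockId and the serializer choice.
object AutoPickByBlockIdSketch {
  sealed trait BlockId
  case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
  case class StreamBlockId(streamId: Int, uniqueId: Long) extends BlockId

  sealed trait Pick
  case object UseDefault extends Pick
  case object UseKryo extends Pick

  // Stand-in for the eligibility check (primitives, primitive arrays, String).
  private def canUseKryo(ct: ClassTag[_]): Boolean =
    ct == ClassTag.Int || ct == ClassTag(classOf[Array[Byte]]) || ct == ClassTag(classOf[String])

  def getSerializer(ct: ClassTag[_], autoPick: Boolean): Pick =
    if (autoPick && canUseKryo(ct)) UseKryo else UseDefault

  def serializerForBlock(blockId: BlockId, ct: ClassTag[_]): Pick = {
    // Streaming blocks opt out of auto-pick because their write and read paths
    // may see different class tags (assumed condition; not shown in the thread).
    val autoPick = !blockId.isInstanceOf[StreamBlockId]
    getSerializer(ct, autoPick)
  }

  def main(args: Array[String]): Unit = {
    val byteArrayTag = ClassTag(classOf[Array[Byte]])
    println(serializerForBlock(StreamBlockId(0, 1L), byteArrayTag)) // UseDefault
    println(serializerForBlock(RDDBlockId(0, 0), byteArrayTag))     // UseKryo
  }
}
```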



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-28 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r89954432
  
--- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala ---
@@ -77,8 +77,8 @@ private[spark] class SerializerManager(defaultSerializer: Serializer, conf: Spar
     primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag
   }
 
-  def getSerializer(ct: ClassTag[_]): Serializer = {
-    if (canUseKryo(ct)) {
+  def getSerializer(ct: ClassTag[_], autoPick: Boolean): Serializer = {
--- End diff ---

document what this means
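
One possible wording for that documentation is sketched below. It is a suggestion only, not the doc comment that was eventually committed, and the surrounding types plus the method body are stand-ins that assume `autoPick` simply gates the existing `canUseKryo` check:

```scala
import scala.reflect.ClassTag

// Sketch only: minimal stand-ins so the snippet compiles on its own.
object GetSerializerDocSketch {
  trait Serializer
  private object DefaultSerializerStub extends Serializer
  private object KryoSerializerStub extends Serializer

  private val defaultSerializer: Serializer = DefaultSerializerStub
  private val kryoSerializer: Serializer = KryoSerializerStub

  private def canUseKryo(ct: ClassTag[_]): Boolean =
    ct == ClassTag.Int || ct == ClassTag(classOf[Array[Byte]]) || ct == ClassTag(classOf[String])

  /**
   * Pick the serializer to use for values described by `ct`.
   *
   * @param ct       class tag of the values being serialized or deserialized
   * @param autoPick whether this call may auto-select Kryo for eligible simple types.
   *                 Callers that cannot guarantee the same class tag on the write and
   *                 read paths (e.g. Spark Streaming receiver blocks) should pass
   *                 `false` so the default serializer is always used.
   */
  def getSerializer(ct: ClassTag[_], autoPick: Boolean): Serializer =
    if (autoPick && canUseKryo(ct)) kryoSerializer else defaultSerializer
}
```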



[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-28 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16052

[SPARK-18617][CORE][STREAMING] Close "kryo auto pick" feature for Spark Streaming

## What changes were proposed in this pull request?

#15992 provided a solution to fix the bug, i.e. **receiver data cannot be deserialized 
properly**. As @zsxwing said, it is a critical bug, but we should not break APIs between 
maintenance releases. A reasonable first step is to disable the "kryo auto pick" behavior 
for Spark Streaming.

## How was this patch tested?

Existing unit tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-18617

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16052


commit e4416bd155f1b3a973fd37f03a60d61f1a0612f3
Author: uncleGen 
Date:   2016-11-29T06:11:07Z

SPARK-18617: Close "kryo auto pick" feature for Spark Streaming



