[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214762021
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
 assert(!ser2.getAutoReset)
   }
 
+  test("ClassCastException when writing a Map after previously " +
--- End diff --

Since this is a bug fix test case, could you add `SPARK-25176` like 
`SPARK-25176 ClassCastException ...`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214524736
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

I was also worried that part, but it's used only in a run-time 
`SearchArgument` serialization. There was no usage with ORC `file`s.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-09-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214514359
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

OK, if that's the extent of the usage, then I believe that ORC is OK with 
Kryo 4. That much is not a problem.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214491757
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

Yes. I checked that, @srowen . `org.apache.orc` only uses Kryo constructor, 
`writeObject`, and `readObject` from `kryo-shaded` library. There is no change 
for them.

**WRITE**
```
(new Kryo()).writeObject(out, sarg);
```
**READ**
```
... = (new Kryo()).readObject(new Input(sargBytes), 
SearchArgumentImpl.class);
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-31 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214480126
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

I guess my question is whether Orc complied against kryo-shaded 3.x 
necessarily works with kryo-shaded 4.x, because there were API and behavior 
changes. Our current tests don't seem to surface any such problems, but who 
knows if there's something they don't test.

I agree that our distribution won't include two versions, but the issue I'm 
wondering about is whether Orc will like the one later version that it gets 
bundled with.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214479500
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

@srowen  In short, the current Spark always uses the same Kryo version for 
read/write `SearchArgument` and it's used only on runtime.

1. Old OrcFileFormat always uses 
`org.spark-project.hive:hive-exec:1.2.1.spark2` which uses the shaded one in 
`hive-exec`.
- `com.esotericsoftware.kryo:kryo:2.21`. 

2. New OrcFileFormat uses `org.apache.orc` which uses the one provided by 
Spark.
- `com.esotericsoftware:kryo-shaded:3.0.3`  (All Spark/Orc/Hive uses 
this version for now)

3. New OrcFileFormat (in this PR) uses `org.apache.orc` which uses the one 
provided by Spark.
- `com.esotericsoftware:kryo-shaded:4.0.2`

So, (1) is unchanged by this PR. (2) and (3) also doesn't use a mixed 
version of Kryo. So, it should be fine because Apache Spark doesn't allow a 
mixed Spark version(master and executor). BTW, during investigation, there was 
some performance issue in `createFilter`. I'll file a new JIRA for that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214127809
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

Thank you for pinging me, @srowen . ORC uses Kryo only for writing/reading 
one ORC configuration, `orc.kryo.sarg`. The followings are the Spark's code 
indirect code path. I guess Kryo provides backward compatibility at lest, but 
I'll take a look at this PR today.
- For new ORCFileFormat, 
[OrcInputFormat.setSearchArgument](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L159)
- For old ORCFileFormat, 
[SearchArgument.toKryo](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala#L129)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214120461
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

CC @wangyum yes good point. ORC also uses kryo-shaded, and uses 3.0.3. In 
theory that could cause a break, but the tests pass. That's a positive sign but 
not bulletproof. @dongjoon-hyun do you have any insight into how ORC uses kryo? 
Is it a code path that wouldn't matter to Spark?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22179#discussion_r214117831
  
--- Diff: pom.xml ---
@@ -1770,6 +1770,10 @@
 org.apache.hive
 hive-storage-api
   
+  
+ com.esotericsoftware
+kryo-shaded
--- End diff --

why this change?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

2018-08-21 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22179

[SPARK-23131][BUILD] Upgrade Kryo to 4.0.2

## What changes were proposed in this pull request?

Upgrade chill to 0.9.3, Kryo to 4.0.2, to get bug fixes and improvements.
More details:
https://github.com/twitter/chill/releases/tag/v0.9.3

https://github.com/twitter/chill/commit/cc3910d501a844f3c882249fef8fc2560b95b6dd

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23131

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22179


commit 8c28e078526445a31099ca5f1ae71ce76d782004
Author: Yuming Wang 
Date:   2018-08-22T01:44:09Z

Upgrade chill from 0.8.4 to 0.9.3




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org