[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214762021 --- Diff: core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala --- @@ -412,6 +412,26 @@ class KryoSerializerSuite extends SparkFunSuite with SharedSparkContext { assert(!ser2.getAutoReset) } + test("ClassCastException when writing a Map after previously " + --- End diff -- Since this is a bug fix test case, could you add `SPARK-25176` like `SPARK-25176 ClassCastException ...`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214524736 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- I was also worried that part, but it's used only in a run-time `SearchArgument` serialization. There was no usage with ORC `file`s. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214514359 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- OK, if that's the extent of the usage, then I believe that ORC is OK with Kryo 4. That much is not a problem. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214491757 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- Yes. I checked that, @srowen . `org.apache.orc` only uses Kryo constructor, `writeObject`, and `readObject` from `kryo-shaded` library. There is no change for them. **WRITE** ``` (new Kryo()).writeObject(out, sarg); ``` **READ** ``` ... = (new Kryo()).readObject(new Input(sargBytes), SearchArgumentImpl.class); ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214480126 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- I guess my question is whether Orc complied against kryo-shaded 3.x necessarily works with kryo-shaded 4.x, because there were API and behavior changes. Our current tests don't seem to surface any such problems, but who knows if there's something they don't test. I agree that our distribution won't include two versions, but the issue I'm wondering about is whether Orc will like the one later version that it gets bundled with. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214479500 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- @srowen In short, the current Spark always uses the same Kryo version for read/write `SearchArgument` and it's used only on runtime. 1. Old OrcFileFormat always uses `org.spark-project.hive:hive-exec:1.2.1.spark2` which uses the shaded one in `hive-exec`. - `com.esotericsoftware.kryo:kryo:2.21`. 2. New OrcFileFormat uses `org.apache.orc` which uses the one provided by Spark. - `com.esotericsoftware:kryo-shaded:3.0.3` (All Spark/Orc/Hive uses this version for now) 3. New OrcFileFormat (in this PR) uses `org.apache.orc` which uses the one provided by Spark. - `com.esotericsoftware:kryo-shaded:4.0.2` So, (1) is unchanged by this PR. (2) and (3) also doesn't use a mixed version of Kryo. So, it should be fine because Apache Spark doesn't allow a mixed Spark version(master and executor). BTW, during investigation, there was some performance issue in `createFilter`. I'll file a new JIRA for that. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214127809 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- Thank you for pinging me, @srowen . ORC uses Kryo only for writing/reading one ORC configuration, `orc.kryo.sarg`. The followings are the Spark's code indirect code path. I guess Kryo provides backward compatibility at lest, but I'll take a look at this PR today. - For new ORCFileFormat, [OrcInputFormat.setSearchArgument](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L159) - For old ORCFileFormat, [SearchArgument.toKryo](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala#L129) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214120461 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- CC @wangyum yes good point. ORC also uses kryo-shaded, and uses 3.0.3. In theory that could cause a break, but the tests pass. That's a positive sign but not bulletproof. @dongjoon-hyun do you have any insight into how ORC uses kryo? Is it a code path that wouldn't matter to Spark? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22179#discussion_r214117831 --- Diff: pom.xml --- @@ -1770,6 +1770,10 @@ org.apache.hive hive-storage-api + + com.esotericsoftware +kryo-shaded --- End diff -- why this change? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22179 [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2 ## What changes were proposed in this pull request? Upgrade chill to 0.9.3, Kryo to 4.0.2, to get bug fixes and improvements. More details: https://github.com/twitter/chill/releases/tag/v0.9.3 https://github.com/twitter/chill/commit/cc3910d501a844f3c882249fef8fc2560b95b6dd ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-23131 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22179.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22179 commit 8c28e078526445a31099ca5f1ae71ce76d782004 Author: Yuming Wang Date: 2018-08-22T01:44:09Z Upgrade chill from 0.8.4 to 0.9.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org