This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 7546b4405d5 [SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer
7546b4405d5 is described below

commit 7546b4405d5b35626e98b28bfc8031d2100172d1
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Mon Jan 23 21:39:30 2023 -0800

    [SPARK-42164][CORE] Register partitioned-table-related classes to KryoSerializer

### What changes were proposed in this pull request?

This PR aims to register partitioned-table-related classes to `KryoSerializer`. Specifically, `CREATE TABLE` and `MSCK REPAIR TABLE` use these classes.

### Why are the changes needed?

To support partitioned tables more easily with `KryoSerializer`. Previously, it fails like the following.

```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation
```
```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.HadoopFSUtils$SerializableFileStatus
```
```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.sql.execution.command.PartitionStatistics
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual tests.

**TEST TABLE**
```
$ tree /tmp/t
/tmp/t
├── p=1
│   └── users.orc
├── p=10
│   └── users.orc
├── p=11
│   └── users.orc
├── p=2
│   └── users.orc
├── p=3
│   └── users.orc
├── p=4
│   └── users.orc
├── p=5
│   └── users.orc
├── p=6
│   └── users.orc
├── p=7
│   └── users.orc
├── p=8
│   └── users.orc
└── p=9
    └── users.orc
```

**CREATE PARTITIONED TABLES AND RECOVER PARTITIONS**
```
$ bin/spark-shell -c spark.kryo.registrationRequired=true -c spark.serializer=org.apache.spark.serializer.KryoSerializer -c spark.sql.sources.parallelPartitionDiscovery.threshold=1
scala> sql("CREATE TABLE t USING ORC LOCATION '/tmp/t'").show()
++
||
++
++

scala> sql("MSCK REPAIR TABLE t").show()
++
||
++
++
```

Closes #39713 from dongjoon-hyun/SPARK-42164.
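[Editor's note] For context on the failure mode above: when `spark.kryo.registrationRequired=true`, Kryo throws `IllegalArgumentException` for any class that was never registered. On Spark versions without this fix, a user could work around the errors by registering the same classes themselves through the standard `spark.kryo.classesToRegister` configuration. A minimal sketch, assuming application-side Spark setup (not part of this commit; class names are taken from the error messages quoted above):

```scala
import org.apache.spark.SparkConf

// Sketch of a user-side workaround: register the classes that
// CREATE TABLE / MSCK REPAIR TABLE need, so registrationRequired
// does not reject them. Array-of-class entries (the "[L...;" JVM
// names in the patch) may also need registering, depending on the
// code path exercised.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .set("spark.kryo.classesToRegister",
    "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation," +
    "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus," +
    "org.apache.spark.sql.execution.command.PartitionStatistics")
```

With this fix merged, the classes are registered by Spark itself, so no such configuration is required.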
Authored-by: Dongjoon Hyun <dongj...@apache.org>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index b7035feba84..5499732660b 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -510,6 +510,10 @@ private[serializer] object KryoSerializer {
   // SQL / ML / MLlib classes once and then re-use that filtered list in newInstance() calls.
   private lazy val loadableSparkClasses: Seq[Class[_]] = {
     Seq(
+      "org.apache.spark.util.HadoopFSUtils$SerializableBlockLocation",
+      "[Lorg.apache.spark.util.HadoopFSUtils$SerializableBlockLocation;",
+      "org.apache.spark.util.HadoopFSUtils$SerializableFileStatus",
+
       "org.apache.spark.sql.catalyst.expressions.BoundReference",
       "org.apache.spark.sql.catalyst.expressions.SortOrder",
       "[Lorg.apache.spark.sql.catalyst.expressions.SortOrder;",
@@ -536,6 +540,7 @@ private[serializer] object KryoSerializer {
       "org.apache.spark.sql.types.DecimalType",
       "org.apache.spark.sql.types.Decimal$DecimalAsIfIntegral$",
       "org.apache.spark.sql.types.Decimal$DecimalIsFractional$",
+      "org.apache.spark.sql.execution.command.PartitionStatistics",
       "org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTaskResult",
       "org.apache.spark.sql.execution.joins.EmptyHashedRelation$",
       "org.apache.spark.sql.execution.joins.LongHashedRelation",

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org