LantaoJin edited a comment on issue #24157: [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45
URL: https://github.com/apache/spark/pull/24157#issuecomment-478593168

`SQLQueryWithKryoSuite` is overkill for this PR, but it is useful for illustrating the problem, so I will delete it from the code and keep it here in this comment. After upgrading to the latest RoaringBitmap version, the UT below passes.

```scala
package org.apache.spark.sql

import org.apache.spark.internal.config
import org.apache.spark.internal.config.Kryo._
import org.apache.spark.internal.config.SERIALIZER
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.test.SharedSQLContext

class SQLQueryWithKryoSuite extends QueryTest with SharedSQLContext {

  override protected def sparkConf = super.sparkConf
    .set(SERIALIZER, "org.apache.spark.serializer.KryoSerializer")
    .set(KRYO_USE_UNSAFE, true)

  test("kryo unsafe data quality issue") {
    // This issue can be reproduced when:
    // 1. KryoSerializer is enabled
    // 2. spark.kryo.unsafe is set to true
    // 3. HighlyCompressedMapStatus is used, since it uses RoaringBitmap
    // 4. spark.sql.shuffle.partitions is set to 6000; 6000 triggers the issue
    //    with the supplied data
    // 5. The zero-size-blocks fetch-fail exception in ShuffleBlockFetcherIterator
    //    is commented out, otherwise the job fails with FetchFailedException instead
    withSQLConf(
      SQLConf.SHUFFLE_PARTITIONS.key -> "6000",
      config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS.key -> "-1") {
      withTempView("t") {
        val df = spark.read.parquet(testFile("test-data/dates.parquet")).toDF("date")
        df.createOrReplaceTempView("t")
        checkAnswer(
          sql("SELECT COUNT(*) FROM t"),
          sql(
            """
              |SELECT SUM(a) FROM
              |(
              |SELECT COUNT(*) a, date
              |FROM t
              |GROUP BY date
              |)
            """.stripMargin))
      }
    }
  }
}
```
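For anyone who wants to probe the serializer layer directly rather than run the full SQL test, here is a minimal sketch of the same round-trip: a `RoaringBitmap` pushed through Spark's `KryoSerializer` with `spark.kryo.unsafe=true`. This is my own illustration, not part of the original suite; it assumes Spark and RoaringBitmap are on the classpath, and the bitmap contents are arbitrary stand-ins for the reduce-block bitmap inside `HighlyCompressedMapStatus`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import org.roaringbitmap.RoaringBitmap

// Configure the serializer the same way the suite does: Kryo with unsafe I/O.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.unsafe", "true")
val ser = new KryoSerializer(conf).newInstance()

// Arbitrary bitmap standing in for the empty-block bitmap of a map status
// (illustrative values, not taken from the original test data).
val bitmap = new RoaringBitmap()
(0 until 6000 by 7).foreach(i => bitmap.add(i))
bitmap.runOptimize()

// With the old RoaringBitmap version this round-trip could come back corrupted
// under unsafe Kryo I/O; after the upgrade it should be identical.
val roundTripped = ser.deserialize[RoaringBitmap](ser.serialize(bitmap))
assert(roundTripped == bitmap)
```

Seeing this assertion fail on the old version and pass on 0.7.45 would confirm the problem lives in the bitmap serialization path, independent of the SQL layer.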
