LantaoJin commented on issue #24157: [SPARK-27216][CORE] Upgrade RoaringBitmap to 0.7.45
URL: https://github.com/apache/spark/pull/24157#issuecomment-478593168
 
 
   `SQLQueryWithKryoSuite` is overkill for this PR but useful for illustrating 
the problem, so I will delete it from the code and keep it in this comment.
   ```scala
   package org.apache.spark.sql
   
   import org.apache.spark.internal.config
   import org.apache.spark.internal.config.Kryo._
   import org.apache.spark.internal.config.SERIALIZER
   import org.apache.spark.sql.internal.SQLConf
   import org.apache.spark.sql.test.SharedSQLContext
   
   class SQLQueryWithKryoSuite extends QueryTest with SharedSQLContext {
   
     override protected def sparkConf = super.sparkConf
       .set(SERIALIZER, "org.apache.spark.serializer.KryoSerializer")
       .set(KRYO_USE_UNSAFE, true)
   
     test("kryo unsafe data quality issue") {
       // This issue can be reproduced when
       // 1. Enable KryoSerializer
       // 2. Set spark.kryo.unsafe to true
       // 3. Use HighlyCompressedMapStatus since it uses RoaringBitmap
       // 4. Set spark.sql.shuffle.partitions to 6000; with the supplied data,
       //    6000 partitions is enough to trigger the issue
       // 5. Comment out the zero-size block fetch failure exception in
       //    ShuffleBlockFetcherIterator, or the job will fail with FetchFailedException.
       withSQLConf(
         SQLConf.SHUFFLE_PARTITIONS.key -> "6000",
         config.SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS.key -> "-1") {
         withTempView("t") {
           val df = spark.read.parquet(testFile("test-data/dates.parquet")).toDF("date")
           df.createOrReplaceTempView("t")
           checkAnswer(
             sql("SELECT COUNT(*) FROM t"),
             sql(
               """
                 |SELECT SUM(a) FROM
                 |(
                 |SELECT COUNT(*) a, date
                 |FROM t
                 |GROUP BY date
                 |)
               """.stripMargin))
         }
       }
     }
   }
   ```
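
   For anyone who wants to try the same check outside the Spark test harness, 
here is a minimal sketch of a standalone driver. It is an assumption-laden 
illustration, not part of the PR: the object name `KryoUnsafeRepro`, the 
`local[2]` master, and the default input path are hypothetical, and the input is 
assumed to be a single-column parquet file like the one used above. It only 
covers steps 1, 2 and 4; without the change described in step 5, the job is 
expected to fail with FetchFailedException rather than return a mismatched 
count.
   ```scala
   import org.apache.spark.sql.{Row, SparkSession}

   // Hypothetical standalone driver; names and paths are illustrative only.
   object KryoUnsafeRepro {

     // SUM returns NULL on empty input, so guard before reading the value.
     private def longOrZero(row: Row): Long =
       if (row.isNullAt(0)) 0L else row.getLong(0)

     def main(args: Array[String]): Unit = {
       // Assumed location of a single-column parquet file.
       val inputPath = args.headOption.getOrElse("/tmp/dates.parquet")

       val spark = SparkSession.builder()
         .appName("kryo-unsafe-repro")
         .master("local[2]")
         // Step 1: enable KryoSerializer
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         // Step 2: enable the unsafe Kryo IO path
         .config("spark.kryo.unsafe", "true")
         // Step 4: use 6000 shuffle partitions
         .config("spark.sql.shuffle.partitions", "6000")
         .getOrCreate()

       try {
         spark.read.parquet(inputPath).toDF("date").createOrReplaceTempView("t")

         // Both queries should report the same number of rows.
         val total = longOrZero(spark.sql("SELECT COUNT(*) FROM t").head())
         val summed = longOrZero(spark.sql(
           """
             |SELECT SUM(a) FROM
             |(
             |SELECT COUNT(*) a, date
             |FROM t
             |GROUP BY date
             |) grouped
           """.stripMargin).head())

         println(s"COUNT(*) = $total, SUM of grouped counts = $summed")
         if (total != summed) {
           println("Mismatch: grouped counts do not add up to the total row count")
         }
       } finally {
         spark.stop()
       }
     }
   }
   ```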
