Re: [PR] [SPARK-53422][SQL][TEST] Make SPARK-30269 test case robust [spark]

via GitHub Thu, 28 Aug 2025 20:15:46 -0700


pan3793 commented on code in PR #52168:
URL: https://github.com/apache/spark/pull/52168#discussion_r2308947087



##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala:
##########
@@ -1616,11 +1616,10 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
         Seq(tbl, ext_tbl).foreach { tblName =>
           sql(s"INSERT INTO $tblName VALUES (1, 'a', '2019-12-13')")
 
-          val expectedSize = 690
           // analyze table
           sql(s"ANALYZE TABLE $tblName COMPUTE STATISTICS NOSCAN")
           var tableStats = getTableStats(tblName)
-          assert(tableStats.sizeInBytes == expectedSize)
+          val expectedSize = tableStats.sizeInBytes

Review Comment:
   I read the original PR. The intention of this test is to make sure partition 
stats get updated even when they equal the existing table stats. The exact 
value of the table's `sizeInBytes` does not really matter here.
   
   Generally, asserting the exact size of binary data files such as Parquet/ORC 
does not make sense, because it can vary with metadata changes. As you pointed 
out, the difference here is likely caused by the version string change. The 
size may also be affected by the compression codec: the compressed data length 
might differ across snappy versions or platforms.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

