gszadovszky commented on a change in pull request #26804:
URL: https://github.com/apache/spark/pull/26804#discussion_r561694102
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala
##########
@@ -225,7 +226,9 @@ class StreamSuite extends StreamTest {
val df = spark.readStream.format(classOf[FakeDefaultSource].getName).load()
Seq("", "parquet").foreach { useV1Source =>
- withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source) {
+ withSQLConf(
+ SQLConf.USE_V1_SOURCE_LIST.key -> useV1Source,
+ ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED -> "false") {
Review comment:
@wangyum, I've checked the code change of PARQUET-1580 (again) and still
don't understand why it would cause such an issue. Disabling the CRC write
only skips writing an optional field in the page headers; it should not
affect any kind of ordering. If it really does, then that ordering relies on
parameters it shouldn't rely on, and any other potential change in the file
metadata might break it as well.
Maybe I'm overlooking something in our code base, so any comment is welcome,
but if not I would suggest revisiting these unit tests.
Meanwhile, I am not experienced with the Spark code, so if you are fine with
this workaround in a unit test I am not against it.
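For reference, a minimal sketch of the workaround discussed above. This assumes a running `SparkSession` named `spark` and a DataFrame `df`; the config key is the value of `ParquetOutputFormat.PAGE_WRITE_CHECKSUM_ENABLED` from parquet-hadoop (the page-level CRC feature introduced by PARQUET-1580):

```scala
// Sketch only: disable page-level CRC checksums for Parquet writes.
// The CRC is an optional field in each page header; omitting it should
// not change the data layout, which is the point of the comment above.
spark.conf.set("parquet.page.write-checksum.enabled", "false")

// Subsequent Parquet writes in this session omit the per-page CRC field.
df.write.parquet("/tmp/out")
```

In the test itself, `withSQLConf(...)` is preferable since it restores the previous value after the enclosed block, keeping the override scoped to the test.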
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]