mallman commented on a change in pull request #24327: [SPARK-27418][SQL]
Migrate Parquet to File Data Source V2
URL: https://github.com/apache/spark/pull/24327#discussion_r289650717
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java
##########
@@ -25,11 +25,11 @@
// Redirects the JUL logging for parquet-mr versions <= 1.8 to SLF4J logging using
// SLF4JBridgeHandler. Parquet-mr versions >= 1.9 use SLF4J directly
-final class ParquetLogRedirector implements Serializable {
+public final class ParquetLogRedirector implements Serializable {
Review comment:
> Spark uses Parquet >= 1.9. Is this still needed? Why was it made public?
I had to dig into the "archive" to remember why we did this. This whole
rigmarole is explained in b533fa2.
> Why would the version a file was written with matter? Does Spark use 1.6.0 to read those files?
The version of the Parquet writer matters because the Parquet reader was
complaining about invalid metadata written by the Parquet 1.6 writer. A prior
commit, 6ce1b67, had the side effect of breaking the workaround that allowed
Spark users to control Parquet's logging through Spark's own logging
configuration. We punted a fix for this regression to b533fa2.
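For concreteness, the workaround is what lets a user tune parquet-mr's
verbosity from Spark's own `conf/log4j.properties`. An illustrative entry
(the logger names are just examples of names parquet-mr logs under, not taken
from this PR) might look like:
```
# Illustrative: lower parquet-mr's log level via Spark's log4j configuration
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
```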
According to the top level `pom.xml` at commit b533fa2, Spark was using
Parquet 1.8.1. Based on that and the comment in this code
```
// Redirects the JUL logging for parquet-mr versions <= 1.8 to SLF4J logging using
// SLF4JBridgeHandler. Parquet-mr versions >= 1.9 use SLF4J directly
```
it looks like this workaround was necessary at the time. Given that Spark
now uses Parquet 1.10.1, the workaround appears to be obsolete, and this code
can be deleted.
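For reference, the technique the comment describes is the standard JUL-to-SLF4J
bridge: detach the handlers parquet-mr attaches to its `java.util.logging`
loggers and replace them with an `SLF4JBridgeHandler`. The sketch below is
illustrative only; the class name and logger names are assumptions, not the
actual contents of `ParquetLogRedirector`, and it assumes the
`org.slf4j:jul-to-slf4j` bridge is on the classpath.
```
import java.util.logging.Handler;
import java.util.logging.Logger;

import org.slf4j.bridge.SLF4JBridgeHandler;

// Illustrative sketch of JUL-to-SLF4J redirection; not Spark's actual code.
public final class JulToSlf4jRedirectSketch {

  // Detach any handlers attached directly to this JUL logger (e.g. a
  // ConsoleHandler) and route its records into SLF4J instead.
  private static void redirect(Logger julLogger) {
    for (Handler handler : julLogger.getHandlers()) {
      julLogger.removeHandler(handler);
    }
    // Stop records from also propagating to the JUL root logger's handlers.
    julLogger.setUseParentHandlers(false);
    // SLF4JBridgeHandler is a java.util.logging.Handler that forwards to SLF4J.
    julLogger.addHandler(new SLF4JBridgeHandler());
  }

  public static void main(String[] args) {
    // parquet-mr <= 1.8 logged under JUL logger names like these; the exact
    // names here are assumptions for illustration.
    redirect(Logger.getLogger("org.apache.parquet"));
    redirect(Logger.getLogger("parquet"));
  }
}
```
With a bridge handler in place, anything parquet-mr logs through JUL ends up in
whatever backend Spark's logging configuration selects.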
N.B. According to the commit message of b533fa2, this workaround only applied
to converted partitioned Hive tables. Therefore, I presume the version of the
Hive Parquet library Spark uses is irrelevant.
All that said, there are no unit tests to guard against a regression here. To
be safe, if this workaround is removed I would recommend going through the
manual four-step test procedure I outlined in b533fa2.
Hope this helps.