Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/19769#discussion_r151609532
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -355,9 +361,31 @@ class ParquetFileFormat
fileSplit.getLocations,
null)
+      // PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96 timestamps
+      // *only* if the file was created by something other than "parquet-mr", so check the actual
+      // writer here for this file. We have to do this per-file, as each file in the table may
+      // have different writers. Sadly, this also means we have to clone the hadoopConf, as
+      // different threads may want different values. We have to use the hadoopConf as it's
+      // our only way to pass a value to ParquetReadSupport.init.
+      val localHadoopConf =
--- End diff --
Yes, but [`SpecificParquetRecordReaderBase` treats it as a generic `ReadSupport` and instantiates it reflectively with a zero-arg
constructor](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L141)
(I'm not sure why). I can work around that; it's just not as clean as you
might hope, though it does avoid the silly copy of the entire Hadoop conf.
I'll push another update for that.
---
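For context, a minimal sketch of the constraint being discussed: because the record reader creates its `ReadSupport` reflectively through a zero-arg constructor, the only channel for per-file settings is the `Configuration` handed to `init`, which is why a per-file clone of the conf is needed when files disagree. The `Conf`, `ReadSupport`, and `MyReadSupport` types below are simplified hypothetical stand-ins for Hadoop's `Configuration` and Parquet's `ReadSupport`, not the real Spark classes.

```java
import java.util.HashMap;

public class ReflectiveReadSupportDemo {
    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
    public static class Conf extends HashMap<String, String> {
        public Conf() { super(); }
        public Conf(Conf other) { super(other); } // the per-file "clone"
    }

    // Hypothetical stand-in for Parquet's ReadSupport contract.
    public interface ReadSupport {
        String init(Conf conf);
    }

    // Only a zero-arg constructor exists, so all per-file state must
    // arrive via init(conf) -- mirroring the situation in the PR.
    public static class MyReadSupport implements ReadSupport {
        public MyReadSupport() {}
        public String init(Conf conf) {
            return conf.getOrDefault(
                "spark.sql.parquet.int96TimestampConversion", "false");
        }
    }

    public static void main(String[] args) throws Exception {
        Conf shared = new Conf();
        shared.put("spark.sql.parquet.int96TimestampConversion", "false");

        // Clone per file so other threads reading other files keep
        // their own value of the flag.
        Conf perFile = new Conf(shared);
        perFile.put("spark.sql.parquet.int96TimestampConversion", "true");

        // Mimics SpecificParquetRecordReaderBase: the reader only knows
        // a class name and instantiates it reflectively, zero-arg.
        ReadSupport rs = (ReadSupport) Class
            .forName(MyReadSupport.class.getName())
            .getDeclaredConstructor()
            .newInstance();

        System.out.println(rs.init(perFile)); // prints "true"
        System.out.println(rs.init(shared));  // prints "false"
    }
}
```

The sketch shows the trade-off in the thread: either clone the whole conf per file, or (as the follow-up update proposes) bypass the reflective path so the `ReadSupport` can be constructed with the per-file value directly.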
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]