Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19769#discussion_r151609532
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
    @@ -355,9 +361,31 @@ class ParquetFileFormat
               fileSplit.getLocations,
               null)
     
    +      // PARQUET_INT96_TIMESTAMP_CONVERSION says to apply timezone conversions to int96
    +      // timestamps *only* if the file was created by something other than "parquet-mr",
    +      // so check the actual writer here for this file.  We have to do this per-file, as
    +      // each file in the table may have different writers.  Sadly, this also means we
    +      // have to clone the hadoopConf, as different threads may want different values.
    +      // We have to use the hadoopConf as it's our only way to pass a value to
    +      // ParquetReadSupport.init
    +      val localHadoopConf =
    --- End diff --
    
    yes, but [`SpecificParquetRecordReaderBase` treats it as a generic `ReadSupport` and instantiates it reflectively with a zero-arg constructor](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java#L141) (I'm not sure why).  I can work around that; it's just not as clean as you might hope, though it does avoid the silly copy of the entire hadoop conf.  I'll push another update for that.

