Github user mallman commented on a diff in the pull request:
https://github.com/apache/spark/pull/21320#discussion_r199631341
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
---
@@ -47,16 +47,25 @@ import org.apache.spark.sql.types._
  *
  * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from
  * [[init()]] to [[prepareForRead()]], but use a private `var` for simplicity.
+ *
+ * @param parquetMrCompatibility support reading with parquet-mr or Spark's built-in Parquet reader
  */
-private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone])
+private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone],
+    parquetMrCompatibility: Boolean)
   extends ReadSupport[UnsafeRow] with Logging {
   private var catalystRequestedSchema: StructType = _
+  /**
+   * Construct a [[ParquetReadSupport]] with [[convertTz]] set to [[None]] and
+   * [[parquetMrCompatibility]] set to [[false]].
+   *
+   * We need a zero-arg constructor for SpecificParquetRecordReaderBase. But that is only
+   * used in the vectorized reader, where we get the convertTz value directly, and the value
+   * here is ignored. Further, we set [[parquetMrCompatibility]] to [[false]] as this
+   * constructor is only called by the Spark reader.
--- End diff --
I don't understand your confusion. I think the comment makes it very clear
why we need to set that parameter to false. How can I make it better? Or can
you be more specific about what is unclear to you?
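
For readers following the thread: the pattern the doc comment describes is a zero-arg auxiliary constructor that delegates to the primary constructor with fixed defaults, so the class can be instantiated reflectively. A minimal, self-contained sketch of that pattern is below; the class and parameter names mirror the diff, but the body is a hypothetical stand-in, not Spark's actual implementation:

```scala
import java.util.TimeZone

// Simplified stand-in for the class in the diff. The real Spark class also
// extends ReadSupport[UnsafeRow]; that is omitted here to keep the sketch
// self-contained.
class ParquetReadSupport(
    val convertTz: Option[TimeZone],
    val parquetMrCompatibility: Boolean) {

  // Zero-arg auxiliary constructor, as required by code that instantiates the
  // class reflectively (e.g. SpecificParquetRecordReaderBase). In that path
  // convertTz is obtained elsewhere and the value here is ignored, so
  // None/false are safe defaults.
  def this() = this(convertTz = None, parquetMrCompatibility = false)
}

object Demo extends App {
  // Reflective instantiation works only because the zero-arg constructor exists.
  val support = classOf[ParquetReadSupport].getConstructor().newInstance()
  assert(support.convertTz.isEmpty && !support.parquetMrCompatibility)
}
```

Without the auxiliary constructor, `getConstructor()` would throw `NoSuchMethodException`, which is why the defaults are baked into the class rather than supplied by the reflective caller.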
---