GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22343#discussion_r216216422
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala
---
@@ -69,12 +69,25 @@ class ParquetOptions(
.get(MERGE_SCHEMA)
.map(_.toBoolean)
.getOrElse(sqlConf.isParquetSchemaMergingEnabled)
+
+ /**
+ * How to resolve duplicated field names. By default, parquet data source fails when hitting
+ * duplicated field names in case-insensitive mode. When converting hive parquet table to parquet
+ * data source, we need to ask parquet data source to pick the first matched field - the same
+ * behavior as hive parquet table - to keep behaviors consistent.
+ */
+ val duplicatedFieldsResolutionMode: String = {
+ parameters.getOrElse(DUPLICATED_FIELDS_RESOLUTION_MODE,
--- End diff --
The conversion itself happens per query, but my impression is that the
value doesn't usually differ from one query to the next. I mean, I was
wondering whether users would actually want to set this query by query.
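
To make the trade-off concrete, here is a minimal sketch in plain Scala (no
Spark dependency) of a per-read option that falls back to a session-level
default, the same shape as the MERGE_SCHEMA lookup in the diff above. The
object name, the option key string, the SessionConf class, and the mode values
FAIL and FIRST_MATCH are all hypothetical and are not taken from the PR.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
object ResolutionModeSketch {
  // Assumed option key; the real key string is not shown in the diff.
  val OptionKey = "duplicatedFieldsResolutionMode"

  // Assumed session-level configuration holding a default mode.
  final case class SessionConf(defaultMode: String)

  // A per-read option wins when present; otherwise use the session default.
  def resolveMode(readOptions: Map[String, String], conf: SessionConf): String =
    readOptions.getOrElse(OptionKey, conf.defaultMode)
}
```

If the mode rarely differs between queries, only the session-level fallback
would be exercised in practice, which is what the question above is getting at.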
---