Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22197#discussion_r212848096
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -350,25 +356,38 @@ private[parquet] class ParquetFilters(
   }

   /**
-   * Returns a map from name of the column to the data type, if predicate push down applies.
+   * Returns a map, which contains parquet field name and data type, if predicate push down applies.
    */
-  private def getFieldMap(dataType: MessageType): Map[String, ParquetSchemaType] = dataType match {
-    case m: MessageType =>
-      // Here we don't flatten the fields in the nested schema but just look up through
-      // root fields. Currently, accessing to nested fields does not push down filters
-      // and it does not support to create filters for them.
-      m.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
-        f.getName -> ParquetSchemaType(
-          f.getOriginalType, f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata)
-      }.toMap
-    case _ => Map.empty[String, ParquetSchemaType]
+  private def getFieldMap(dataType: MessageType): Map[String, ParquetField] = {
+    // Here we don't flatten the fields in the nested schema but just look up through
+    // root fields. Currently, accessing to nested fields does not push down filters
+    // and it does not support to create filters for them.
+    val primitiveFields =
+      dataType.getFields.asScala.filter(_.isPrimitive).map(_.asPrimitiveType()).map { f =>
+        f.getName -> ParquetField(f.getName,
+          ParquetSchemaType(f.getOriginalType,
+            f.getPrimitiveTypeName, f.getTypeLength, f.getDecimalMetadata))
+      }
+    if (caseSensitive) {
+      primitiveFields.toMap
+    } else {
+      // Don't consider ambiguity here, i.e. more than one field is matched in case insensitive
+      // mode, just skip pushdown for these fields, they will trigger Exception when reading,
+      // See: SPARK-25132.
--- End diff --
can we do the dedup before parquet filter pushdown and parquet column
pruning? Then we can simplify the code in both cases.
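To illustrate the dedup idea being suggested (not the PR's actual code): a minimal sketch that groups root fields by lower-cased name and drops any ambiguous groups up front, so both pushdown and pruning could consume an already-deduplicated map. `Field` and `dedupFieldsCaseInsensitive` are hypothetical names standing in for Parquet's field type and the proposed helper.

```scala
import java.util.Locale

// Hypothetical stand-in for a Parquet primitive field; name and tpe are illustrative.
case class Field(name: String, tpe: String)

// Dedup root fields by lower-cased name before filter pushdown and column pruning:
// a lower-cased name matched by more than one field is ambiguous in
// case-insensitive mode, so it is dropped and pushdown is simply skipped for it.
def dedupFieldsCaseInsensitive(fields: Seq[Field]): Map[String, Field] =
  fields
    .groupBy(_.name.toLowerCase(Locale.ROOT))
    .collect { case (lowerName, Seq(single)) => lowerName -> single }

// "Name" and "NAME" collide case-insensitively, so both are dropped; "id" survives.
val deduped = dedupFieldsCaseInsensitive(
  Seq(Field("id", "int32"), Field("Name", "binary"), Field("NAME", "binary")))
// deduped == Map("id" -> Field("id", "int32"))
```

Doing this once, before both code paths, avoids repeating the ambiguity check inside each of them, which is the simplification the comment is asking about.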
---