huaxingao commented on a change in pull request #34346:
URL: https://github.com/apache/spark/pull/34346#discussion_r734080846



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala
##########
@@ -114,7 +114,11 @@ case class ParquetScanBuilder(
         // not push down complex type
         // not push down Timestamp because INT96 sort order is undefined,
         // Parquet doesn't return statistics for INT96
-        case StructType(_) | ArrayType(_, _) | MapType(_, _, _) | TimestampType =>
+        // not push down Parquet Binary because min/max could be truncated
+        // (https://issues.apache.org/jira/browse/PARQUET-1685), Parquet Binary
+        // could be Spark StringType, BinaryType or DecimalType
+        case StructType(_) | ArrayType(_, _) | MapType(_, _, _) | TimestampType
+            | StringType | BinaryType | DecimalType() =>
           false

Review comment:
       Sorry, I misunderstood the comment. I will fix this shortly.
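
For context on the code comment in the hunk above, here is a minimal standalone sketch of why truncated min/max statistics cannot serve as the answer to a MIN/MAX aggregate. It does not use Parquet or Spark APIs; `TruncatedStatsSketch`, `truncateMin`, and `truncateMax` are hypothetical names, and the truncation scheme (keep a prefix for the minimum, keep a prefix and increment its last character for the maximum) is an assumption about how a writer might shorten the values.

```scala
// Hypothetical illustration only; not Parquet's or Spark's actual code.
object TruncatedStatsSketch {
  // Assumed lower-bound truncation: a prefix sorts at or before the full value.
  def truncateMin(s: String, len: Int): String = {
    require(len > 0, "len must be positive")
    s.take(len)
  }

  // Assumed upper-bound truncation: take a prefix and increment its last
  // character so the result sorts at or after the full value.
  // (Simplified: ignores overflow when the last character is Char.MaxValue.)
  def truncateMax(s: String, len: Int): String = {
    require(len > 0, "len must be positive")
    if (s.length <= len) s
    else {
      val prefix = s.take(len)
      prefix.init + (prefix.last + 1).toChar
    }
  }

  def main(args: Array[String]): Unit = {
    val values = Seq("apple", "banana", "cherry")

    val statMin = truncateMin(values.min, 3)
    val statMax = truncateMax(values.max, 3)

    // The truncated statistics still bound every value, so they remain
    // usable for skipping row groups when filtering.
    assert(values.forall(v => statMin <= v && v <= statMax))

    // They are not values from the column, so returning them as the
    // result of MIN/MAX would be wrong.
    assert(!values.contains(statMin))
    assert(!values.contains(statMax))

    println(s"true min = ${values.min}, stored min = $statMin")
    println(s"true max = ${values.max}, stored max = $statMax")
  }
}
```

Under these assumptions the stored statistics are valid bounds but not exact column values, which is the distinction between using them for filter pushdown and using them for aggregate pushdown.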




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


