Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16578#discussion_r148722822

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
    @@ -127,8 +127,8 @@ private[parquet] class ParquetRowConverter(
         extends ParquetGroupConverter(updater) with Logging {

       assert(
    -    parquetType.getFieldCount == catalystType.length,
    -    s"""Field counts of the Parquet schema and the Catalyst schema don't match:
    +    parquetType.getFieldCount <= catalystType.length,
--- End diff --

In `ParquetReadSupport.scala`, when `parquetMrCompatibility` is `true`, we intersect the clipped Parquet schema with the underlying Parquet file's schema. This can result in a requested Parquet schema with fewer fields than the requested Catalyst schema. For example, in the case of a partitioned table where we select a column that doesn't exist in the schema of one partition's files, we remove the missing columns from the requested Parquet schema. This scenario is illustrated and tested by the "partial schema intersection - select missing subfield" test in `ParquetSchemaPruningSuite.scala`.
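The intersection behavior described above can be sketched with a simplified model. This is a hypothetical illustration treating schemas as ordered lists of field names, not the actual `ParquetReadSupport` implementation; the names `catalystFields`, `fileFields`, and `intersect` are invented for the example:

```scala
// Hypothetical sketch: model schemas as ordered sequences of field names.
// When a requested Parquet schema is intersected with a file's actual
// schema, it can end up with fewer fields than the Catalyst schema.
object SchemaIntersectionSketch {
  // Catalyst (requested) schema for the query
  val catalystFields: Seq[String] = Seq("id", "name", "address")

  // Schema of one partition's Parquet file, which lacks "address"
  val fileFields: Seq[String] = Seq("id", "name")

  // Keep only the requested fields that exist in the file,
  // preserving the requested order
  def intersect(requested: Seq[String], file: Seq[String]): Seq[String] =
    requested.filter(file.contains)

  def main(args: Array[String]): Unit = {
    val requestedParquet = intersect(catalystFields, fileFields)
    // The Parquet field count can now be strictly less than the
    // Catalyst field count, hence the relaxed `<=` assertion
    assert(requestedParquet.length <= catalystFields.length)
    println(requestedParquet.mkString(","))
  }
}
```

Under this model, selecting the missing `address` column against that partition yields a requested Parquet schema of only `id,name`, which is why the assertion in `ParquetRowConverter` must be relaxed from `==` to `<=`.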