AngersZhuuuu edited a comment on pull request #34308:
URL: https://github.com/apache/spark/pull/34308#issuecomment-965262139
After digging into this case: the exception is caused by reading Parquet files with different schemas. If we don't set `mergeSchema`, Spark directly uses the first file's schema to read all the data. So when one column is long type in the first file and int type in the second, reading the second file tries to decode the data as long while that file's column descriptor is int type, which surfaces as the unsupported-decoding error reported here. However, such a mismatch should already be rejected by @sunchao's PR https://github.com/apache/spark/pull/32777: the case is caught in `ParquetVectorUpdaterFactory.getUpdater()`, which throws an exception that includes the offending file path (a repro sketch is at the end of this comment):

```
[info] Cause: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///Users/yi.zhu/Documents/project/Angerszhuuuu/spark/target/tmp/spark-3eccc50d-9d9c-4970-9674-87de46ea1192/test-002.parquet/part-00000-4332031b-e514-4b95-b52a-e8d798c999e6-c000.parquet. Column: [a], Expected: bigint, Found: INT32
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:586)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:172)
[info]   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
```

Thanks all for your help, @sunchao @cloud-fan @sadikovi. Hope you can confirm, and then I will close this one.
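For anyone who wants to reproduce this locally, here is a minimal, self-contained sketch of the scenario above (this is only my illustration, not code from the PR: the object name and scratch path are made up, and which file's footer wins schema inference can vary):

```scala
import org.apache.spark.sql.SparkSession

object ParquetSchemaMismatchRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("parquet-schema-mismatch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical scratch directory; any writable path works.
    val path = "/tmp/parquet-schema-mismatch"

    // First file: column `a` written as bigint (Parquet INT64).
    Seq(1L, 2L).toDF("a").write.mode("overwrite").parquet(path)

    // Second file in the same directory: column `a` written as int (Parquet INT32).
    Seq(3, 4).toDF("a").write.mode("append").parquet(path)

    // With spark.sql.parquet.mergeSchema left at its default (false), schema
    // inference picks a single footer's schema for the whole directory, so the
    // vectorized reader can end up decoding the INT32 file against a bigint
    // column descriptor and fail with
    // "Parquet column cannot be converted ... Expected: bigint, Found: INT32".
    spark.read.parquet(path).show()

    spark.stop()
  }
}
```

With the check from https://github.com/apache/spark/pull/32777 in place, the mismatch fails fast in `ParquetVectorUpdaterFactory.getUpdater()` and is reported as the `QueryExecutionException` with the offending file path, as in the stack trace above.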
