cloud-fan commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-769215659
Another, simpler idea is to fix the schema inference: https://github.com/apache/spark/blob/v3.1.1-rc1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L112 For INT64, we should make sure the inferred `DecimalType` is a long decimal. Then we will allocate long column vectors, which gets rid of this issue. It's also probably more efficient: there is no down-casting, and the wasted space is not a big deal.
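A minimal sketch of the inference idea, not the actual patch. It assumes "long decimal" means a precision above `Decimal.MAX_INT_DIGITS` (9) and at most `Decimal.MAX_LONG_DIGITS` (18). `SketchDecimalType` and `inferForInt64` are hypothetical stand-ins used only for illustration, not Spark APIs.

```scala
object DecimalInferenceSketch {
  // Mirrors Spark's Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS.
  val MaxIntDigits: Int = 9
  val MaxLongDigits: Int = 18

  // Hypothetical stand-in for org.apache.spark.sql.types.DecimalType.
  final case class SketchDecimalType(precision: Int, scale: Int)

  // For an INT64-backed Parquet decimal, widen the inferred precision so that
  // it never falls in the int-decimal range (precision <= 9). The scale is
  // left unchanged, so the represented values are unchanged.
  def inferForInt64(precision: Int, scale: Int): SketchDecimalType = {
    require(precision >= 1 && precision <= MaxLongDigits,
      s"INT64 decimals must have precision in [1, $MaxLongDigits], got $precision")
    require(scale >= 0 && scale <= precision,
      s"scale must be in [0, precision], got $scale")
    SketchDecimalType(math.max(precision, MaxIntDigits + 1), scale)
  }
}
```

Under these assumptions, an INT64 column declared as decimal(5, 2) would be inferred as decimal(10, 2), while decimal(18, 4) would be left as is.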
