tgravescs commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-771714621
@razajafri I think @cloud-fan pointed to the code you would need to change, where the Parquet format infers the physical type from the decimal precision. I think he is saying to just have the makeDecimalType(Decimal.MAX_INT_DIGITS) case also infer INT64. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L136 moves to: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L147

@cloud-fan it seems wasteful to store these as long when an int would do. I guess your thought is that it's more expensive to downcast than what we save in space? That might also depend on how big the column is, though. It also doesn't match the other code @revans2 pointed to above, which stores decimals that are small enough as Int. It seems like the change as-is keeps it consistent with that code, whereas switching to Long would be inconsistent.
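For anyone following along, here is a minimal sketch of the precision-to-physical-type mapping being discussed. This is a simplified illustration, not the actual ParquetSchemaConverter code; it assumes the standard Spark thresholds Decimal.MAX_INT_DIGITS = 9 and Decimal.MAX_LONG_DIGITS = 18. The suggestion, as I read it, would collapse the first branch into the second so everything up to 18 digits is written as INT64.

```scala
// Simplified sketch of how a Parquet physical type is chosen for a decimal
// column based on its precision. Hypothetical helper, not Spark's API.
object DecimalPhysicalType {
  def forPrecision(precision: Int): String = {
    if (precision <= 9) {        // Decimal.MAX_INT_DIGITS
      "INT32"                    // small decimals fit in a 4-byte int
    } else if (precision <= 18) { // Decimal.MAX_LONG_DIGITS
      "INT64"                    // medium decimals fit in an 8-byte long
    } else {
      "FIXED_LEN_BYTE_ARRAY"     // larger decimals need a fixed-width binary
    }
  }
}
```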
