tgravescs commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-771714621
@razajafri I think @cloud-fan pointed to the code you would need to change, where the Parquet format infers the physical type from the decimal precision. I think he is saying to just have the makeDecimalType(Decimal.MAX_INT_DIGITS) case also infer INT64. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L136 moves to: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L147

@cloud-fan it seems wasteful to store these as long when an int would do. I guess your thought is that it's more expensive to downcast than what we save in space? That might also depend on how big the column is, though. It also doesn't match the other code @revans2 pointed to above, which stores decimals that are small enough as Int. It seems like the change as-is keeps it consistent with that code, whereas switching to Long would be inconsistent.
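For anyone following along, here is a minimal sketch of the precision-to-physical-type mapping being discussed. This is a simplified illustration, not the actual ParquetSchemaConverter code; it assumes the standard Spark thresholds Decimal.MAX_INT_DIGITS = 9 and Decimal.MAX_LONG_DIGITS = 18. The suggestion, as I read it, would collapse the first branch into the second so everything up to 18 digits is written as INT64.

```scala
// Simplified sketch of how a Parquet physical type is chosen for a decimal
// column based on its precision. Hypothetical helper, not Spark's API.
object DecimalPhysicalType {
  def forPrecision(precision: Int): String = {
    if (precision <= 9) {        // Decimal.MAX_INT_DIGITS
      "INT32"                    // small decimals fit in a 4-byte int
    } else if (precision <= 18) { // Decimal.MAX_LONG_DIGITS
      "INT64"                    // medium decimals fit in an 8-byte long
    } else {
      "FIXED_LEN_BYTE_ARRAY"     // larger decimals need a fixed-width binary
    }
  }
}
```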
