tgravescs commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-771714621


   @razajafri I think @cloud-fan pointed to the code you would need to change, 
where the Parquet schema conversion infers the Spark type based on the decimal 
precision.  I think he is saying to just have the makeDecimalType(Decimal.MAX_INT_DIGITS) 
case infer to INT64 as well. 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L136
 
   which on current master has moved to:
   
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L147
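
   For context, here is a minimal, self-contained sketch (not the actual Spark code) of how a Parquet schema converter might pick the physical type from a decimal's precision. The thresholds are meant to mirror Spark's Decimal.MAX_INT_DIGITS (9) and Decimal.MAX_LONG_DIGITS (18); the object and method names here are made up purely for illustration:

   ```scala
   object DecimalPhysicalTypeSketch {
     sealed trait PhysicalType
     case object Int32 extends PhysicalType   // decimals with <= 9 digits fit in 4 bytes
     case object Int64 extends PhysicalType   // decimals with <= 18 digits fit in 8 bytes
     case class FixedLenByteArray(numBytes: Int) extends PhysicalType  // everything larger

     // Assumption: these match Spark's Decimal.MAX_INT_DIGITS / MAX_LONG_DIGITS.
     val MaxIntDigits  = 9
     val MaxLongDigits = 18

     /** Pick the smallest physical type that can hold `precision` base-10 digits. */
     def physicalTypeFor(precision: Int): PhysicalType =
       if (precision <= MaxIntDigits) Int32
       else if (precision <= MaxLongDigits) Int64
       else {
         // Minimum number of bytes needed for `precision` unscaled decimal digits.
         val numBytes = math.ceil((math.log(math.pow(10, precision)) / math.log(2) + 1) / 8).toInt
         FixedLenByteArray(numBytes)
       }

     def main(args: Array[String]): Unit =
       Seq(5, 9, 10, 18, 19, 38).foreach(p => println(s"precision $p -> ${physicalTypeFor(p)}"))
   }
   ```

   The space question in the next paragraph is essentially whether a decimal with at most 9 digits should take the Int32 branch (4 bytes per value) or the Int64 branch (8 bytes per value).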
   
   
   @cloud-fan it seems like a waste to store these as long when the extra bytes 
are just wasted space. I guess your thought is that it's more expensive to 
downcast than the space savings are worth?  That probably also depends on how 
big the column is, though.
   It also doesn't match the other code @revans2 pointed to above, which stores 
decimals that are small enough as Int. It seems like the change as-is is 
consistent with that code, whereas changing to use Long would be inconsistent.
   

