[GitHub] [spark] razajafri commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up

GitBox Wed, 27 Jan 2021 23:02:09 -0800


razajafri commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-768845356



   > @razajafri, do you mind clarifying the PR description? For example, I thought you meant writing out to files or somewhere by:
   > 
   > > Spark should read it as a long but write it as an int by downcasting it and calling the appropriate method to set the integer value
   > 
   > The change itself looks reasonable, but my question is how parquet-mr handles this case. Vectorized readers are supposed to be contributed back to the Parquet side, so it would be great to know how they handle it so that we can match it. I ask mainly because it looks a bit odd to me that we should manipulate the column descriptor.
   
   Sorry about the confusion, I have updated the documentation. 
   
   As for your other question, the bug is in how we write the value to `WritableColumnVector`, not in how we read it from `VectorizedValueReader`. This is why parquet-mr doesn't have to worry about this case.
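
   To illustrate the write-side issue, here is a minimal, self-contained sketch. It is not the actual Spark code: `ToyVector`, `writeUnscaled`, and `DecimalDowncastSketch` are hypothetical names used only for illustration. It assumes that a decimal with a precision of at most 9 digits is backed by 32-bit storage in the column vector, so a value read from the file as a 64-bit long has to be downcast and written through the int setter rather than the long setter.

```java
// Toy model only: ToyVector is a hypothetical stand-in, not Spark's WritableColumnVector.
final class DecimalDowncastSketch {
    // Assumption: decimals with at most 9 digits of precision are stored as 32-bit ints.
    static final int MAX_INT_DIGITS = 9;

    /** Hypothetical column vector that allocates storage based on decimal precision. */
    static final class ToyVector {
        final int precision;
        final int[] ints;   // allocated only when precision <= MAX_INT_DIGITS
        final long[] longs; // allocated only when precision > MAX_INT_DIGITS

        ToyVector(int precision, int capacity) {
            this.precision = precision;
            boolean small = precision <= MAX_INT_DIGITS;
            this.ints = small ? new int[capacity] : null;
            this.longs = small ? null : new long[capacity];
        }

        void putInt(int rowId, int value) {
            if (ints == null) throw new IllegalStateException("vector has no int storage");
            ints[rowId] = value;
        }

        void putLong(int rowId, long value) {
            if (longs == null) throw new IllegalStateException("vector has no long storage");
            longs[rowId] = value;
        }
    }

    /** Writes an unscaled decimal that was read from the file as a 64-bit value. */
    static void writeUnscaled(ToyVector vector, int rowId, long unscaled) {
        if (vector.precision <= MAX_INT_DIGITS) {
            // Downcast; toIntExact throws ArithmeticException instead of silently overflowing.
            vector.putInt(rowId, Math.toIntExact(unscaled));
        } else {
            vector.putLong(rowId, unscaled);
        }
    }

    public static void main(String[] args) {
        ToyVector vector = new ToyVector(5, 1); // e.g. a decimal(5, 2) column
        writeUnscaled(vector, 0, 12345L);       // 123.45 stored as unscaled 12345
        System.out.println(vector.ints[0]);
    }
}
```

   In this toy model, calling `putLong` on the precision-5 vector fails because no long storage was allocated, which mirrors the kind of failure described in the PR title. The read side is unchanged; only the choice of setter depends on the target precision.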


