[GitHub] [spark] revans2 commented on pull request #31284: [SPARK-34167][SQL]Reading parquet with IntDecimal written as a LongDecimal blows up


revans2 commented on pull request #31284:
URL: https://github.com/apache/spark/pull/31284#issuecomment-769165342



   > Why did Spark allocate an int column vector in the first place? If the Parquet field is INT64, we should use a long column vector.
   
   There is an assumption in WritableColumnVector that if a decimal's precision fits in an int32 (precision <= 9), the value is stored as an int.
   
   
https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L361-L386
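As a rough illustration of that assumption, here is a simplified, hypothetical model of how a decimal's precision picks its physical storage. It is not Spark's code. The class and method names are made up, and only the thresholds (9 digits for int, 18 for long) mirror Spark's `Decimal.MAX_INT_DIGITS` and `Decimal.MAX_LONG_DIGITS`.

```java
// Hypothetical model: storage is chosen from the decimal precision alone,
// without looking at the Parquet physical type the data was written with.
public class DecimalStorage {
    static final int MAX_INT_DIGITS = 9;   // mirrors Decimal.MAX_INT_DIGITS
    static final int MAX_LONG_DIGITS = 18; // mirrors Decimal.MAX_LONG_DIGITS

    enum Storage { INT32, INT64, BINARY }

    static Storage storageFor(int precision) {
        if (precision <= MAX_INT_DIGITS) return Storage.INT32;
        if (precision <= MAX_LONG_DIGITS) return Storage.INT64;
        return Storage.BINARY;
    }

    public static void main(String[] args) {
        System.out.println(storageFor(9));  // INT32, even if Parquet wrote INT64
        System.out.println(storageFor(10)); // INT64
        System.out.println(storageFor(38)); // BINARY
    }
}
```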
   
   The same assumption is also reflected in OnHeapColumnVector and OffHeapColumnVector:
   
   
https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java#L543-L556
   
   
https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java#L552-L557
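One consequence: when a Parquet column is physically INT64 but its declared decimal precision is <= 9, the vector is int-backed, so each unscaled long has to be narrowed to an int before it is stored. The sketch below shows that narrowing with a hypothetical `IntBackedVector` stand-in. It is not the fix in this PR. `Math.toIntExact` throws `ArithmeticException` if a value does not fit, which cannot happen for a valid precision <= 9 value (at most 999,999,999 in magnitude).

```java
import java.math.BigDecimal;

public class Int64DecimalIntoIntVector {
    // Hypothetical stand-in for an int-backed column vector (not Spark's class).
    static final class IntBackedVector {
        private final int[] data;
        IntBackedVector(int capacity) { data = new int[capacity]; }
        void putInt(int rowId, int value) { data[rowId] = value; }
        int getInt(int rowId) { return data[rowId]; }
    }

    // Copies unscaled INT64 decimal values into an int-backed vector,
    // failing loudly instead of silently truncating an out-of-range value.
    static void readInt64AsInt32(long[] unscaled, IntBackedVector out) {
        for (int i = 0; i < unscaled.length; i++) {
            out.putInt(i, Math.toIntExact(unscaled[i]));
        }
    }

    public static void main(String[] args) {
        // Unscaled values of a decimal(9, 2) column written as INT64.
        long[] fromParquet = {123456789L, -999999999L, 42L};
        IntBackedVector vec = new IntBackedVector(fromParquet.length);
        readInt64AsInt32(fromParquet, vec);
        // Rebuild the decimal from the unscaled int and the scale.
        System.out.println(BigDecimal.valueOf(vec.getInt(0), 2)); // 1234567.89
    }
}
```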
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
