revans2 commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-769165342
> Why did Spark allocate an int column vector in the first place? If the parquet field is INT64, we should use a long column vector.

There is an assumption in WritableColumnVector that if a decimal's precision fits in an int32 (precision <= 9), the values are stored as int32:

https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L361-L386

The same assumption is reflected in OnHeapColumnVector and OffHeapColumnVector:

https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java#L543-L556
https://github.com/apache/spark/blob/3a361cd837eeea5b5c82b0f90f5d1987a8a30328/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java#L552-L557
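To make the assumption concrete, here is a minimal, dependency-free Java sketch of the storage rule as it appears in the linked code. The class DecimalStorageSketch, its Storage enum, and narrowUnscaled are hypothetical names, not Spark APIs. The thresholds 9 and 18 mirror Spark's Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS. The key point is that the backing store is chosen from the Spark-side precision alone, not from the physical Parquet type.

```java
/**
 * Hypothetical sketch of the precision-based storage rule for decimal
 * column vectors. Not Spark code: the names are illustrative, and the
 * constants mirror Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS.
 */
final class DecimalStorageSketch {
    static final int MAX_INT_DIGITS = 9;
    static final int MAX_LONG_DIGITS = 18;

    enum Storage { INT32, INT64, BYTE_ARRAY }

    // Storage is picked from the requested (Spark) precision only; the
    // physical Parquet type of the file (INT32, INT64, ...) is not consulted.
    static Storage storageFor(int precision) {
        if (precision <= MAX_INT_DIGITS) {
            return Storage.INT32;
        }
        if (precision <= MAX_LONG_DIGITS) {
            return Storage.INT64;
        }
        return Storage.BYTE_ARRAY;
    }

    // If the file stores the unscaled value as INT64 but the vector is
    // int-backed, the reader must narrow each value before putInt.
    // Math.toIntExact throws ArithmeticException instead of truncating.
    static int narrowUnscaled(long unscaledFromInt64Page) {
        return Math.toIntExact(unscaledFromInt64Page);
    }
}
```

Under this rule a decimal(9,2) column gets int-backed storage even when the file wrote it as INT64. The INT64 read path therefore cannot assume a long vector. One option, sketched above, is to narrow each unscaled value and fail loudly if it does not fit.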
