HeartSaVioR opened a new pull request #27761: [SPARK-30993][SQL][2.4] Use its 
sql type for UDT when checking the type of length (fixed/var) or mutable
URL: https://github.com/apache/spark/pull/27761
 
 
   ### What changes were proposed in this pull request?
   
   This patch fixes the bug of UnsafeRow which misses to handle the UDT 
specifically, in `isFixedLength` and `isMutable`. These methods don't check its 
SQL type for UDT, always treating UDT as variable-length, and non-mutable.
   
   It doesn't bring any issue if UDT is used to represent complicated type, but 
when UDT is used to represent some type which is matched with fixed length of 
SQL type, it exposes the chance of correctness issues, as these informations 
sometimes decide how the value should be handled.
   
   We got report from user mailing list which suspected as mapGroupsWithState 
looks like handling UDT incorrectly, but after some investigation it was from 
GenerateUnsafeRowJoiner in shuffle phase.
   
   
https://github.com/apache/spark/blob/0e2ca11d80c3921387d7b077cb64c3a0c06b08d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala#L32-L43
   
   Here updating position should not happen on fixed-length column, but due to 
this bug, the value of UDT having fixed-length as sql type would be modified, 
which actually corrupts the value.
   
   ### Why are the changes needed?
   
   Misclassifying of the type of length for UDT can corrupt the value when the 
row is presented to the input of GenerateUnsafeRowJoiner, which brings 
correctness issue.
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to