[GitHub] [spark] JoshRosen commented on pull request #35066: [SPARK-37784][SQL] Correctly handle UDTs in CodeGenerator.addBufferedState()

GitBox Wed, 29 Dec 2021 19:23:49 -0800


JoshRosen commented on pull request #35066:
URL: https://github.com/apache/spark/pull/35066#issuecomment-1002856449



   Please let me know if you have suggestions for good ways to write a 
regression test for this bug.  So far I've been unable to adapt my existing 
reproduction into something which fails in CI. 
   
   Given enough time, I might be able to contrive a failing regression test by 
manually instantiating a SortMergeJoinExec operator and controlling its input 
iterators such that the non-copied values are mutated when the iterator 
advances (I'd use the SparkPlanTest helpers for this).
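
For readers unfamiliar with this class of bug, here is a minimal, language-agnostic sketch in Python of how a buffered but non-copied value can change when the iterator advances. It is not Spark code: `reusing_iterator` and `buffer_first` are hypothetical names, and the shared list merely stands in for a row object that an upstream operator reuses between calls.

```python
from typing import Iterator, List


def reusing_iterator(rows: List[List[int]]) -> Iterator[List[int]]:
    """Yield every row through ONE shared mutable buffer (models row reuse)."""
    buffer = [0] * len(rows[0])
    for row in rows:
        buffer[:] = row  # overwrite the shared buffer in place
        yield buffer


def buffer_first(it: Iterator[List[int]], copy: bool) -> List[int]:
    """Save the first row, advance the iterator once, return the saved row."""
    first = next(it)
    saved = list(first) if copy else first  # copy=False keeps only a reference
    next(it)  # advancing overwrites the shared buffer
    return saved


rows = [[1, 2], [3, 4]]
without_copy = buffer_first(reusing_iterator(rows), copy=False)
with_copy = buffer_first(reusing_iterator(rows), copy=True)

print(without_copy)  # the saved reference now shows the second row
print(with_copy)     # the defensive copy still holds the first row
```

A regression test along these lines would assert that the buffered value still equals the first row after the iterator has advanced.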
   
   OTOH this particular helper function changes very infrequently, so I think 
the risk of future regression is small enough that it would be okay to forgo 
writing the more complicated test. If anyone has strong opinions here, please 
let me know.
   
   ----
   
   I'm now curious about whether there could be other similar UDT-related bugs 
in our code generation. I plan to search through the code for all other places 
where we generate copy() / clone() logic to check whether they properly handle 
UDTs.
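
As a rough illustration of the kind of check such an audit might look for, the toy model below decides whether a value needs a defensive copy based on its type. It is only a sketch under stated assumptions, not Spark's implementation: `Primitive`, `Struct`, `UDT`, `needs_copy_buggy`, and `needs_copy` are all hypothetical names, and the only idea borrowed from the PR title is that a UDT should be judged by its underlying storage type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "long"; values of this type are immutable


@dataclass(frozen=True)
class Struct:
    name: str  # values of this type are mutable and may be reused


@dataclass(frozen=True)
class UDT:
    name: str
    sql_type: object  # hypothetical: the underlying storage type


def needs_copy_buggy(data_type: object) -> bool:
    # Only recognizes Struct directly; a UDT falls through as "no copy".
    return isinstance(data_type, Struct)


def needs_copy(data_type: object) -> bool:
    # Unwrap the UDT first, then decide based on its storage type.
    if isinstance(data_type, UDT):
        return needs_copy(data_type.sql_type)
    return isinstance(data_type, Struct)


point_udt = UDT("point", Struct("x_y"))
print(needs_copy_buggy(point_udt), needs_copy(point_udt))
```

In this model the buggy variant skips the copy for a struct-backed UDT, which is the pattern worth searching for wherever copy() / clone() logic is generated.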




