HeartSaVioR opened a new pull request, #38798:
URL: https://github.com/apache/spark/pull/38798

   ### What changes were proposed in this pull request?
   
   This PR fixes an issue in applyInPandasWithState that is triggered when the 
columns of the grouping keys are not placed first, in order, among the input 
columns. When that condition is met, the user function may receive an 
"incorrect" value for the key, including `None`.
   
   This happens because the projection for the value is shared between the 
normal input row and the row for timed-out state. The projection assumes that 
the row has the same schema as the output of the child node, whereas the row 
for timed-out state is constructed by concatenating the key row and a null 
value row.
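
   The mismatch can be illustrated with a toy model in plain Python. This is 
only a sketch, not the Spark implementation: rows are tuples, a "projection" 
picks values by ordinal, and the column names `value` and `id` as well as the 
helper `make_projection` are hypothetical. The exact width of the null value 
row is also an assumption.

```python
# Toy model, not Spark internals: rows are plain tuples and a "projection"
# selects values by ordinal position.

def make_projection(ordinals):
    """Return a function that picks the given positions out of a row."""
    return lambda row: tuple(row[i] for i in ordinals)

# Hypothetical child output schema: the grouping key "id" is the SECOND column.
child_output = ["value", "id"]
grouping_keys = ["id"]

# Projection bound against the child output: the key is read from ordinal 1.
key_ordinals = [child_output.index(k) for k in grouping_keys]
project_key = make_projection(key_ordinals)

# A normal input row follows the child output layout, so the key is correct.
input_row = ("payload", "user-1")
key_from_input = project_key(input_row)

# A row for timed-out state is the key row followed by nulls for the values
# (assumed here to be one null per non-key column).
key_row = ("user-1",)
null_value_row = (None,) * (len(child_output) - len(grouping_keys))
timed_out_row = key_row + null_value_row

# Reusing the same projection reads ordinal 1, which now holds a null.
key_from_timed_out = project_key(timed_out_row)
```

In this sketch the shared projection returns the right key for the input row 
but `None` for the timed-out row, which matches the symptom described above.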
   
   This PR creates a separate projection for the row for timed-out state, so 
that each projection picks up the values of the grouping columns from the 
correct positions.
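
   A minimal sketch of the idea in plain Python, again as a toy model rather 
than the actual Spark code: each row layout gets its own projection, bound 
against the layout that row really has. The column names, the 
`make_projection` helper, and the assumed layout of the timed-out row (keys 
first, then null values) are all hypothetical.

```python
# Toy model, not Spark internals: one projection per row layout.

def make_projection(layout, wanted):
    """Bind column names to ordinals in `layout` and return a picker."""
    ordinals = [layout.index(name) for name in wanted]
    return lambda row: tuple(row[i] for i in ordinals)

# Hypothetical child output: the grouping key "id" is not the first column.
child_output = ["value", "id"]
grouping_keys = ["id"]

# Assumed layout of a timed-out row: key columns first, then value columns.
value_columns = [c for c in child_output if c not in grouping_keys]
timed_out_layout = grouping_keys + value_columns

# Separate projections, each bound against the matching layout.
project_key_from_input = make_projection(child_output, grouping_keys)
project_key_from_timed_out = make_projection(timed_out_layout, grouping_keys)

input_row = ("payload", "user-1")
timed_out_row = ("user-1",) + (None,) * len(value_columns)

key_from_input = project_key_from_input(input_row)
key_from_timed_out = project_key_from_timed_out(timed_out_row)
```

With a projection per layout, both rows yield the same key in this sketch, 
regardless of where the grouping column sits in the child output.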
   
   ### Why are the changes needed?
   
   Without this fix, the user function may receive an "incorrect" value for the 
key, including `None`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This feature is not released yet.
   
   ### How was this patch tested?
   
   New test case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

