bersprockets opened a new pull request #23642: 
[SPARK-26680][SPARK-25767][SQL][BACKPORT-2.3] Eagerly create inputVars while 
conditions are appropriate
URL: https://github.com/apache/spark/pull/23642
 
 
   ## What changes were proposed in this pull request?
   
   Backport of #22789 and #23617 to branch-2.3.
   
   When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up
lazily generating ```inputVars``` from a Stream, because the field ```output```
is then a Stream. Scala's ```Stream.map``` evaluates only the first element
eagerly; the remaining elements are computed on demand. At the time
```output.zipWithIndex.map``` is called, conditions are correct. However, by the
time the map operation actually executes for the remaining elements, conditions
are no longer appropriate, and the closure used by the map operation ends up
using a reference to the partially created ```inputVars```. As a result, a
StackOverflowError occurs.
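   A minimal, Spark-independent sketch of the underlying Scala behavior, assuming Scala 2.12 or later: ```Stream.map``` applies the function to the head immediately but defers the tail, so each deferred element sees whatever state exists when it is finally forced. The ```phase``` variable and ```StreamLazinessDemo``` object are purely illustrative and do not appear in Spark.

```scala
object StreamLazinessDemo {
  def run(): List[String] = {
    // Illustrative mutable state standing in for the "conditions" at codegen time.
    var phase = "before"
    // Stream.map evaluates the closure on the head now; the tail is deferred.
    val mapped = Stream(1, 2, 3).map(i => s"$i@$phase")
    // Conditions change before the tail has been forced.
    phase = "after"
    // Forcing the Stream evaluates the remaining elements under the new state.
    mapped.toList
  }

  def main(args: Array[String]): Unit =
    println(run()) // List(1@before, 2@after, 3@after)
}
```

Only the first element was computed while the original conditions held; a strict collection such as ```List``` would have produced ```before``` for every element.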
   
   This PR ensures that ```inputVars``` is eagerly created while conditions are
appropriate. The same issue also affected the code path that creates
```inputVars``` from ```outputVars``` (SPARK-25767), so I extended the solution
for that code path to cover both code paths.
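   The sketch below is a hypothetical, heavily simplified model of the failure and the eager fix; it is not Spark's code. ```Ctx```, ```genCode```, ```consume```, ```inputRow``` and ```currentVars``` are illustrative stand-ins (loosely modeled on a codegen context and a bound-reference lookup), and forcing with ```.toIndexedSeq``` is just one way to materialize the sequence, not necessarily what the patch uses. In the lazy case, the deferred tail runs after ```inputRow``` is cleared, falls back to ```currentVars```, which is the half-built Stream itself, and recurses until the stack overflows.

```scala
object EagerInputVarsSketch {
  // Hypothetical stand-in for a codegen context; not Spark's CodegenContext.
  final class Ctx {
    var inputRow: String = null         // set while generating code from a row
    var currentVars: Seq[String] = null // set once input variables exist
  }

  // Hypothetical lookup: read from the row when one is set, otherwise reuse
  // the already-generated variable at `ordinal`.
  def genCode(ctx: Ctx, ordinal: Int): String =
    if (ctx.inputRow != null) s"${ctx.inputRow}.get($ordinal)"
    else ctx.currentVars(ordinal)

  // Hypothetical, simplified consume(): build inputVars, then change the context.
  def consume(ctx: Ctx, output: Seq[String], eager: Boolean): Seq[String] = {
    ctx.inputRow = "row"
    val mapped = output.zipWithIndex.map { case (_, i) => genCode(ctx, i) }
    // toIndexedSeq forces every genCode call while inputRow is still set.
    val inputVars = if (eager) mapped.toIndexedSeq else mapped
    ctx.inputRow = null
    // With a lazy Stream, the unevaluated tail will now read from itself.
    ctx.currentVars = inputVars
    inputVars
  }

  def main(args: Array[String]): Unit = {
    val cols = Stream("a", "b", "c")
    println(consume(new Ctx, cols, eager = true).toList)
    val lazyResult =
      try consume(new Ctx, cols, eager = false).toList.toString
      catch { case _: StackOverflowError => "StackOverflowError" }
    println(lazyResult)
  }
}
```

Running ```main``` should print ```List(row.get(0), row.get(1), row.get(2))``` followed by ```StackOverflowError```. Passing a ```List``` instead of a ```Stream``` avoids the problem even without forcing, because ```List.map``` is strict.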
   
   ## How was this patch tested?
   
   - SQL unit tests
   - new test
   - python tests
   

