peter-toth commented on PR #40744:
URL: https://github.com/apache/spark/pull/40744#issuecomment-1568607994

   > Thanks @peter-toth. I tested this patch locally, but it seems to throw a `StackOverflowError`. How to reproduce:
   > 
   > ```
   > ./dev/make-distribution.sh --tgz  -Phive -Phive-thriftserver
   > tar -zxf spark-3.5.0-SNAPSHOT-bin-3.3.5.tgz
   > cd spark-3.5.0-SNAPSHOT-bin-3.3.5
   > bin/spark-sql
   > ```
   > 
   > ```
   > spark-sql (default)> WITH RECURSIVE t(n) AS (
   >                    >     VALUES (1)
   >                    > UNION ALL
   >                    >     SELECT n+1 FROM t WHERE n < 100
   >                    > )
   >                    > SELECT sum(n) FROM t;
   > 23/05/30 13:21:21 ERROR Executor: Exception in task 0.0 in stage 265.0 (TID 199)
   > java.lang.StackOverflowError
   >    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   > ```
   
   Thanks for testing this PR @wangyum. Interestingly, I didn't encounter a stack 
overflow when the recursion level is <100; the error starts to appear at level 
~170 in my tests. I guess this depends on your default stack size. Since 
recursion works in such a way that each iteration depends on the previous one, 
the RDD lineage of the tasks grows with every iteration, and deserializing 
those tasks can throw a stack overflow error at some point. Let me amend this 
PR by adding optional checkpointing to truncate the RDD lineage so that it can 
deal with deeper recursion...
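
   To illustrate the idea, here is a minimal toy model in plain Python. It is 
not Spark code and not the change in this PR: `Node`, `lineage_depth`, `iterate` 
and the `checkpoint_every` parameter are hypothetical names used only for this 
sketch. Each iteration wraps the previous result, so the parent chain (standing 
in for RDD lineage) grows by one per level; a "checkpoint" keeps the computed 
value but drops the link to the parent, which bounds the depth that anything 
walking the chain has to handle. In Spark itself, `RDD.checkpoint()` and 
`RDD.localCheckpoint()` are the existing calls that truncate lineage.

```python
class Node:
    """Toy stand-in for an RDD: a computed value plus a link to its parent."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent


def lineage_depth(node):
    """Count how many nodes are reachable by following parent links."""
    depth = 0
    while node is not None:
        depth += 1
        node = node.parent
    return depth


def iterate(levels, checkpoint_every=None):
    """Build `levels` recursion levels, each depending on the previous one.

    If `checkpoint_every` is set (a hypothetical knob), every that many
    iterations the current node keeps its value but forgets its parent,
    which models materializing the result and truncating the lineage.
    """
    node = Node(1)
    for i in range(1, levels):
        node = Node(node.value + 1, parent=node)
        if checkpoint_every and i % checkpoint_every == 0:
            node.parent = None  # "checkpoint": drop the lineage so far
    return node


# Without checkpointing the chain is as deep as the recursion level.
unbounded = iterate(200)
# With checkpointing the chain never exceeds the checkpoint interval.
bounded = iterate(200, checkpoint_every=50)

print(lineage_depth(unbounded), lineage_depth(bounded), bounded.value)
```

   The computed value is the same in both cases; only the length of the 
dependency chain differs, which is what matters for recursive 
(de)serialization.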


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

