This is super important.

I always thought that this was an implementation problem in the sense that the 
overflow check was too costly/slow.

Anyway, in most Common Lisp implementations the stack is also limited. When you 
hit the limit, a resumable exception is raised, allowing you to make the stack 
larger and continue. And of course you can set a large initial stack size per 
process.

> On 9 Nov 2023, at 10:13, Guillermo Polito wrote:
> 
> Hi all,
> 
> We started (with many interruptions over the last months) working a bit with 
> Stephane on understanding the (positive and negative) impact of 
> stack-overflow support in Pharo.
> The key idea is that if a process consumes too much stack (potentially 
> because of an infinite recursion) then the process should stop with an 
> exception.
> 
> ## Why we want better stack consumption control
> 
> This idea came up to solve issues that are pretty common and that especially 
> hit newbies.
> For example, imagine you accidentally write an accessor such as
> 
> ```
> A >> foo
>    ^ self foo
> ```
> 
> Students do this all the time, and I’ve also seen it in experienced people 
> who go too fast :).
> More importantly, such recursions can also happen through not-so-obvious 
> indirect recursion (a sends b, b sends c, c sends a), and these can hit 
> anybody.
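> 
> A minimal sketch of such an indirect cycle, using the hypothetical selectors 
> a, b and c on the same class A as above (illustrative names only):
> 
> ```
> A >> a
>    ^ self b
> 
> A >> b
>    ^ self c
> 
> A >> c
>    "Closes the cycle: a -> b -> c -> a, so the recursion never ends"
>    ^ self a
> ```
> 
> No method sends its own selector here, which is what makes this kind of 
> cycle easy to miss when reading the code.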
> 
> This is aggravated because the current execution model allows us to have 
> infinite stacks, meaning stacks limited only by available memory.
> This is indeed a nice feature for many use cases, but it has its own drawbacks 
> when one of these kinds of recursion is hit:
>  - code just loops forever taking space in the stack
>  - when there is no more stack space, context objects are created and moved 
> to the heap
>  - but those contexts are strongly held, so they are never GCed and take up 
> extra space
>  - even worse! they are there adding more work to the GC every time and 
> making the GC run more often looking for space that is not there
> 
> ## Why Ctrl-dot does not always work
> 
> Of course, super users know there is this “Ctrl dot” hidden feature that 
> should help you recover from this.
> First, let's take out of the equation that this is only known by super users.
> Now, in this situation, when Ctrl-dot is hit it will trigger a handler that 
> suspends the problematic process and opens a debugger on it.
> But it could happen that,
>  - the stack is so big that the debugger is very sluggish (best-case scenario)
>  - the VM is just flooded doing GCs, so the Ctrl-dot event may not even 
> reach Pharo or trigger the handler
>  - if the recursion is hit when printing an object (which is more common than 
> you could imagine), opening the debugger could trigger a new recursion and 
> never give control back to the user
> 
> ## What we are working on
> 
> The main idea here is: Can we have a simple and efficient way to prevent such 
> kinds of situations?
> 
> After many discussions around detecting recursion, we kinda arrived at the 
> simple solution of just detecting a stack overflow.
> The solution is easy to understand (because it’s how other languages work) 
> and easy to implement because there is already support for that.
> But this leaves open two questions:
>  - what happens when people want to use the “infinite stack” feature?
>  - when should a process stack overflow? What is a sensible default value?
> 
> Our draft implementation here 
> https://github.com/pharo-project/pharo-vm/pull/710 does the following to cope 
> with this:
>  - we can now parametrize the size of the stack (of each stack page to be 
> more accurate) when the VM starts up
>  - the stack overflow check can be disabled per process
> 
> We are also running experiments to see what could be a sensible stack size 
> for our normal usages. Here, for example, we ran almost all test cases in 
> Pharo separately (one suite per line below), and we observed how many tests 
> broke (x-axis) with different stack sizes (y-axis).
> Here we see that most test suites require at least 20-24k to run properly, 
> some go up to 36k of stack before converging (i.e., the number of broken 
> tests does not change).
> 
> <ImagenPegada-10.tiff>
> You’ll notice in the graph that there are some scenarios that break all the 
> time. This is because exception handling itself is recursive and may produce 
> more stack overflows depending on the size of the stack between the exception 
> and the exception handler.
> So some more work is still required, mostly changing Pharo libraries to 
> properly support this. For example:
>  - should tests run in a fresh process with a fresh stack?
>  - should the exception mechanism use less recursion?
>  - resumable exceptions add stack pressure because they do not “unstack” 
> until the exception is finally handled, meaning that the stack used by 
> exception handling just adds up to the stack of the original code. Can we do 
> better here?
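> 
> To illustrate the last point, here is a minimal sketch using the standard 
> Warning class (a resumable exception); the message string and the value 41 
> are arbitrary choices for the example:
> 
> ```
> | result |
> result := [ (Warning signal: 'need a value') + 1 ]
>    on: Warning
>    do: [ :ex |
>       "This handler block runs on top of the frames of the protected block,
>        which must stay on the stack because the exception may be resumed"
>       ex resume: 41 ].
> ```
> 
> Only when resume: hands control back to the signal: send is the stack used 
> by the handler released; until then it adds to the stack already consumed by 
> the protected code.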
> 
> Probably there are more interesting questions here; that’s the “why” behind 
> this email.
> I’m interested in opinions and scenarios you may come up with that should be 
> taken into account.
> 
> Cheers,
> Guille
