This is super important. I always thought that this was an implementation problem in the sense that the overflow check was too costly/slow.
Anyway, in most Common Lisp implementations the stack is also limited. When you hit the limit, a resumable exception is raised, allowing you to make the stack larger and continue. And of course you can set a large initial stack size per process.

> On 9 Nov 2023, at 10:13, Guillermo Polito wrote:
>
> Hi all,
>
> We started (with many interruptions over the last months) working a bit with
> Stephane on understanding the (positive and negative) impact of
> stack-overflow support in Pharo.
> The key idea is that if a process consumes too much stack (potentially
> because of an infinite recursion) then the process should stop with an
> exception.
>
> ## Why we want better stack consumption control
>
> This idea comes up to solve issues that are pretty common and hit especially
> newbies.
> For example, imagine you accidentally write an accessor such as
>
> ```
> A >> foo
>   ^ self foo
> ```
>
> Students do this all the time, and I’ve also seen it in experienced people
> who go too fast :).
> More importantly, such recursions could happen also with not-so-obvious
> indirect recursions (a sends b, b sends c, c sends a), and these could hit
> anybody.
>
> This is aggravated because the current execution model allows us to have
> infinite stacks, meaning limited by available memory only.
> This is indeed a nice feature for many use cases but it has its own drawbacks
> when one of these kinds of recursions is hit:
> - code just loops forever taking space in the stack
> - when there is no more stack space, context objects are created and moved
>   to the heap
> - but those contexts are strongly held, so they are never GCed and take up
>   extra space
> - even worse! they are there adding more work to the GC every time and
>   making the GC run more often looking for space that is not there
>
> ## Why Ctrl-dot does not always work
>
> Of course, super users know there is this “Ctrl dot” hidden feature that
> should help you recover from this.
> First, let's take out of the equation that this is only known by super users.
> Now, in this situation, when Ctrl-dot is hit it will trigger a handler that
> suspends the problematic process and opens a debugger on it.
> But it could happen that:
> - the stack is so big that the debugger is very sluggish (best-case scenario)
> - the VM is just flooded doing GCs so maybe the Ctrl dot event does not even
>   arrive at Pharo or the trigger
> - if the recursion is hit when printing an object (which is more common than
>   you could imagine), opening the debugger could trigger a new recursion and
>   never give back the control to the user
>
> ## What are we working on
>
> The main idea here is: Can we have a simple and efficient way to prevent such
> kinds of situations?
>
> After many discussions around detecting recursion, we kinda arrived at the
> simple solution of just detecting a stack overflow.
> The solution is easy to understand (because it’s how other languages work)
> and easy to implement because there is already support for that.
> But this leaves open two questions:
> - what happens when people want to use the “infinite stack” feature?
> - when should a process stack overflow? What is a sensible default value?
>
> Our draft implementation here
> https://github.com/pharo-project/pharo-vm/pull/710 does the following to cope
> with this:
> - we can now parametrize the size of the stack (of each stack page to be
>   more accurate) when the VM starts up
> - the stack overflow check can be disabled per process
>
> We also are running experiments to see what could be a sensible stack size
> for our normal usages. Here, for example, we ran almost all test cases in
> Pharo separately (one suite per line below), and we observed how many tests
> broke (x-axis) with different stack sizes (y-axis).
> Here we see that most test suites require at least 20-24k to run properly;
> some go up to 36k of stack before converging (i.e., the number of broken
> tests does not change).
>
> <ImagenPegada-10.tiff>
> You’ll notice in the graph that there are some scenarios that break all the
> time. This is because exception handling itself is recursive and may produce
> more stack overflows depending on the size of the stack between the exception
> and the exception handler.
> So some more work is still required, mostly changing Pharo libraries to
> properly support this. For example:
> - should tests run in a fresh process with a fresh stack?
> - should the exception mechanism use less recursion?
> - resumable exceptions add stack pressure because they do not “unstack”
>   until the exception is finally handled, meaning that the stack used by
>   exception handling just adds up to the stack of the original code. Can we
>   do better here?
>
> Probably there are more interesting questions here, that’s the “why” behind
> this email.
> I’m interested in opinions and scenarios you may come up with that should be
> taken into account.
>
> Cheers,
> Guille
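To make the "stop with an exception" idea concrete, here is a rough analogy in Python (not Pharo, and not the VM change in the PR). CPython already enforces a recursion limit and raises `RecursionError` when it is exceeded, so both the accidental accessor and the a/b/c indirect recursion from the quoted mail end in a catchable error instead of eating memory. The names `A`, `a`, `b`, `c` and `run_guarded` are made up for this sketch.

```python
class A:
    def foo(self):
        # Same slip as `A >> foo ^ self foo`: the accessor calls itself.
        return self.foo()


# Indirect recursion: a calls b, b calls c, c calls a again.
def a():
    return b()


def b():
    return c()


def c():
    return a()


def run_guarded(thunk):
    """Run thunk and report a stack overflow instead of letting it escape."""
    try:
        return ("ok", thunk())
    except RecursionError:
        # CPython raises RecursionError once its recursion limit is exceeded.
        return ("stack overflow", None)


if __name__ == "__main__":
    print(run_guarded(lambda: A().foo()))
    print(run_guarded(a))
    print(run_guarded(lambda: 1 + 1))
```

Unlike the Common Lisp behaviour I described, Python unwinds the stack before the handler runs, so this only shows the "bounded stack plus exception" half of the story, not the resumable part.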

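The stack-size experiment in the quoted mail (run a workload under different stack sizes and see where it stops breaking) can be sketched the same way. This is only an illustration under assumptions: it counts Python frames rather than bytes, it relies on CPython's `sys._getframe`, and `current_depth`, `passes_with_budget`, `recurse` and `smallest_passing_budget` are hypothetical helpers, not anything from the Pharo test setup.

```python
import sys


def current_depth():
    """Count the Python frames currently on the call stack (CPython only)."""
    depth, frame = 0, sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def passes_with_budget(workload, budget):
    """Run workload with roughly `budget` frames of headroom; True if it fits."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(current_depth() + budget)
    try:
        workload()
        return True
    except RecursionError:
        return False
    finally:
        # Always restore the previous limit, whether the workload fit or not.
        sys.setrecursionlimit(old_limit)


def recurse(n):
    """A legitimate recursion that needs about n frames of stack."""
    return 0 if n == 0 else 1 + recurse(n - 1)


def smallest_passing_budget(workload, budgets):
    """Return the smallest candidate budget under which workload succeeds."""
    for budget in sorted(budgets):
        if passes_with_budget(workload, budget):
            return budget
    return None


if __name__ == "__main__":
    candidates = [50, 100, 200, 400, 800]
    print(smallest_passing_budget(lambda: recurse(300), candidates))
```

Sweeping a list of candidate budgets per workload gives the same kind of "converges at size N" picture as the graph, just in frames instead of kilobytes.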