[Pharo-dev] Re: [Pharo-vm] Stack overflow support?

2023-11-09 Thread David Mason
Tail-call-elimination is certainly not a total solution to the stack
overflow problem. However it would catch the `foo ^ self foo` example. And
it would reduce he demand for stack.

I like what Sven mentioned about a resumable exception. You could
incrementally expand your stack in your application.

As for debugging, it's not clear how much of a problem the lack of a return
context actually is. We won't know until TCE is implemented and we get some
experience with it. For the record Zag Smalltalk is doing TCE.

../Dave

On Thu, 9 Nov 2023 at 09:18, Guille Polito 
wrote:

>
>
> El 9 nov. 2023, a las 13:35, David Mason  escribió:
>
> Tail call elimination would reduce stack usage significantly. See
> Tail Call Elimination in OpenSmalltalk, Ralston & Mason, IWST2019
>
> This really should be part of any stack solution. Particularly people
> coming from a functional language background are likely to use it. Also
> double dispatch would benefit significantly.
>
>
> Yes, I agree that tail-call elimination will largely improve performance.
> But it will also harm debugging unless there is a way to reconstruct the
> stack when there is a problem, which I suspect could be very expensive.
>
> Anyways, this solves a performance problem, not the recursion problem that
> motivates my mail, right?
>
>
> ../Dave
>
> On Thu, 9 Nov 2023 at 04:38, Guillermo Polito 
> wrote:
>
>> Hi all,
>>
>> We started (with many interruptions over the last months) working a bit
>> with Stephane on understanding what is the (positive and negative) impact
>> of stack-overflow support in Pharo.
>> The key idea is that if a process consumes too much stack (potentially
>> because of an infinite recursion) then the process should stop with an
>> exception.
>>
>> ## Why we want better stack consumption control
>>
>> This idea comes up to solve issues that are pretty common and hit
>> especially newbies.
>> For example, imagine you accidentally write an accessor such as
>>
>> ```
>> A >> foo
>>^ self foo
>> ```
>>
>> Students do this all the time, and I’ve also seen it in experienced
>> people who go too fast :).
>> More importantly, such recursions could happen also with not-so-obvious
>> indirect recursions (a sends b, b sends c, c sends a), and these could hit
>> anybody.
>>
>> This is aggravated because the current execution model allows us to have
>> infinite stacks —meaning: limited by available memory only.
>> This is indeed a nice feature for many use cases but it has its own
>> drawbacks when one of these kind of recursions are hit:
>>  - code just loops forever taking space in the stack
>>  - when there is no more stack space, context objects are created and
>> moved to the heap
>>  - but those contexts are strongly held, so they are never GCed and take
>> up extra space
>>  - even worse! they are there adding more work to the GC every time and
>> making the GC run more often looking for space that is not there
>>
>> ## Why Ctrl-dot does not always work
>>
>> Of course, super users know there is this “Ctrl dot” hidden feature that
>> should help you recover from this.
>> First, let's take out of the equation that this is only known by super
>> users.
>> Now, in this situation, when Ctrl-dot is hit it will trigger a handler
>> that suspends the problematic process and opens a debugger on it.
>> But it could happen that,
>>  - the stack is so big that the debugger is very sluggish (best-case
>> scenario)
>>  - the VM is just flooded doing GCs so maybe the Ctrl dot event does not
>> even arrive at Pharo or the trigger
>>  - if the recursion is hit when printing an object (which is more common
>> than you could imagine), opening the debugger could trigger a new recursion
>> and never give back the control to the user
>>
>> ## What are we working on
>>
>> The main idea here is: Can we have a simple and efficient way to prevent
>> such kinds of situations?
>>
>> After many discussions around detecting recursion, we kinda arrived at
>> the simple solution of just detecting a stack overflow.
>> The solution is easy to understand (because it’s like other languages
>> work) and easy to implement because there is already support for that.
>> But this leaves open two questions:
>>  - what happens when people want to use the “infinite stack” feature?
>>  - when should a process stack overflow? What is a sensitive default
>> value?
>>
>> Our draft implementation here
>> https://github.com/pharo-project/pharo-vm/pull/710 does the following to
>> cope with this:
>>  - we can now parametrize the size of the stack (of each stack page to be
>> more accurate) when the VM starts up
>>  - the stack overflow check can be disabled per process
>>
>> We also are running experiments to see what could be a sensitive stack
>> size for our normal usages. Here, for example, we ran almost all test cases
>> in Pharo separately (one suite per line below), and we observed how many
>> tests broke (x-axis) with different stack sizes (y-axis).
>> Here we see that 

[Pharo-dev] Re: [Pharo-vm] Stack overflow support?

2023-11-09 Thread Guille Polito


> El 9 nov. 2023, a las 13:35, David Mason  escribió:
> 
> Tail call elimination would reduce stack usage significantly. See 
> Tail Call Elimination in OpenSmalltalk, Ralston & Mason, IWST2019
> 
> This really should be part of any stack solution. Particularly people coming 
> from a functional language background are likely to use it. Also double 
> dispatch would benefit significantly.

Yes, I agree that tail-call elimination will largely improve performance.
But it will also harm debugging unless there is a way to reconstruct the stack 
when there is a problem, which I suspect could be very expensive.

Anyways, this solves a performance problem, not the recursion problem that 
motivates my mail, right?

> 
> ../Dave
> 
> On Thu, 9 Nov 2023 at 04:38, Guillermo Polito  > wrote:
> Hi all,
> 
> We started (with many interruptions over the last months) working a bit with 
> Stephane on understanding what is the (positive and negative) impact of 
> stack-overflow support in Pharo.
> The key idea is that if a process consumes too much stack (potentially 
> because of an infinite recursion) then the process should stop with an 
> exception.
> 
> ## Why we want better stack consumption control
> 
> This idea comes up to solve issues that are pretty common and hit especially 
> newbies.
> For example, imagine you accidentally write an accessor such as
> 
> ```
> A >> foo
>^ self foo
> ```
> 
> Students do this all the time, and I’ve also seen it in experienced people 
> who go too fast :).
> More importantly, such recursions could happen also with not-so-obvious 
> indirect recursions (a sends b, b sends c, c sends a), and these could hit 
> anybody.
> 
> This is aggravated because the current execution model allows us to have 
> infinite stacks —meaning: limited by available memory only.
> This is indeed a nice feature for many use cases but it has its own drawbacks 
> when one of these kind of recursions are hit:
>  - code just loops forever taking space in the stack
>  - when there is no more stack space, context objects are created and moved 
> to the heap
>  - but those contexts are strongly held, so they are never GCed and take up 
> extra space
>  - even worse! they are there adding more work to the GC every time and 
> making the GC run more often looking for space that is not there
> 
> ## Why Ctrl-dot does not always work
> 
> Of course, super users know there is this “Ctrl dot” hidden feature that 
> should help you recover from this.
> First, let's take out of the equation that this is only known by super users.
> Now, in this situation, when Ctrl-dot is hit it will trigger a handler that 
> suspends the problematic process and opens a debugger on it.
> But it could happen that,
>  - the stack is so big that the debugger is very sluggish (best-case scenario)
>  - the VM is just flooded doing GCs so maybe the Ctrl dot event does not even 
> arrive at Pharo or the trigger
>  - if the recursion is hit when printing an object (which is more common than 
> you could imagine), opening the debugger could trigger a new recursion and 
> never give back the control to the user
> 
> ## What are we working on
> 
> The main idea here is: Can we have a simple and efficient way to prevent such 
> kinds of situations?
> 
> After many discussions around detecting recursion, we kinda arrived at the 
> simple solution of just detecting a stack overflow.
> The solution is easy to understand (because it’s like other languages work) 
> and easy to implement because there is already support for that.
> But this leaves open two questions:
>  - what happens when people want to use the “infinite stack” feature?
>  - when should a process stack overflow? What is a sensitive default value?
> 
> Our draft implementation here 
> https://github.com/pharo-project/pharo-vm/pull/710 
>  does the following to 
> cope with this:
>  - we can now parametrize the size of the stack (of each stack page to be 
> more accurate) when the VM starts up
>  - the stack overflow check can be disabled per process
> 
> We also are running experiments to see what could be a sensitive stack size 
> for our normal usages. Here, for example, we ran almost all test cases in 
> Pharo separately (one suite per line below), and we observed how many tests 
> broke (x-axis) with different stack sizes (y-axis).
> Here we see that most test suites require at least 20-24k to run properly, 
> some go up to 36k of stack before converging (i.e., the number of broken 
> tests does not change).
> 
> 
> 
> You’ll notice in the graph that There are some scenarios that break all the 
> time. This is because exception handling itself is recursive and may produce 
> more stack overflows depending on the size of the stack between the exception 
> and the exception handler.
> So some more work is still required, mostly changing Pharo libraries to 
> properly support this. For 

[Pharo-dev] Re: Stack overflow support?

2023-11-09 Thread Nicolas Anquetil



On 09/11/2023 10:13, Guillermo Polito wrote:
[...]

## Why Ctrl-dot does not always work

not related to the topic, but may I take the opportunity to mention that 
ctrl-dot does not work on Linux.


So this is not a solution, ever

sorry for the noise

nicolas


Cheers,
Guille


--
Nicolas Anquetil
RMod team -- Inria Lille


[Pharo-dev] Re: Stack overflow support?

2023-11-09 Thread Sven Van Caekenberghe
This is super important.

I always thought that this was an implementation problem in the sense that the 
overflow check was too costly/slow.

Anyway, in most Common Lisp implementations the stack is also limited. When you 
hit the limit, a resumable exception is raised, allowing you to make the stack 
larger and continue. And of course you can set a large initial stack size per 
process.

> On 9 Nov 2023, at 10:13, Guillermo Polito  wrote:
> 
> Hi all,
> 
> We started (with many interruptions over the last months) working a bit with 
> Stephane on understanding what is the (positive and negative) impact of 
> stack-overflow support in Pharo.
> The key idea is that if a process consumes too much stack (potentially 
> because of an infinite recursion) then the process should stop with an 
> exception.
> 
> ## Why we want better stack consumption control
> 
> This idea comes up to solve issues that are pretty common and hit especially 
> newbies.
> For example, imagine you accidentally write an accessor such as
> 
> ```
> A >> foo
>^ self foo
> ```
> 
> Students do this all the time, and I’ve also seen it in experienced people 
> who go too fast :).
> More importantly, such recursions could happen also with not-so-obvious 
> indirect recursions (a sends b, b sends c, c sends a), and these could hit 
> anybody.
> 
> This is aggravated because the current execution model allows us to have 
> infinite stacks —meaning: limited by available memory only.
> This is indeed a nice feature for many use cases but it has its own drawbacks 
> when one of these kind of recursions are hit:
>  - code just loops forever taking space in the stack
>  - when there is no more stack space, context objects are created and moved 
> to the heap
>  - but those contexts are strongly held, so they are never GCed and take up 
> extra space
>  - even worse! they are there adding more work to the GC every time and 
> making the GC run more often looking for space that is not there
> 
> ## Why Ctrl-dot does not always work
> 
> Of course, super users know there is this “Ctrl dot” hidden feature that 
> should help you recover from this.
> First, let's take out of the equation that this is only known by super users.
> Now, in this situation, when Ctrl-dot is hit it will trigger a handler that 
> suspends the problematic process and opens a debugger on it.
> But it could happen that,
>  - the stack is so big that the debugger is very sluggish (best-case scenario)
>  - the VM is just flooded doing GCs so maybe the Ctrl dot event does not even 
> arrive at Pharo or the trigger
>  - if the recursion is hit when printing an object (which is more common than 
> you could imagine), opening the debugger could trigger a new recursion and 
> never give back the control to the user
> 
> ## What are we working on
> 
> The main idea here is: Can we have a simple and efficient way to prevent such 
> kinds of situations?
> 
> After many discussions around detecting recursion, we kinda arrived at the 
> simple solution of just detecting a stack overflow.
> The solution is easy to understand (because it’s like other languages work) 
> and easy to implement because there is already support for that.
> But this leaves open two questions:
>  - what happens when people want to use the “infinite stack” feature?
>  - when should a process stack overflow? What is a sensitive default value?
> 
> Our draft implementation here 
> https://github.com/pharo-project/pharo-vm/pull/710 does the following to cope 
> with this:
>  - we can now parametrize the size of the stack (of each stack page to be 
> more accurate) when the VM starts up
>  - the stack overflow check can be disabled per process
> 
> We also are running experiments to see what could be a sensitive stack size 
> for our normal usages. Here, for example, we ran almost all test cases in 
> Pharo separately (one suite per line below), and we observed how many tests 
> broke (x-axis) with different stack sizes (y-axis).
> Here we see that most test suites require at least 20-24k to run properly, 
> some go up to 36k of stack before converging (i.e., the number of broken 
> tests does not change).
> 
> 
> You’ll notice in the graph that There are some scenarios that break all the 
> time. This is because exception handling itself is recursive and may produce 
> more stack overflows depending on the size of the stack between the exception 
> and the exception handler.
> So some more work is still required, mostly changing Pharo libraries to 
> properly support this. For example:
>  - should tests run in a fresh process with a fresh stack?
>  - should the exception mechanism use less recursion?
>  - resumable exceptions add stack pressure because they do not “unstack” 
> until the exception is finally handled, meaning that the stack used by 
> exception handling just adds up to the stack of the original code, can we do 
> better here?
> 
> Probably there are more interesting questions here, that’s the “why"