Thanks. We will digest and discuss internally. S
> On 14 Nov 2023, at 07:19, Daniel Slomovits <[email protected]> wrote: > > Hello Stephane, > > A simple example like that—really any case caused by common types of mistake, > even if the method involved is more complex—wouldn't get to run for a couple > seconds. It would error out after a small fraction of a second, since the > interrupt is issued immediately upon exceeding the limit (which by default > is—oops, 64k slots, 256kB, at most ~10k stack frames, but realistically less > due to args and temps), and it just doesn't take that long to execute most > recursive methods 10k levels deep. > > At this point it would raise the limit a little for that process and signal a > StackOverflow exception. In production this likely would mean killing the > process but that's up to application code. In development you get a walkback > like any other exception. Because the stack is kept to a relatively > reasonable size, I've never had debugger performance be an issue—it's > noticeably more sluggish, but we're talking about taking 200-300ms to respond > instead of imperceptible. Similarly for GC pressure—even if you have dozens > of processes all maxed out on stack, this would still be less than the memory > used to start a base image. > > Certainly there are scenarios that can cause problems—an error while printing > an object will stop you from opening a debugger, yes. However a StackOverflow > in particular isn't any worse than any other exception—in all cases you'll > get a walkback, hit "Debug", and get another walkback instead of a debugger. > If you hit "Terminate" at that point, it might leave the original process in > a zombie state and/or a hidden debugger window, but you can kill it from the > Process Monitor/close it from the Window menu and be fine. And even this is > because Dolphin doesn't safeguard printing in the development tools the way > Pharo does—if Pharo adopted stack overflow handling like Dolphin's, a stack > overflow likely wouldn't even stop you from opening a debugger, it would just > make it a little slow, then something would show up with "error printing: a > StackOverflow" or the like. > > Hope that helps. > > Daniel > > On Sun, Nov 12, 2023 at 4:53 AM stephane ducasse <[email protected] > <mailto:[email protected]>> wrote: >> Hi daniel >> >> Thanks for the feedback. >> May be you wrote it but I could not really understand. >> >> How dolphin handled >> >>>> ``` >>>> A >> foo >>>> ^ self foo >>>> ``` >> >> That is let to run a couple of seconds? >> Did they kill the process? >> >> In Pharo we do have an interrupt but >> >>>> But it could happen that, >>>> - the stack is so big that the debugger is very sluggish (best-case >>>> scenario) >>>> - the VM is just flooded doing GCs so maybe the Ctrl dot event does not >>>> even arrive at Pharo or the trigger >>>> - if the recursion is hit when printing an object (which is more common >>>> than you could imagine), opening the debugger could trigger a new >>>> recursion and never give back the control to the user >> >> >> S >> >>> On 11 Nov 2023, at 05:10, Daniel Slomovits <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> I think this is a great idea. I've mostly used Dolphin Smalltalk, which is >>> actually a strict stack machine under the hood (it has a context-like >>> introspection API but the stack is explicitly the canonical form), so it's >>> more-or-less forced to implement a limit of some kind. When I started using >>> Pharo I triggered a couple stack overflows by mistake, and was frustrated >>> by the fact that at first what happened was...nothing, everything seemed >>> fine, my code just didn't work. And then half a minute later Pharo gets >>> extremely slow and I notice it's using 2GB of memory and by then it's too >>> late and I have to kill the image. Getting a more-or-less immediate error >>> would be much more user-friendly IMO. >>> >>> A couple things to learn from Dolphin's implementation, I think: >>> When a stack overflow is detected, the resulting interrupt raises that >>> process' stack limit by a significant amount (though by less than the >>> original limit—IOW it doesn't double, but it's not just a couple more >>> frames either) before signaling the exception, precisely so that exception >>> handling can occur without triggering another stack-overflow event. A >>> further refinement could be that if a second stack overflow is detected, we >>> directly invoke more basic recovery—this could mean an emergency evaluator, >>> terminating the offending process and opening a post-mortem with a textual >>> stack dump (ugh! but at least it's predictable), etc. >>> I don't think we should worry too much about refining what exactly the >>> limit is. 10x as much stack as 99% of code will ever use, is still a tiny >>> amount compared to consuming all available memory with Contexts. At least, >>> if I'm understanding the graph/data correctly. That's 36kB of stack space, >>> right? Not 36k frames/contexts deep? With each context being six slots plus >>> args/temps, 36kB is 500-750 frames on a 64-bit VM (in stack >>> representation—contexts add object-header overhead but we don't reify them >>> unless we have to). For reference, Dolphin's limit is 64kB, but that's a >>> 32-bit VM, so the equivalent for 64-bit would be 128kB...but because Pharo >>> can spill contexts to the stack, the limit could easily be 1MB, or a fixed >>> number of frames designed to approximate that—still a tiny amount of memory >>> overall, and still will be hit near-instantly by true infinite recursion, >>> but lots of breathing room for most use cases. >>> Actually, this did get me to thinking...the stack depth of a Pharo process >>> is not necessarily easy/cheap to compute in the general case, without >>> caching a lot of information on intermediate contexts. In most cases the >>> context chain acts as a proper stack, but not always—methods like >>> Process>>on:do:, and some even more esoteric ones I forget off the top of >>> my head, make modifications far away from the top context and may splice >>> context chains together in odd ways. Perhaps a more flexible limit would be >>> better—one that is triggered by allocating more than a certain number of >>> contexts total, and examines running Processes in detail to find the >>> culprit at that point. >>> >>> On Thu, Nov 9, 2023 at 4:38 AM Guillermo Polito <[email protected] >>> <mailto:[email protected]>> wrote: >>>> Hi all, >>>> >>>> We started (with many interruptions over the last months) working a bit >>>> with Stephane on understanding what is the (positive and negative) impact >>>> of stack-overflow support in Pharo. >>>> The key idea is that if a process consumes too much stack (potentially >>>> because of an infinite recursion) then the process should stop with an >>>> exception. >>>> >>>> ## Why we want better stack consumption control >>>> >>>> This idea comes up to solve issues that are pretty common and hit >>>> especially newbies. >>>> For example, imagine you accidentally write an accessor such as >>>> >>>> ``` >>>> A >> foo >>>> ^ self foo >>>> ``` >>>> >>>> Students do this all the time, and I’ve also seen it in experienced people >>>> who go too fast :). >>>> More importantly, such recursions could happen also with not-so-obvious >>>> indirect recursions (a sends b, b sends c, c sends a), and these could hit >>>> anybody. >>>> >>>> This is aggravated because the current execution model allows us to have >>>> infinite stacks —meaning: limited by available memory only. >>>> This is indeed a nice feature for many use cases but it has its own >>>> drawbacks when one of these kind of recursions are hit: >>>> - code just loops forever taking space in the stack >>>> - when there is no more stack space, context objects are created and >>>> moved to the heap >>>> - but those contexts are strongly held, so they are never GCed and take >>>> up extra space >>>> - even worse! they are there adding more work to the GC every time and >>>> making the GC run more often looking for space that is not there >>>> >>>> ## Why Ctrl-dot does not always work >>>> >>>> Of course, super users know there is this “Ctrl dot” hidden feature that >>>> should help you recover from this. >>>> First, let's take out of the equation that this is only known by super >>>> users. >>>> Now, in this situation, when Ctrl-dot is hit it will trigger a handler >>>> that suspends the problematic process and opens a debugger on it. >>>> But it could happen that, >>>> - the stack is so big that the debugger is very sluggish (best-case >>>> scenario) >>>> - the VM is just flooded doing GCs so maybe the Ctrl dot event does not >>>> even arrive at Pharo or the trigger >>>> - if the recursion is hit when printing an object (which is more common >>>> than you could imagine), opening the debugger could trigger a new >>>> recursion and never give back the control to the user >>>> >>>> ## What are we working on >>>> >>>> The main idea here is: Can we have a simple and efficient way to prevent >>>> such kinds of situations? >>>> >>>> After many discussions around detecting recursion, we kinda arrived at the >>>> simple solution of just detecting a stack overflow. >>>> The solution is easy to understand (because it’s like other languages >>>> work) and easy to implement because there is already support for that. >>>> But this leaves open two questions: >>>> - what happens when people want to use the “infinite stack” feature? >>>> - when should a process stack overflow? What is a sensitive default value? >>>> >>>> Our draft implementation here >>>> https://github.com/pharo-project/pharo-vm/pull/710 does the following to >>>> cope with this: >>>> - we can now parametrize the size of the stack (of each stack page to be >>>> more accurate) when the VM starts up >>>> - the stack overflow check can be disabled per process >>>> >>>> We also are running experiments to see what could be a sensitive stack >>>> size for our normal usages. Here, for example, we ran almost all test >>>> cases in Pharo separately (one suite per line below), and we observed how >>>> many tests broke (x-axis) with different stack sizes (y-axis). >>>> Here we see that most test suites require at least 20-24k to run properly, >>>> some go up to 36k of stack before converging (i.e., the number of broken >>>> tests does not change). >>>> >>>> <ImagenPegada-10.tiff> >>>> >>>> You’ll notice in the graph that There are some scenarios that break all >>>> the time. This is because exception handling itself is recursive and may >>>> produce more stack overflows depending on the size of the stack between >>>> the exception and the exception handler. >>>> So some more work is still required, mostly changing Pharo libraries to >>>> properly support this. For example: >>>> - should tests run in a fresh process with a fresh stack? >>>> - should the exception mechanism use less recursion? >>>> - resumable exceptions add stack pressure because they do not “unstack” >>>> until the exception is finally handled, meaning that the stack used by >>>> exception handling just adds up to the stack of the original code, can we >>>> do better here? >>>> >>>> Probably there are more interesting questions here, that’s the “why" >>>> behind this email. >>>> I’m interested in opinions and scenarios you may come up with that should >>>> be taken into account. >>>> >>>> Cheers, >>>> Guille >>> _______________________________________________ >>> Pharo-vm mailing list -- [email protected] >>> <mailto:[email protected]> >>> To unsubscribe send an email to [email protected] >>> <mailto:[email protected]> >> >> -------------------------------------------- >> Stéphane Ducasse >> http://stephane.ducasse.free.fr <http://stephane.ducasse.free.fr/> / >> http://www.pharo.org <http://www.pharo.org/> >> 03 59 35 87 52 >> Assistant: Aurore Dalle >> FAX 03 59 57 78 50 >> TEL 03 59 35 86 16 >> S. Ducasse - Inria >> 40, avenue Halley, >> Parc Scientifique de la Haute Borne, Bât.A, Park Plaza >> Villeneuve d'Ascq 59650 >> France >> >> >> > _______________________________________________ > Pharo-vm mailing list -- [email protected] > To unsubscribe send an email to [email protected] -------------------------------------------- Stéphane Ducasse http://stephane.ducasse.free.fr / http://www.pharo.org 03 59 35 87 52 Assistant: Aurore Dalle FAX 03 59 57 78 50 TEL 03 59 35 86 16 S. Ducasse - Inria 40, avenue Halley, Parc Scientifique de la Haute Borne, Bât.A, Park Plaza Villeneuve d'Ascq 59650 France
