Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(

2016-12-19 Thread Alexandre Bergel
Thanks Peter for your email.

Alexandre


> On Dec 18, 2016, at 10:10 PM, Peter Uhnak  wrote:
> 
> Hi Alex,
> 
> I certainly understand your frustration. I felt it too on Windows, to the 
> point where I stopped using Pharo for a couple of weeks out of rage, and then 
> spent at least 40 hours in total digging into the VM until I added a usable 
> workaround for Windows. Not to mention how frustrating the fixing itself was.
> 
> Obviously on Mac there are still some unresolved pathways, but it will not 
> magically fix itself.
> 
> The reason for the crash (bad object pinning and moving of canvas memory) has 
> been known for some time now, so more crash.dmps do not give any more insight.
> 
> If this is to be resolved, then one of two things has to happen:
> 
> A) Someone who really understands VM/image memory management / GC / pinning 
> fixes the issue.
> 
> B) Someone with Mac (which I don't have) digs around BitBlt code (or wherever 
> it was) and adds a similar workaround.
> 
> Considering the issue emerged more than a year ago (with the Spur switch), I 
> don't think (A) is going to happen any time soon, so I guess the only chance 
> is to get your hands dirty and fix it yourself (B) (or make one of your 
> students suffer :)).
> 
> I haven't had a single Roassal/BitBlt-related crash on Windows since my fix 
> was added, so there should be a way to add a workaround for Mac too.
> 
> (There are of course also crashes related to FT, but the story is arguably 
> the same.)
> 
> TL;DR: I don't think it's on the todo list of anyone who actually understands 
> this, so the only way to get it fixed is to do it yourself or to find a 
> specific person who is willing to dig into it.
> 
> Peter
> 

-- 
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.






Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(

2016-12-19 Thread Stephane Ducasse
Hi Peter

I can understand your frustration. I would really like to see what I can
do to help, but I do not know what, except showing your mail to the VM guys.
Thanks for having sent it!


Stef


On Mon, Dec 19, 2016 at 2:03 PM, Peter Uhnak  wrote:

> On Mon, Dec 19, 2016 at 08:12:29AM +0800, Ben Coman wrote:
> > Can you point to where you added your workaround?
>
> The fix is a single line, because I hate myself.
>
> interpreterProxy failed ifTrue:[^nil].
>
> https://github.com/pharo-project/pharo-vm/commit/9bf66cf656b176d988e1b0ba74fc37da467e6192
>
> To give you more info:
>
> The problem is that the memory of canvas forms is not properly pinned, so
> the garbage collector can move a form; if the canvas form is being updated
> at the same time as it is moved, you end up accessing the wrong memory
> -> crash.
>
> My fix returns early if an error has occurred, so PrimitiveFailed is raised
> in the image before any wrong memory is accessed. On the Roassal side the
> PrimitiveFailed is caught and a paint cycle is skipped; this is good
> enough, as it results only in an occasional flicker that immediately fixes
> itself instead of crashing the image.
>
> It seems that on Mac there are also other places in the BitBlt code where
> the surface is being accessed without a check.
>
> Also be careful not to be misled by the crash dump stack. It took me quite
> a while to figure out that GrafPort is already operating on wrong data, so
> it's not GrafPort's fault, but BitBlt's; of course both should possibly be
> investigated with respect to the Mac crash.
>
> Final note: personally I found it much easier to debug and manipulate the
> generated C code (and recompile just that) than to modify the Slang code,
> regenerate the sources, and recompile it all (but again, I don't know
> what the proper way to work with the VM code is).
>
> I used this script to trigger the crash:
> https://gist.github.com/peteruhnak/024650ed2594301558df4da913549b54
> As the crash depends on memory consumption and a "proper" garbage
> collection cycle, it wasn't the easiest to reproduce; however, the script
> above usually managed to crash it. Having a more reliable way would be
> nice, but simply triggering a GC (or even a full GC) wasn't enough, because
> the memory wasn't in the "right" state.
>
> Peter
>
>


Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(

2016-12-19 Thread Peter Uhnak
On Mon, Dec 19, 2016 at 08:12:29AM +0800, Ben Coman wrote:
> Can you point to where you added your workaround?

The fix is a single line, because I hate myself.

interpreterProxy failed ifTrue:[^nil].

https://github.com/pharo-project/pharo-vm/commit/9bf66cf656b176d988e1b0ba74fc37da467e6192

To give you more info:

The problem is that the memory of canvas forms is not properly pinned, so the 
garbage collector can move a form; if the canvas form is being updated at the 
same time as it is moved, you end up accessing the wrong memory -> crash.
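
To make the mechanism concrete, here is a toy model in Python. It is not VM 
code: ToyHeap, ToyObject and read_after_gc are names made up for this sketch, 
and an "address" is just a list index. It only illustrates how a cached raw 
address goes stale when a compacting collector moves an unpinned object, and 
how pinning avoids that.

```python
from dataclasses import dataclass


@dataclass
class ToyObject:
    data: str
    pinned: bool = False
    live: bool = True


class ToyHeap:
    """Toy compacting heap; an 'address' is an index into self.slots."""

    def __init__(self):
        self.slots = []

    def allocate(self, data, pinned=False):
        self.slots.append(ToyObject(data, pinned))
        return len(self.slots) - 1  # raw address of the new object

    def free(self, address):
        self.slots[address].live = False

    def read(self, address):
        obj = self.slots[address]
        return None if obj is None else obj.data

    def gc(self):
        size = len(self.slots)
        new_slots = [None] * size
        # Pinned live objects keep their address.
        for addr, obj in enumerate(self.slots):
            if obj is not None and obj.live and obj.pinned:
                new_slots[addr] = obj
        # Unpinned live objects slide down into the lowest free slots.
        free_addrs = (a for a in range(size) if new_slots[a] is None)
        for obj in self.slots:
            if obj is not None and obj.live and not obj.pinned:
                new_slots[next(free_addrs)] = obj
        self.slots = new_slots


def read_after_gc(pinned):
    """Cache a raw address, let the GC run, then read through the old address."""
    heap = ToyHeap()
    garbage = heap.allocate("temporary")
    raw_address = heap.allocate("canvas bits", pinned=pinned)
    heap.free(garbage)  # leaves a hole below the form
    heap.gc()           # an unpinned form slides into that hole
    return heap.read(raw_address)
```

With pinned=False the old address no longer holds the canvas bits (the toy 
returns None; a real VM would read whatever happens to be there). With 
pinned=True the form stays put and the read is still valid.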

My fix returns early if an error has occurred, so PrimitiveFailed is raised in 
the image before any wrong memory is accessed. On the Roassal side the 
PrimitiveFailed is caught and a paint cycle is skipped; this is good enough, as 
it results only in an occasional flicker that immediately fixes itself instead 
of crashing the image.
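
The shape of that workaround can be sketched in Python as well. Again this is 
a toy model under stated assumptions, not the actual plugin or Roassal code: 
ToyProxy stands in for interpreterProxy, the PrimitiveFailed class here is a 
local stand-in for the image-side error, and primitive_copy_bits, copy_bits 
and paint_cycles are hypothetical names.

```python
class PrimitiveFailed(Exception):
    """Local stand-in for the image-side PrimitiveFailed error."""


class ToyProxy:
    """Hypothetical stand-in for interpreterProxy: it only tracks a failure flag."""

    def __init__(self):
        self._failed = False

    def fail(self):
        self._failed = True

    def failed(self):
        return self._failed


def primitive_copy_bits(proxy, surface):
    """'VM side': bail out before touching a surface that could not be obtained."""
    if surface is None:  # pretend that getting hold of the surface failed
        proxy.fail()
    if proxy.failed():   # mirrors: interpreterProxy failed ifTrue: [^nil]
        return None
    return sum(surface)  # pretend blit: touches every "pixel"


def copy_bits(surface):
    """'Image side': a failed primitive shows up as PrimitiveFailed."""
    proxy = ToyProxy()
    result = primitive_copy_bits(proxy, surface)
    if proxy.failed():
        raise PrimitiveFailed("copy_bits failed")
    return result


def paint_cycles(surfaces):
    """Roassal-like loop: skip a frame on PrimitiveFailed instead of crashing."""
    painted = 0
    skipped = 0
    for surface in surfaces:
        try:
            copy_bits(surface)
            painted += 1
        except PrimitiveFailed:
            skipped += 1  # one frame of flicker; the next cycle repaints
    return painted, skipped
```

For example, paint_cycles([[1, 2, 3], None, [4, 5]]) paints two frames and 
skips the one whose surface was unavailable. The important part is the order: 
the failure check comes before any access to the surface.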

It seems that on Mac there are also other places in the BitBlt code where the 
surface is being accessed without a check.

Also be careful not to be misled by the crash dump stack. It took me quite a 
while to figure out that GrafPort is already operating on wrong data, so it's 
not GrafPort's fault, but BitBlt's; of course both should possibly be 
investigated with respect to the Mac crash.

Final note: personally I found it much easier to debug and manipulate the 
generated C code (and recompile just that) than to modify the Slang code, 
regenerate the sources, and recompile it all (but again, I don't know what the 
proper way to work with the VM code is).

I used this script to trigger the crash 
https://gist.github.com/peteruhnak/024650ed2594301558df4da913549b54
As the crash depends on memory consumption and a "proper" garbage collection 
cycle, it wasn't the easiest to reproduce; however, the script above usually 
managed to crash it. Having a more reliable way would be nice, but simply 
triggering a GC (or even a full GC) wasn't enough, because the memory wasn't in 
the "right" state.

Peter



Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(

2016-12-18 Thread Peter Uhnak
Hi Alex,

I certainly understand your frustration. I felt it too on Windows, to the point 
where I stopped using Pharo for a couple of weeks out of rage, and then spent 
at least 40 hours in total digging into the VM until I added a usable 
workaround for Windows. Not to mention how frustrating the fixing itself was.

Obviously on Mac there are still some unresolved pathways, but it will not 
magically fix itself.

The reason for the crash (bad object pinning and moving of canvas memory) has 
been known for some time now, so more crash.dmps do not give any more insight.

If this is to be resolved, then one of two things has to happen:

A) Someone who really understands VM/image memory management / GC / pinning 
fixes the issue.

B) Someone with Mac (which I don't have) digs around BitBlt code (or wherever 
it was) and adds a similar workaround.

Considering the issue emerged more than a year ago (with the Spur switch), I 
don't think (A) is going to happen any time soon, so I guess the only chance is 
to get your hands dirty and fix it yourself (B) (or make one of your students 
suffer :)).

I haven't had a single Roassal/BitBlt-related crash on Windows since my fix was 
added, so there should be a way to add a workaround for Mac too.

(There are of course also crashes related to FT, but the story is arguably the 
same.)

TL;DR: I don't think it's on the todo list of anyone who actually understands 
this, so the only way to get it fixed is to do it yourself or to find a 
specific person who is willing to dig into it.

Peter