Hi Alistair,

Did you read this paper: https://hal.inria.fr/hal-01152610 ? It may give you ideas about possible problems.
Pablo is on vacation until next Monday. Guille is working Wednesday, Thursday and Friday afternoons.

S

> On 25 Nov 2019, at 10:18, Alistair Grant <[email protected]> wrote:
>
> Hi All,
>
> At the moment I'm spending pretty much all of my working time tracking
> down VM crashes. It sounds like there may be others working on the same
> issue (Guille?, Pablo?), if so it would be good to be able to exchange
> notes and hopefully reach a resolution a little earlier.
>
> So my status...
>
> I'm currently focusing on two corruptions that I've seen:
>
> 1. Frame Pointers aren't being updated when the receiver or rcvr/clsr
>    they point to is moved during scavenging / compaction.
> 2. The current frame pointer (framePointer) contains an address that is
>    in a free stack page.
>
>
> I'm doing all the investigation using a Pharo minimal image, so there's
> no FreeType, and from what I've seen FFI isn't being used.
>
> The script I'm using to reproduce the crash is at:
> https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/444#issuecomment-555001612
> (the good part about this is that even the memory addresses are
> consistent across runs, so it is highly reproducible).
>
>
> For the Frame Pointers not being updated, what I'm seeing is that after
> a scavenge has finished copying all the referenced objects, but before
> the survivor spaces are exchanged, the call stack looks like:
>
> (gdb) call printCallStack()
> 0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
> 0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
> 0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728
> 0x118a728 is a forwarded object to 0x488fca0 of slot size 7
> hdr8 .....
>
> Once the survivor spaces have been exchanged:
>
> (gdb) call printCallStack()
> 0x7ffffffe3f70 I SessionManager>launchSnapshot:andQuit: 0x1508860: a(n) SessionManager
> 0x7ffffffe3fe0 I [] in SessionManager>snapshot:andQuit: 0x1508860: a(n) SessionManager
> 0x7ffffffe4020 I [] in INVALID RECEIVER>newProcess 0x118a728 is in new space
>
>
> For the framePointer containing an address in a free stack page: adding
> a check during the scavenge shows that the framePointer is in a free
> page.
>
> I'm assuming it is never valid for the framePointer to be in a free
> stack page, and that the receiver and rcvr/clsr should never be in new
> space. If my assumptions are wrong please let me know.
>
> If you'd like any more information, please let me know.
>
> Thanks,
> Alistair
>

--------------------------------------------
Stéphane Ducasse
http://stephane.ducasse.free.fr / http://www.pharo.org
03 59 35 87 52
Assistant: Julie Jonas
FAX 03 59 57 78 50
TEL 03 59 35 86 16
S. Ducasse - Inria
40, avenue Halley,
Parc Scientifique de la Haute Borne, Bât.A, Park Plaza
Villeneuve d'Ascq 59650
France
