More on the problem that Eliot is speaking about: it can happen under this
condition:
if SIGIO is delivered while executing any trampoline transiting from
SmalltalkToCStackSwitch.
What happens here?
We must load the 64-bit contents of memory (cStackPointerAddress) into the
stack pointer register (SPReg = $rsp) - see genLoadCStackPointer.
This is done by using the CogRTLOpcode abstract instruction MoveAwR.
But we have no matching instruction on x86-64... We can only load the
contents of a 64-bit absolute address into $rax.
So the idea is to generate this sequence
(CogX64Compiler>>concretizeMoveAwR):
xchgq %rsp, %rax
movabsq 0x10027c338, %rax ; cStackPointerAddress
xchgq %rsp, %rax
That's clever because it preserves $rax which could be in use when we want
to MoveAwR.
But it has an unfortunate side effect: the stack pointer temporarily gets
the contents of $rax and can thus temporarily point anywhere.
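
To make that window concrete, here is a toy C model of the two registers.
It is only a sketch: the struct, the function names and any values fed to
it are made up for illustration, and none of it is VM code. It replays the
three instructions and reports what the stack pointer holds between the two
xchgq.

```c
#include <stdint.h>

/* Toy model of the two registers involved; not real machine state. */
struct regs {
    uint64_t rsp;
    uint64_t rax;
};

/* Models: xchgq %rsp, %rax */
static void xchg_rsp_rax(struct regs *r)
{
    uint64_t tmp = r->rsp;
    r->rsp = r->rax;
    r->rax = tmp;
}

/*
 * Replays the three-instruction sequence on the model and returns the
 * value that rsp holds in the window between the two xchgq.
 * c_stack_pointer stands for the 64-bit word stored at
 * cStackPointerAddress.
 */
uint64_t simulate_move_aw_r(struct regs *r, uint64_t c_stack_pointer)
{
    uint64_t rsp_in_window;

    xchg_rsp_rax(r);            /* rsp <- old rax, rax <- old rsp      */
    r->rax = c_stack_pointer;   /* movabsq cStackPointerAddress, %rax  */
    rsp_in_window = r->rsp;     /* a signal arriving now would push
                                   its frame at this address           */
    xchg_rsp_rax(r);            /* rsp <- C stack pointer,
                                   rax <- old rax (preserved)          */
    return rsp_in_window;
}
```

Calling simulate_move_aw_r() with any starting values returns the original
$rax, while on exit rsp holds the C stack pointer and rax is back to what
it was: the end state is correct, only the middle is dangerous.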
What happens when running the Squeak SocketTest is that we use some
trampolines (I guess for invoking primitives, for example) and we generate
SIGIO (for some reason, a lot of SIGIO are generated on my particular
macOS machine, so I can trigger the bug more easily than Eliot).
We previously installed a handler for SIGIO via signal(). When we use
signal(), the handler shares the stack pointer with the user program.
If the signal is delivered in between the two xchgq instructions above, the
signal handler will then use a corrupted stack pointer pointing anywhere
(depending on the contents of $rax). When the VM enters the signal handler
function, it uses the stack pointer to save some state, and corrupts a
memory zone, segfaults or whatever.
In the case described by the opensmalltalk vm-dev thread, $rax was pointing
to the generated code zone (jitted methodZone), so we corrupted the
generated code and soon got punished for that. But it's probable that there
are other (rare) occurrences of this bug.
Not sure if it is causing the bugs described by Sean, but it's important to
use the fix from Eliot ASAP and retry.
There might be other occurrences of signal(SIGIO,forceInterruptCheck) in
the minheadless flavour. I did not check whether Eliot also corrected them;
if not, they should also be corrected ASAP, and every usage of signal()
should be replaced by usage of sigaction() with appropriate flags to use
sigaltstack() - see Eliot's commit details.
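
For illustration, a minimal C sketch of that kind of replacement. This is
not Eliot's actual commit: sigio_handler stands in for
forceInterruptCheck, and the other names and the 64 KiB stack size are
assumptions of mine. The handler only records whether one of its locals
lives inside the alternate stack, so the effect of SA_ONSTACK can be
checked.

```c
/* Feature macros so the declarations are visible under strict -std= modes. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#ifndef _DARWIN_C_SOURCE
#define _DARWIN_C_SOURCE 1
#endif

#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the VM's code. */
void *alt_stack_base;                 /* start of the dedicated signal stack */
size_t alt_stack_size;                /* its size in bytes                   */
volatile sig_atomic_t handler_on_alt_stack = -1;  /* -1: handler not run yet */

/* Stand-in for the VM's SIGIO handler. */
static void sigio_handler(int signum)
{
    char marker;  /* a local: its address tells us which stack we are on */
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)alt_stack_base;

    (void)signum;
    handler_on_alt_stack = (here >= base && here < base + alt_stack_size);
}

/* Installs the handler so that it runs on its own stack. Returns 0 on success. */
int install_sigio_handler_on_alt_stack(void)
{
    stack_t ss;
    struct sigaction sa;

    alt_stack_size = 64 * 1024;       /* arbitrary size chosen for the sketch */
    alt_stack_base = malloc(alt_stack_size);
    if (alt_stack_base == NULL)
        return -1;

    /* Register the alternate stack for the calling thread. */
    ss.ss_sp = alt_stack_base;
    ss.ss_size = alt_stack_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    /* SA_ONSTACK asks the kernel to switch to that stack before
       running the handler, whatever the stack pointer held. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigio_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK | SA_RESTART;
    return sigaction(SIGIO, &sa, NULL);
}
```

After install_sigio_handler_on_alt_stack() succeeds, a raise(SIGIO) should
leave handler_on_alt_stack set to 1. Note that sigaltstack() applies to the
calling thread, so it has to be called on the thread that runs the jitted
code.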
On Sun, Oct 6, 2019 at 12:36, Eliot Miranda <[email protected]>
wrote:
> Hi Sean, Hi All,
>
> this may be because of the issue described here:
> http://forum.world.st/Difficult-to-debug-VM-crash-with-full-blocks-and-Sista-V1-tt5103810.html
>
> This issue is characterized by the system crashing soon after start up
> when some significant i/o is done, typically either to files or sockets.
> It affects macOS only and may indeed affect only 64-bits. We have strong
> evidence that it is caused by the dynamic linker being invoked in the
> signal handler for SIGIO when the signal is delivered while the VM is
> executing JITted code. The symptom that causes the crash is corruption of
> a particular jitted method’s machine code, eg Delay class>>#startEventLoop,
> and we believe that the corruption is caused by the linker when it
> misinterprets a jitted Smalltalk stack frame as an ABI-compliant stack
> frame and attempts to scan code to link it.
>
> Our diagnosis is speculative; this is extremely hard to reproduce.
> Typically in repeating a crashing run SIGIO may no longer be delivered at
> the same point because any remote server has now woken up and delivers
> results sooner, etc. However, Nicolas Cellier and I are both confident
> that we have correctly identified the bug.
>
> The fix is simple; SIGIO should be delivered on a dedicated signal stack
> (see sigaltstack(2)). I committed a fix yesterday evening and we should
> see within a week or so if these crashes have disappeared.
>
> I encourage the Pharo vm maintainers to build and release vms that include
> https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/c24970eb2859a474065c6f69060c0324aef2b211
> asap.
>
>
> Cheers,
> Eliot
> _,,,^..^,,,_ (phone)
>
> On Oct 3, 2019, at 1:24 PM, Sean P. DeNigris <[email protected]>
> wrote:
>
> Segmentation fault Thu Oct 3 15:52:33 2019
>
>
> VM: 201901051900 https://github.com/OpenSmalltalk/opensmalltalk-vm.git
> Date: Sat Jan 5 20:00:11 2019 CommitHash: 7a3c6b6
> Plugins: 201901051900
> https://github.com/OpenSmalltalk/opensmalltalk-vm.git
>
> C stack backtrace & registers:
> rax 0x0000000124380000 rbx 0x00007ffeebd00050 rcx 0x0000000000468260 rdx
> 0x0000000000dd6800
> rdi 0x0000000124cee5a0 rsi 0x0000000124cee5a0 rbp 0x00007ffeebcffe50 rsp
> 0x00007ffeebcffe50
> r8 0x00007fff3f2cefe5 r9 0x0000000000000b00 r10 0x0000000000006000 r11
> 0xfffffffffcd8d5a0
> r12 0x0000000000000002 r13 0x0000000035800000 r14 0x00007ffeebd00064 r15
> 0x0000000000002800
> rip 0x00007fff630f7d09
> 0 libsystem_platform.dylib 0x00007fff630f7d09
> _platform_memmove$VARIANT$Haswell + 41
> 1 Pharo 0x0000000103f52642 reportStackState
> + 952
> 2 Pharo 0x0000000103f52987 sigsegv + 174
> 3 libsystem_platform.dylib 0x00007fff630fab3d _sigtramp + 29
> 4 ??? 0x0000058900000a00 0x0 +
> 6085968660992
> 5 libGLImage.dylib 0x00007fff3f2ce29e
> glgProcessPixelsWithProcessor + 2149
> 6 AMDRadeonX5000GLDriver 0x000000010db16db1
> glrATIStoreLevels
> + 1600
> 7 AMDRadeonX5000GLDriver 0x000000010db52c83
> glrAMD_GFX9_LoadSysTextureStandard + 45
> 8 AMDRadeonX5000GLDriver 0x000000010db519bb glrUpdateTexture
> + 1346
> 9 libGPUSupportMercury.dylib 0x00007fff5181279d
> gpusLoadCurrentTextures + 591
> 10 AMDRadeonX5000GLDriver 0x000000010db5a099
> gldUpdateDispatch
> + 397
> 11 GLEngine 0x00007fff3ff72078
> gleDoDrawDispatchCore + 629
> 12 GLEngine 0x00007fff3ff16369
> glDrawArraysInstanced_STD_Exec + 264
> 13 GLEngine 0x00007fff3ff1625a
> glDrawArrays_UnpackThread + 40
> 14 GLEngine 0x00007fff3ff6dce1 gleCmdProcessor
> + 77
> 15 libdispatch.dylib 0x00007fff62ec2dcf
> _dispatch_client_callout + 8
> 16 libdispatch.dylib 0x00007fff62ecea2c
> _dispatch_lane_barrier_sync_invoke_and_complete + 60
> 17 GLEngine 0x00007fff3fec4b85
> glFlush_ExecThread + 15
> 18 Pharo 0x0000000103f4cc62
> -[sqSqueakOSXOpenGLView drawRect:flush:] + 314
> 19 Pharo 0x0000000103f4cb22 -
> ...
>
> Smalltalk stack dump:
> 0x7ffeebd14238 M DelaySemaphoreScheduler>unscheduleAtTimingPriority
> 0x10fab3ad0: a(n) DelaySemaphoreScheduler
> 0x7ffeebd14270 M [] in
> DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority
> 0x10fab3ad0: a(n) DelaySemaphoreScheduler
> 0x1125923f8 s BlockClosure>ensure:
> 0x111e88d30 s
> DelaySemaphoreScheduler(DelayBasicScheduler)>runBackendLoopAtTimingPriority
> 0x112590a50 s [] in
> DelaySemaphoreScheduler(DelayBasicScheduler)>startTimerEventLoopPriority:
> 0x111e88e08 s [] in BlockClosure>newProcess
>
> Most recent primitives
> @
> actualScreenSize
> millisecondClockValue
> tempAt:
>
>
>
> -----
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Developers-f1294837.html
>
>