On 11 January 2011 04:42, Eliot Miranda <[email protected]> wrote:
> Hi Igor, Hi All,
>
> On Sun, Jan 2, 2011 at 4:11 PM, Igor Stasenko <[email protected]> wrote:
>>
>> On 3 January 2011 00:15, Eliot Miranda <[email protected]> wrote:
>> > Hi Martin, Hi All,
>> > so find new VMs in VM.r2341/. The linux crashes (certainly the one you
>> > suffered from Martin) seem to be caused by an optimization bug (but they
>> > could be caused by bad code generation, creating something that assumes
>> > ordering constraints which C doesn't guarantee). I suspect the former
>> > because I don't see the crash when running exactly the same VM and image
>> > from a different directory; provoking the crash requires a particular path
>> > (go figure; I haven't pinned this down yet).
>> > So my "fix" is preventing a complex function being inlined into the main
>> > interpreter loop, removing the sources of some warnings, and lowering the
>> > optimization level of the gcc3x-cointerp.c file to -O1 from -O2 (my build
>> > environment, CentOS Linux 5.3, uses gcc 4.1.2). I'm not proud of this
>> > "fix". I've violated the Deutsch criterion by not diagnosing the cause of
>> > the bug so I can't stand behind this fix; it's a hack that appears to work
>> > and may have merely pushed the real bug further underground. Alas I don't
>> > have time to do a better job. Hopefully it'll get those of you on linux
>> > going again.
>>
>> Eliot, if you remember, i also had crash issues on linux, and pinned down it to
>> removing optimization from <something>heartbeat.c while keeping
>> gcc3x-cointerp.c to use same optimization flags as for rest of files.
>
> Indeed, quite right.
> I happened to add a flag to turn off the heartbeat so
> I could debug the crash Matthew was seeing in starting up
> Squeak4.2-10856-beta.image (since single-stepping through machine code
> always gets interrupted by the heartbeat, it being an interval timer) and lo
> and behold the bug went away. This is very worrying because it appears to
> imply that there's a serious bug in the linux kernel/gcc since delivering a
> software interrupt shouldn't corrupt registers, but it clearly does. I'll
> try and pass it by someone who's an expert in this area.
> Anyway, now find a new linux VM in VM.r2346/ that seems fine with the
> interpreter and the cogit compiled at -O2 but the heartbeat compiled at -O2.
> Running this VM on CentOS 5.3 under Parallels I get
>     2839 run, 2796 passes, 7 expected failures, 24 failures, 11 errors, 0 unexpected passes
> for the full test suite in Squeak4.2-10856-beta.image.
> So have at it.
Great! Nice to see that we can deal with elusive bugs :)

> best
> Eliot
>>
>> I will start coding cmake config for Cog during next week and will be
>> able to check my previous observations again.
>>
>> > best
>> > Eliot
>> > On Sat, Jan 1, 2011 at 11:16 PM, <[email protected]> wrote:
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>

--
Best regards,
Igor Stasenko AKA sig.
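
For readers unfamiliar with the mechanism discussed above, here is a rough sketch of what "the heartbeat, it being an interval timer" amounts to on a POSIX system: a periodic timer signal that interrupts whatever code is running, plus a switch to turn it off while debugging. This is only an illustrative analogue in Python, not the Cog VM's C code; the function name `run_with_heartbeat` and the parameters `interval_s`, `duration_s`, and `enabled` are hypothetical and do not correspond to anything in the VM sources.

```python
import signal
import time


def run_with_heartbeat(interval_s=0.002, duration_s=0.1, enabled=True):
    """Run a busy loop and count how many timer ticks interrupt it.

    Hypothetical illustration only: `enabled` stands in for the kind of
    "turn off the heartbeat" flag mentioned in the thread.
    """
    ticks = 0

    def on_tick(signum, frame):
        # Runs asynchronously with respect to the busy loop below.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_tick)
    if enabled:
        # Arm a repeating real-time interval timer that delivers SIGALRM.
        signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)
    try:
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            pass  # stands in for the interpreter's main loop
    finally:
        # Disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0, 0)
        signal.signal(signal.SIGALRM, previous)
    return ticks
```

With `enabled=False` the loop is never interrupted, which is the situation in which the crash reportedly disappeared; with the timer armed, the loop is interrupted every few milliseconds, which is also why single-stepping in a debugger keeps getting disturbed.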
