The changes between 1.1 and 1.1.1 are the issues in [1]. None seems related... did I miss something?
One change that I don't understand, although it probably is unrelated, is in [2]:

    LargePositiveInteger removeSelector: #=!
    LargePositiveInteger removeSelector: #bitAnd:!
    LargePositiveInteger removeSelector: #bitOr:!
    LargePositiveInteger removeSelector: #bitShift:!
    LargePositiveInteger removeSelector: #bitXor:!
    LargePositiveInteger removeSelector: #'~='!

Why would one want to remove these primitive calls from large integers?

Cheers,
Adrian

[1] http://code.google.com/p/pharo/issues/list?can=1&q=Milestone%3D1.1.1&colspec=ID+Type+Status+Summary+Milestone+Difficulty&cells=tiles
[2] http://code.google.com/p/pharo/issues/attachmentText?id=2912&aid=-2442931684430823333&name=NecessaryImageChangesForCogToWork.Pharo1.1.cs&token=4a16b7709abc303c3826e5be2743eeb7

On Dec 7, 2010, at 09:52, Mariano Martinez Peck wrote:

> ---------- Forwarded message ----------
> From: David T. Lewis <[email protected]>
> Date: Tue, Dec 7, 2010 at 2:06 AM
> Subject: Re: [Vm-dev] Image freeze because handleTimerEvent and Seaside process gone?!
> To: Squeak Virtual Machine Development Discussion <[email protected]>, [email protected]
>
> On Mon, Dec 06, 2010 at 12:33:59PM -0800, Andreas Raab wrote:
>>
>> At a guess, I'd say it's either one of two issues:
>>
>> 1) Your STOP/CONT handling. This sounds suspicious and it could affect
>> the timer handling. I'm assuming that the issue happens after receiving
>> the CONT signal, no? If you can, you might want to a) make sure that you
>> only get the STOP signal when the VM is in ioRelinquish() and not (for
>> example) currently executing the delay process and b) consider to dump
>> the call stacks whenever the VM gets the CONT signal to see what the
>> status is.
>>
>> 2) Some set of incomplete process/delay/semaphore changes in Pharo. One
>> of the problems with processes and delays is that this part of the
>> system reacts very badly to random "cleaning".
>> I.e., changing "foo == nil" to "foo isNil" can have dramatic effects
>> (since it introduces a suspension point) with just the kind of weird
>> issue you're seeing.
>
> Actually #2 does seem like a likely culprit. I found a Pharo 1.1 image
> and loaded the CommandShell and OSProcess test suites. The CommandShell
> tests put a heavy load on process switching, and are rather timing
> dependent. On Pharo 1.1 I get intermittent and non-reproducible errors
> and test failures, and I can't get a clean run of the test suite. The
> errors seem to be different each time.
>
> On Pharo 1.1.1 and 1.2 I can get clean runs of the CommandShell/OSProcess
> tests, so I think there must be some issues in Pharo 1.1. If you are
> using PharoCore 1.1 now and have the option of moving to Pharo 1.1.1
> or 1.2, I suspect you may see the problems go away.
>
> Dave
>
>>
>> With regards to these processes not being printed, that's a side effect
>> of how printAllStacks gathers the processes - it will not print
>> suspended processes which explains why the UI process doesn't print and
>> most likely handleTimerEvent is suspended in a debugger.
>>
>> Depending on how important this issue is you can also try to dissect the
>> object memory itself. If you call writeImageFile (or is it
>> writeImageFileIO?) from gdb it will dump the .image file and you can use
>> the simulator to look at it more closely. Most likely you'll be able to
>> find the processes and look at their stacks.
>>
>> Cheers,
>> - Andreas
>>
>> On 12/6/2010 2:55 AM, Adrian Lienhard wrote:
>>>
>>> Hi all,
>>>
>>> We've been experiencing an "interesting" problem: the image freezes and
>>> does not respond to HTTP requests anymore after it has been running for
>>> days.
>>>
>>> Here is some basic information about our setup:
>>>
>>> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
>>> PharoCore 1.1
>>> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>>>
>>> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9
>>> on the identical machine and with the same application source (modulo some
>>> adaptations to make it run on Pharo).
>>> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the
>>> UI process is suspended (Project uiProcess suspend)
>>> - VM does not hog the CPU and memory usage is normal
>>> - The mean time between failures is several weeks and we haven't managed to
>>> reproduce the problem
>>> - The application mainly serves HTTP requests. When the image does not
>>> receive requests for some time we send it a STOP signal, when a request
>>> comes in it is sent a CONT signal.
>>> - lsof shows
>>> TCP *:9093 (LISTEN)
>>> TCP server:9093->server:46930 (CLOSE_WAIT)
>>>
>>> Below is a GDB backtrace and the Smalltalk stacks from an image that was
>>> frozen (the VM had been running for almost 100 hours):
>>>
>>> =============================================================
>>> (gdb) bt
>>> #0  0x08072020 in ?? ()
>>> #1  <signal handler called>
>>> #2  0xb766f5e0 in malloc () from /lib/libc.so.6
>>> #3  <function called from gdb>
>>> #4  0xb76c50c8 in select () from /lib/libc.so.6
>>> #5  0x08071063 in aioPoll ()
>>> #6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
>>> #7  0x000003e8 in ?? ()
>>> #8  0x997b5a34 in ?? ()
>>> #9  0xbfe7cb28 in ?? ()
>>> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
>>> Backtrace stopped: frame did not save the PC
>>>
>>> (gdb) call printCallStack()
>>> -1719969228>idleProcess
>>> -1719969320>startUp
>>> -1740134028 BlockClosure>newProcess
>>> $3 = -1755344892
>>>
>>> (gdb) call (int) printAllStacks()
>>> Process
>>> -1719969228>idleProcess
>>> -1719969320>startUp
>>> -1740134028 BlockClosure>newProcess
>>>
>>> Process
>>> -1740113860>finalizationProcess
>>> -1740113952>restartFinalizationProcess
>>> -1740113532 BlockClosure>newProcess
>>>
>>> Process
>>> -1740134424 SmalltalkImage>lowSpaceWatcher
>>> -1740134516 SmalltalkImage>installLowSpaceWatcher
>>> -1740134300 BlockClosure>newProcess
>>>
>>> Process
>>> -1719451488 Delay>wait
>>> -1719451580 BlockClosure>ifCurtailed:
>>> -1719451704 Delay>wait
>>> -1719451796 InputEventPollingFetcher>waitForInput
>>> -1740126940 InputEventFetcher>eventLoop
>>> -1740127032 InputEventFetcher>installEventLoop
>>> -1740126816 BlockClosure>newProcess
>>>
>>> Process
>>> -1719557780 UnixOSProcessAccessor>grimReaperProcess
>>> -1740113624 BlockClosure>repeat
>>> -1740113716 UnixOSProcessAccessor>grimReaperProcess
>>> -1740117340 BlockClosure>newProcess
>>>
>>> [omitted many newlines between output above]
>>> =============================================================
>>>
>>> What is striking from the above process listing is that two processes are
>>> missing: the handleTimerEvent process and the Seaside process (that is,
>>> the TCP listener loop). How come these processes vanished?
>>>
>>> This may be related to Pharo or to the Squeak VM.
>>>
>>> Has anybody else seen this problem? Any idea how to debug/fix this issue
>>> is very much appreciated!
>>>
>>> Cheers,
>>> Adrian
>>>
>>>
>>> CCed to pharo-dev since this may be related to Pharo; please respond on
>>> the squeak-vm list
