On Wed, May 18, 2016 at 9:49 PM, Martin McClure <[email protected]> wrote:
> On 05/18/2016 03:17 PM, Martin McClure wrote: > >> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote: >> >>> Hi guys, >>> >>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot >>> explain how this could happened but it does, and it happened to me a couple >>> of times (but not fully reproducible). >>> >>> >> Hmm. The schedulerResumptionTime is, somehow, being (approximately) >> doubled. It's not clear how that can happen, but I'll look a little more. >> >> > Mario, is there any chance that you might be saving the image during one > of these Delays? > > > This one smells like a race condition, and I think I see something that > *might* explain it. But I don't have any more time to spend on this one, so > I'll leave the rest to someone else. I hope this is helpful: > > The only way I immediately see for the schedulerResumptionTime to become > approximately doubled is if the Delay's resumption time is adjusted by > #restoreResumptionTimes without previously having been adjusted by > #saveResumptionTimes. > > The only time either of those are sent, that I can see, is on saving the > image. Both are normally sent, (save before the snapshot, restore > afterwards), but there may be a hole there. > > Martin, first off, thanks for the research!!! Now....your email made me remember something:* I did get VM crash when saving the image a couple of times. The VM crashed when saving the image. *If I re-opened the image, it looks like if the image was indeed saved (so the snapshot primitive itself did work), but I suspect not all shutdown code could have been run correctly. The VM crash looks like the FreeTypeFace >> pvtDestroyHandle which, as far as I know, it's a "known crash" (I attach crash dump). From what I can see, if I follow all the stack, the crash starts from the WeakArray >> startUp: . That means that...depending on the order of the startup list...the Scheduler may not have been run after the crash. Now.... WeakArray initialization does: SessionManager default registerSystemClassNamed: self name. While... Delay class >> startUp "Restart active delay, if any, when resuming a snapshot." Scheduler startUp. And the Delay registration is SessionManager default registerSystemClassNamed: self name atPriority: 20. So...that seems correct... I can verify this by: SessionManager default systemCategory prioritizedList Anyway...not sure if this adds something, but just wanted to note this. > #saveResumptionTimes is only sent (by this scheduler class) when the > accessProtect semaphore is held, but #handleTimerEvent: is executed in the > timing Process *without* the protection of accessProtect, in the case of > the VM signaling the timingSemaphore. If the VM signals the > timingSemaphore, #handleTimerEvent: could run in the middle of > #saveResumptionTimes. If some Delay expires because of that timer event, > our Delay could move from being the first suspended delay to being the > active delay. If that happens after we've adjusted the active delay, but > before we've processed the suspended delays, that Delay will not get > adjusted, and will show the symptoms that Mariano is seeing. > > Also, I'm not sure how the Heap that holds the suspendedDelays will react > to being modified in the middle of an enumeration. That might open a larger > window for the problem. > > Regards, > > -Martin > > -- Mariano http://marianopeck.wordpress.com
crash.dmp
Description: Binary data
