Ben, for the record, I am using DelayMillisecondScheduler for a day and a half and so far no problem.
On Thu, May 19, 2016 at 9:19 AM, Mariano Martinez Peck < [email protected]> wrote: > > > On Wed, May 18, 2016 at 9:49 PM, Martin McClure <[email protected]> > wrote: > >> On 05/18/2016 03:17 PM, Martin McClure wrote: >> >>> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote: >>> >>>> Hi guys, >>>> >>>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot >>>> explain how this could happened but it does, and it happened to me a couple >>>> of times (but not fully reproducible). >>>> >>>> >>> Hmm. The schedulerResumptionTime is, somehow, being (approximately) >>> doubled. It's not clear how that can happen, but I'll look a little more. >>> >>> >> Mario, is there any chance that you might be saving the image during one >> of these Delays? >> >> >> This one smells like a race condition, and I think I see something that >> *might* explain it. But I don't have any more time to spend on this one, so >> I'll leave the rest to someone else. I hope this is helpful: >> >> The only way I immediately see for the schedulerResumptionTime to become >> approximately doubled is if the Delay's resumption time is adjusted by >> #restoreResumptionTimes without previously having been adjusted by >> #saveResumptionTimes. >> >> The only time either of those are sent, that I can see, is on saving the >> image. Both are normally sent, (save before the snapshot, restore >> afterwards), but there may be a hole there. >> >> > Martin, first off, thanks for the research!!! > > Now....your email made me remember something:* I did get VM crash when > saving the image a couple of times. The VM crashed when saving the image. *If > I re-opened the image, it looks like if the image was indeed saved (so the > snapshot primitive itself did work), but I suspect not all shutdown code > could have been run correctly. > > The VM crash looks like the FreeTypeFace >> pvtDestroyHandle which, as > far as I know, it's a "known crash" (I attach crash dump). From what I can > see, if I follow all the stack, the crash starts from the WeakArray > >> startUp: . > That means that...depending on the order of the startup list...the > Scheduler may not have been run after the crash. > > Now.... WeakArray initialization does: > > SessionManager default > registerSystemClassNamed: self name. > While... > > Delay class >> startUp "Restart active delay, if any, when resuming a > snapshot." Scheduler startUp. > > And the Delay registration is > > SessionManager default > registerSystemClassNamed: self name > atPriority: 20. > > So...that seems correct... > > I can verify this by: > > SessionManager default systemCategory prioritizedList > > Anyway...not sure if this adds something, but just wanted to note this. > > > >> #saveResumptionTimes is only sent (by this scheduler class) when the >> accessProtect semaphore is held, but #handleTimerEvent: is executed in the >> timing Process *without* the protection of accessProtect, in the case of >> the VM signaling the timingSemaphore. If the VM signals the >> timingSemaphore, #handleTimerEvent: could run in the middle of >> #saveResumptionTimes. If some Delay expires because of that timer event, >> our Delay could move from being the first suspended delay to being the >> active delay. If that happens after we've adjusted the active delay, but >> before we've processed the suspended delays, that Delay will not get >> adjusted, and will show the symptoms that Mariano is seeing. >> >> Also, I'm not sure how the Heap that holds the suspendedDelays will react >> to being modified in the middle of an enumeration. That might open a larger >> window for the problem. >> >> Regards, >> >> -Martin >> >> > > > -- > Mariano > http://marianopeck.wordpress.com > -- Mariano http://marianopeck.wordpress.com
