On 05/18/2016 03:17 PM, Martin McClure wrote:
On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote:
Hi guys,
I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot
explain how this could happen, but it does, and it has happened to me a
couple of times (though it is not fully reproducible).
Hmm. The schedulerResumptionTime is, somehow, being (approximately)
doubled. It's not clear how that can happen, but I'll look a little more.
Mariano, is there any chance that you might be saving the image during one
of these Delays?
This one smells like a race condition, and I think I see something that
*might* explain it. But I don't have any more time to spend on this one,
so I'll leave the rest to someone else. I hope this is helpful:
The only way I immediately see for the schedulerResumptionTime to become
approximately doubled is if the Delay's resumption time is adjusted by
#restoreResumptionTimes without previously having been adjusted by
#saveResumptionTimes.
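
To make the arithmetic concrete, here is a small Python sketch (a model, not Pharo code). It assumes that saving converts an absolute resumption time into "time remaining" by subtracting the current clock, and that restoring adds the new clock reading back. The function names and clock values are hypothetical and chosen only for illustration.

```python
def save(resumption_time, now):
    """Assumed save step: absolute time -> time remaining."""
    return resumption_time - now


def restore(remaining, now):
    """Assumed restore step: time remaining -> absolute time."""
    return remaining + now


clock_at_save = 1_000_000              # hypothetical clock reading
clock_at_restore = clock_at_save + 10  # snapshot took 10 ticks
resumption = clock_at_save + 500       # delay due 500 ticks from now

# Normal path: save, then restore. The delay keeps its 500 ticks.
normal = restore(save(resumption, clock_at_save), clock_at_restore)

# Faulty path: restore without a prior save. The clock is added to a
# value that is still absolute, so the result is roughly doubled.
skipped = restore(resumption, clock_at_restore)

print(normal, skipped)
```

Under these assumptions the faulty value comes out at about twice the clock reading, which matches the "approximately doubled" symptom.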
The only time either of those is sent, that I can see, is on saving the
image. Both are normally sent (save before the snapshot, restore
afterwards), but there may be a hole there.
#saveResumptionTimes is only sent (by this scheduler class) when the
accessProtect semaphore is held, but #handleTimerEvent: is executed in
the timing Process *without* the protection of accessProtect, in the
case of the VM signaling the timingSemaphore. If the VM signals the
timingSemaphore, #handleTimerEvent: could run in the middle of
#saveResumptionTimes. If some Delay expires because of that timer event,
our Delay could move from being the first suspended delay to being the
active delay. If that happens after we've adjusted the active delay, but
before we've processed the suspended delays, that Delay will not get
adjusted, and will show the symptoms that Mariano is seeing.
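
The interleaving described above can be modeled in a few lines of Python. This is a deterministic sketch of the suspected race, not the actual scheduler: the class, method names, and the `interrupt` hook are hypothetical, and the save/restore arithmetic (subtract the clock on save, add it on restore) is an assumption.

```python
class Delay:
    def __init__(self, resumption_time):
        self.resumption_time = resumption_time


class Scheduler:
    def __init__(self, active, suspended):
        self.active = active
        self.suspended = list(suspended)  # kept in expiry order

    def handle_timer_event(self):
        # The active delay expires; the first suspended delay is promoted.
        self.active = self.suspended.pop(0) if self.suspended else None

    def save_resumption_times(self, now, interrupt=None):
        if self.active is not None:
            self.active.resumption_time -= now
        if interrupt is not None:
            interrupt()  # models the timer event firing mid-save
        for delay in self.suspended:
            delay.resumption_time -= now

    def restore_resumption_times(self, now):
        if self.active is not None:
            self.active.resumption_time += now
        for delay in self.suspended:
            delay.resumption_time += now


def run(racy):
    now = 1_000_000
    first, ours, later = Delay(now + 1), Delay(now + 500), Delay(now + 900)
    scheduler = Scheduler(first, [ours, later])
    hook = scheduler.handle_timer_event if racy else None
    scheduler.save_resumption_times(now, interrupt=hook)
    scheduler.restore_resumption_times(now + 10)
    return ours.resumption_time, later.resumption_time


print(run(racy=False))
print(run(racy=True))
```

In the racy run, "ours" is promoted to active after the active delay has already been adjusted, so it misses the save step but still receives the restore step. The delay behind it is adjusted normally.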
Also, I'm not sure how the Heap that holds the suspendedDelays will
react to being modified in the middle of an enumeration. That might open
a larger window for the problem.
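
As a loose analogy only, the following Python sketch shows what can happen when an array-backed binary heap is modified while it is being enumerated by index. It uses Python's `heapq`, not Pharo's Heap, so whether the Pharo class behaves the same way is an assumption that would need checking.

```python
import heapq


def enumerate_while_popping(items):
    """Walk the heap's backing list, removing the root during the first step."""
    heap = list(items)
    heapq.heapify(heap)
    visited = []
    for index, item in enumerate(heap):
        visited.append(item)
        if index == 0:
            # Models a timer event removing the earliest entry mid-enumeration.
            heapq.heappop(heap)
    return visited


visited = enumerate_while_popping([10, 20, 30, 40])
missed = sorted(set([10, 20, 30, 40]) - set(visited))
print(visited, missed)
```

In this model the pop reorders the backing list, so the enumeration never reaches one of the remaining entries. A skipped entry would miss its adjustment in the same way as the promoted delay.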
Regards,
-Martin