Ben, for the record, I have been using DelayMillisecondScheduler for a day
and a half, and so far there have been no problems.


On Thu, May 19, 2016 at 9:19 AM, Mariano Martinez Peck <
[email protected]> wrote:

>
>
> On Wed, May 18, 2016 at 9:49 PM, Martin McClure <[email protected]>
> wrote:
>
>> On 05/18/2016 03:17 PM, Martin McClure wrote:
>>
>>> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot
>>>> explain how this could happen, but it does, and it has happened to me a
>>>> couple of times (though it is not fully reproducible).
>>>>
>>>>
>>> Hmm. The schedulerResumptionTime is, somehow, being (approximately)
>>> doubled. It's not clear how that can happen, but I'll look a little more.
>>>
>>>
>> Mariano, is there any chance that you might be saving the image during one
>> of these Delays?
>>
>>
>> This one smells like a race condition, and I think I see something that
>> *might* explain it. But I don't have any more time to spend on this one, so
>> I'll leave the rest to someone else. I hope this is helpful:
>>
>> The only way I immediately see for the schedulerResumptionTime to become
>> approximately doubled is if the Delay's resumption time is adjusted by
>> #restoreResumptionTimes without previously having been adjusted by
>> #saveResumptionTimes.
>>
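
To illustrate the doubling arithmetic described above, here is a minimal
sketch that can be evaluated in a Playground. It assumes each adjustment
rebases a time as newBase + (time - oldBase). That formula, the block name
`rebase`, and all the numbers are assumptions made for illustration, not a
copy of the scheduler's code.

```smalltalk
"Hypothetical numbers; rebase is an assumed stand-in for the adjustment
 that save/restore apply to each Delay's resumption time."
| now resumption rebase saved restored unsavedRestored |
now := 1000000.                 "pretend clock reading at snapshot time"
resumption := now + 500.        "a Delay due 500 ticks from now"
rebase := [:time :oldBase :newBase | newBase + (time - oldBase)].

"Normal path: save rebases to zero, restore rebases back to the clock."
saved := rebase value: resumption value: now value: 0.
restored := rebase value: saved value: 0 value: now.

"Faulty path: restore applied to a time that was never saved."
unsavedRestored := rebase value: resumption value: 0 value: now.

restored.          "1000500, unchanged"
unsavedRestored.   "2000500, roughly doubled"
```

If the real adjustment has this shape, a restore without a matching save
adds the current clock value to an already absolute time, which would match
the "approximately doubled" symptom.
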
>> The only time either of those is sent, that I can see, is on saving the
>> image. Both are normally sent (save before the snapshot, restore
>> afterwards), but there may be a hole there.
>>
>>
> Martin, first off, thanks for the research!!!
>
> Now... your email made me remember something: *I did get a VM crash when
> saving the image a couple of times.* If I re-opened the image, it looked as
> if the image had indeed been saved (so the snapshot primitive itself did
> work), but I suspect that not all of the shutdown code ran correctly.
>
> The VM crash looks like the FreeTypeFace >> pvtDestroyHandle one which, as
> far as I know, is a "known crash" (I attach the crash dump). From what I can
> see, if I follow the whole stack, the crash starts from
> WeakArray >> startUp:.
> That means that, depending on the order of the startup list, the
> Scheduler may not have been run after the crash.
>
> Now... WeakArray initialization does:
>
>     SessionManager default
>         registerSystemClassNamed: self name.
>
> While...
>
>     Delay class >> startUp
>         "Restart active delay, if any, when resuming a snapshot."
>         Scheduler startUp.
>
> And the Delay registration is
>
>     SessionManager default
>         registerSystemClassNamed: self name
>         atPriority: 20.
>
> So... that seems correct...
>
> I can verify this by:
>
>     SessionManager default systemCategory prioritizedList
>
> Anyway, I am not sure if this adds anything, but I just wanted to note it.
>
>
>
>> #saveResumptionTimes is only sent (by this scheduler class) when the
>> accessProtect semaphore is held, but #handleTimerEvent: is executed in the
>> timing Process *without* the protection of accessProtect, in the case of
>> the VM signaling the timingSemaphore. If the VM signals the
>> timingSemaphore, #handleTimerEvent: could run in the middle of
>> #saveResumptionTimes. If some Delay expires because of that timer event,
>> our Delay could move from being the first suspended delay to being the
>> active delay. If that happens after we've adjusted the active delay, but
>> before we've processed the suspended delays, that Delay will not get
>> adjusted, and will show the symptoms that Mariano is seeing.
>>
>> Also, I'm not sure how the Heap that holds the suspendedDelays will react
>> to being modified in the middle of an enumeration. That might open a larger
>> window for the problem.
>>
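
The interleaving suspected above can be replayed deterministically with
plain collections instead of the real scheduler. This is only a sketch: the
symbols #a, #b and #c stand in for Delays, `rebase` assumes the adjustment
is newBase + (time - oldBase), and the point at which the timer event fires
is chosen by hand. None of it is taken from the actual scheduler code.

```smalltalk
"Hypothetical replay of the race; plain collections stand in for the
 scheduler's active delay and suspended delays."
| now times active suspended rebase |
now := 1000000.
times := Dictionary new.
times at: #a put: now + 100; at: #b put: now + 200; at: #c put: now + 300.
active := #a.
suspended := OrderedCollection withAll: #(#b #c).
rebase := [:key :oldBase :newBase |
	times at: key put: newBase + ((times at: key) - oldBase)].

"Save, step 1: adjust the active delay."
rebase value: active value: now value: 0.

"A timer event sneaks in: #a expires and #b is promoted to active."
active := suspended removeFirst.

"Save, step 2: adjust the suspended delays; #b is no longer among them."
suspended do: [:key | rebase value: key value: now value: 0].

"Restore: adjust the active delay and every suspended delay."
rebase value: active value: 0 value: now.
suspended do: [:key | rebase value: key value: 0 value: now].

times at: #b.   "2000200, instead of the expected 1000200"
times at: #c.   "1000300, adjusted symmetrically"
```

In this replay only the promoted delay ends up with a roughly doubled
resumption time, while the delay that stayed suspended comes back
unchanged.
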
>> Regards,
>>
>> -Martin
>>
>>
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>



-- 
Mariano
http://marianopeck.wordpress.com
