Re: [Pharo-dev] Problem with delay waiting (OSSubprocess code) on Pharo 5.0

Mariano Martinez Peck Thu, 19 May 2016 05:20:54 -0700

On Wed, May 18, 2016 at 9:49 PM, Martin McClure <[email protected]>
wrote:

> On 05/18/2016 03:17 PM, Martin McClure wrote:
>
>> On 05/18/2016 08:49 AM, Mariano Martinez Peck wrote:
>>
>>> Hi guys,
>>>
>>> I am seeing a problem in Pharo 5.0 regarding Delay >> wait. I cannot
>>> explain how this could happened but it does, and it happened to me a couple
>>> of times (but not fully reproducible).
>>>
>>>
>> Hmm. The schedulerResumptionTime is, somehow, being (approximately)
>> doubled. It's not clear how that can happen, but I'll look a little more.
>>
>>
> Mario, is there any chance that you might be saving the image during one
> of these Delays?
>
>
> This one smells like a race condition, and I think I see something that
> *might* explain it. But I don't have any more time to spend on this one, so
> I'll leave the rest to someone else. I hope this is helpful:
>
> The only way I immediately see for the schedulerResumptionTime to become
> approximately doubled is if the Delay's resumption time is adjusted by
> #restoreResumptionTimes without previously having been adjusted by
> #saveResumptionTimes.
>
> The only time either of those are sent, that I can see, is on saving the
> image. Both are normally sent, (save before the snapshot, restore
> afterwards), but there may be a hole there.
>
>
Martin, first off, thanks for the research!!!

Now....your email made me remember something:* I did get VM crash when
saving the image a couple of times. The VM crashed when saving the image. *If
I re-opened the image, it looks like if the image was indeed saved (so the
snapshot primitive itself did work), but I suspect not all shutdown code
could have been run correctly.

The VM crash looks like the FreeTypeFace >> pvtDestroyHandle  which, as far
as I know, it's a "known crash" (I attach crash dump). From what I can see,
if I follow all the stack, the crash starts from the WeakArray >> startUp: .
That means that...depending on the order of the startup list...the
Scheduler may not have been run after the crash.

Now.... WeakArray initialization does:

SessionManager default
registerSystemClassNamed: self name.
While...

Delay class >> startUp "Restart active delay, if any, when resuming a
snapshot." Scheduler startUp.

And the Delay registration is

SessionManager default
registerSystemClassNamed: self name
atPriority: 20.

So...that seems correct...

I can verify this by:

SessionManager default systemCategory prioritizedList

Anyway...not sure if this adds something, but just wanted to note this.

> #saveResumptionTimes is only sent (by this scheduler class) when the
> accessProtect semaphore is held, but #handleTimerEvent: is executed in the
> timing Process *without* the protection of accessProtect, in the case of
> the VM signaling the timingSemaphore. If the VM signals the
> timingSemaphore, #handleTimerEvent: could run in the middle of
> #saveResumptionTimes. If some Delay expires because of that timer event,
> our Delay could move from being the first suspended delay to being the
> active delay. If that happens after we've adjusted the active delay, but
> before we've processed the suspended delays, that Delay will not get
> adjusted, and will show the symptoms that Mariano is seeing.
>
> Also, I'm not sure how the Heap that holds the suspendedDelays will react
> to being modified in the middle of an enumeration. That might open a larger
> window for the problem.
>
> Regards,
>
> -Martin
>
>

-- 
Mariano
http://marianopeck.wordpress.com

crash.dmp
Description: Binary data

Re: [Pharo-dev] Problem with delay waiting (OSSubprocess code) on Pharo 5.0

Reply via email to