Hi Marc,
On Jul 26, 2010, at 8:31 AM, Marc Guenther wrote:
> On 23.07.2010, at 16:55, Chuck Hill wrote:
> On Jul 23, 2010, at 5:08 AM, Marc Guenther wrote:
>>
>>> Hi,
>>>
>>> We recently had some deadlock problems involving EC and OSC locking, and I
>>> would like to validate my thoughts with the list.
>>>
>>> As a very basic rule with everything multithreaded, whenever two threads
>>> want access to the same resources, they have to lock them in a predefined
>>> identical order. Otherwise you have that classic deadlock possibility,
>>> where T1 has A and wants B, and T2 has B and wants A.
>>>
>>> This, for EOF, means the following:
>>>
>>> Whenever you hold an OSC lock, you are NOT allowed to lock any EC anymore!
>>
>> I think of it as taking escalating locks: EC -> DB Context -> OSC
>
> To ensure this (at least the EC -> OSC stuff), I added this assertion to our
> EC subclass:
>
> @Override
> public void lock() {
> // make sure the OSC is not locked, or we ourselves are already locked
> assert !isOSCLocked() || this._lock.isHeldByCurrentThread();
> super.lock();
> }
>
> /** if the OSC is locked and we are not */
> private boolean isOSCLocked() {
> EOObjectStore rootOS = rootObjectStore();
> if (rootOS instanceof EOObjectStoreCoordinator) {
> ReentrantLock lck = ((EOObjectStoreCoordinator) rootOS)._lock();
> return lck != null && lck.isHeldByCurrentThread();
> }
> return false;
> }
>
> This immediately catched our problem, and so far neither found nor caused any
> others. Don't know yet if it will cause any trouble.
>
>>> Why not? Cause the normal order is the other way round. First you lock the
>>> EC, then, when doing something with it, you lock the OSC.
>>>
>>> For example, our specific deadlock occurred, cause we had a Thread1:
>>> - lock the OSC
>>> - in a loop, do some operations involving locking of lots of ECs
>>> (we have some stuff in there, which posts notifications to all ECs)
>>> - Unlock OSC
>>
>> That seems like a long time to have the OSC locked. Why are you locking the
>> OSC?
>
> Actually, we were locking the DBCntxt. We are reloading some EOModels, and it
> seemed to be necessary to invalidate objects:
>
> dbc.invalidateObjectsWithGlobalIDs(dbc.database().snapshots().allKeys());
>
> Unfortunately, the actual removeModel() and addModel() calls were also in
> that try block. We moved them further down, and now the deadlock seems to be
> gone.I'd want to hold the OSC lock while modifying a model group (reloading models). >>> Now, while T1 was busy in its loop, some T2 was fetching a single EO: >>> - lock the EC >>> - fetch EO causing OSC to lock >>> -> which waits for T2 to finish >>> >>> And, as soon as the T1 loop reaches the locked EC of T2 >>> >>> -> Deadlock >>> >>> >>> Now, the questions I have: >>> - Is this analysis correct? >> >> Yes. >> >> >>> - What do we do about it? >> >> Lock in the correct order. Don't lock and hold locked the OSC. Use >> different EOF stacks (hence different OSC). Don't touch the same EC in more >> than one thread concurrently. > > At least the last one isn't really in our control, is it? All ECs are > automatically touched outside of their threads whenever this ObjectsChanged > notification is fired around. Only if they are not locked. It is a conditional operation: lock if possible, or queue if locked. That is probably the only safe operation across threads. > I remember a very hard to track down deadlock several years ago (WO 5.0), > that was caused by this notification. > >>> If I understand all this correctly, my above rule means: >>> >>> _Whenever you have the need to explicitely lock an OSC, you can't do any >>> operation, which would cause an formerly unlocked (by you) EC to be locked._ >>> >>> Which might be more than you think. IIRC, you are not allowed to make any >>> changes to any EO. Because that causes EOObjectsChangedInStoreNotifications >>> to be fired around to all ECs. >> >> That happens when saveChanges() is called. saveChanges() locks the OSC. > > Ah. And while it has the OSC lock, it locks all ECs in turn? Isn't that > exactly the case I described above? Why does this ever work? Or does it lock > the ECs after it released the OSC again? > > I guess it uses some kind of magic. It definitely didn't trigger my assertion > above. > >>> Which in turn, causes all these ECs to be locked. Which violates the rule. >> >> If they are already locked, it will only queue the notification, not wait >> for a lock. > > Which explains the magic :) This stuff is definitely not for the faint > hearted. No. :-) This part of EOF is very complex. It is like a finely engineered machine. >>> Now, if you really need to do this (you NEED to lock an OSC, and you NEED >>> to use ECs while having it locked), you could think, Oh it's OK, I just >>> lock all of them first. But then again, in what order do you lock the ECs? >>> Because if you do it in random order, you again have the possibility of >>> deadlock. >> >> That sounds like you may be sharing ECs between threads in a dangerous >> pattern. > > Yes. I guess it all boils down to: just don't do that :) Yes, you are very likely to cause problems doing that. Avoiding the pattern is the best choice. >>> So, what is everybody doing? >>> - I don't lock the OSC myself. Why would I want to? >> >>> - I do, but I am VERY careful about what I do with it being locked. >>> - I do, and so far I have been lucky. >> >> Never any problems, provided you lock it at the appropriate time. > > OK. Thanks for the help. It seems to be working fine now. > > Now I'm back to investigate this missing snapshot issue that's still haunting > us. That is a fun one. invalidateObjectsWithGlobalIDs can cause that, especially if the OSC is not locked. Chuck -- Chuck Hill Senior Consultant / VP Development Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems. http://www.global-village.net/products/practical_webobjects
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list ([email protected]) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [email protected]
