Re: [Valgrind-users] helgrind markup for lock-ordering

Philippe Waroquiers Tue, 06 Nov 2012 13:58:26 -0800

On Tue, 2012-11-06 at 13:43 +0100, David Faure wrote:
> On Monday 05 November 2012 23:19:42 Philippe Waroquiers wrote:
> > On Mon, 2012-11-05 at 18:59 +0100, David Faure wrote:
> > > The testcase http://www.davidfaure.fr/2012/qmutex_trylock.cpp
> > > (from https://bugs.kde.org/show_bug.cgi?id=243232)
> > > shows that an optimization inside Qt leads to a helgrind warning about
> > > wrong lock ordering, making the use of that feature impossible for
> > > detecting actual problems elsewhere (i.e. I have to use
> > > --track-lockorders=no all the time).
> > > 
> > > Technically if we ignore the distinction between lock and tryLock,
> > > helgrind is right, we did lock the mutexes in reverse order. But because
> > > it's a tryLock, it couldn't possibly deadlock.
> > > 
> > > Should helgrind simply ignore all pthread_mutex_trylock calls, for the
> > > lockorder detection, even if they succeed? I think so, actually (by
> > > definition they couldn't deadlock, which is what track-lockorders is all
> > > about).
> > A deadlock can appear if there is a mixture of trylock and lock on
> > the same lock. So, trylock cannot just be ignored.
> > E.g.
> >  Thread 1:    trylock mutex1
> >               lock mutex2
> > 
> >  Thread 2:    lock mutex2
> >               lock mutex1
> > might deadlock.
> 
> True.
> This means that only "trylock in second place" should be ignored. More on 
> this 
> below.
More generally, I guess that you mean "trylock in last place" should
be ignored (rather than the special case of 2nd place).
This might be difficult to implement as each time a lock is taken,
helgrind checks for order violation. I suspect a later lock operation
might then transform a "trylock in last place" to a "trylock which is
now not anymore in last place".
But of course, when the trylock operation has just been done,
this trylock is "last place" and so if we would ignore it, then
this would be similar to always ignore the trylock, which is not ok.
Currently, helgrind maintains a graph of lock order.
I suspect we might need different graph node types and/or edge types
to cope with trylock. For sure, more investigations needed looking
in depth at the current algorithm.


> 
> > Even without mixture, isn't the below slightly bizarre/dangerous ?
> >   Thread 1:  trylock mutex1
> >              trylock mutex2
> > 
> >   Thread 2:  trylock mutex2
> >              trylock mutex1
> 
> No deadlock can ever happen here.
Yes, no deadlock can occur. However, this is really a really doubful
construction. The question is: should helgrind report a "lock order"
warning for such constructs ?

> 
> > If the 2nd trylock fails, what is the "plan B" ?
> 
> If the program then accesses shared data, a race condition will happen and 
> will be detected by helgrind anyway. So ignoring the ordering of these 
> trylocks is ok, I would think. Of course helgrind must record "we got the 
> lock", for the race condition detection feature, but it shouldn't warn about 
> the wrong order of the locking, since it can't possibly deadlock.
The idea of helgrind is that it detects lock order problems and/or
race condition problems *even* if no deadlock happens and/or if no
race condition really happened.
Maybe it is very unlikely that the trylock fails. Still would be nice
to discover the bug. And if the trylock does not fail, then the race
condition will then not be detected by helgrind.

> Not that the above would be good programming practice, of course, but 
> helgrind 
> can't say anything about it if all the locks were acquired. It will warn in 
> another run, where some trylock fails, and a race ensues.
> 
> > It seems that a task must unlock all locks and restart
> > from "scratch" in the above case.
> 
> Yes this is exactly that QOrderedMutexLocker does.
> 
>    Thread 1:  lock mutex1
>               lock mutex2
>  
>    Thread 2:  lock mutex2
>               trylock mutex1
>               if that failed,
>                     unlock mutex2
>                     lock mutex1
>                     lock mutex2
QOrderedMutexLocker really retries locking in an other order 
than the operations done by Thread 2 ?
Or is that a typo ?

> 
> If the trylock fails (because thread1 was first), then it unlocks and 
> restarts 
> from scratch. I can't see a deadlock risk with that, so ideally helgrind 
> shouldn't warn.
> 
> > I guess we might need an option such as:
> >    --trylock-logic={normal-lock|local-retry|full-retry}
> > normal-lock = current behaviour
> > local-retry means the task would re-trylock
> > full-retry means that the plan B is to unlock all locks
> > and retry everything.
> 
> I don't see how this can be a global option. Some piece of code (like 
> QOrderedMutexLocker) might have the "full retry" logic above, but other 
> pieces 
> of code might do something different - e.g. something wrong. It doesn't make 
> sense to me to tell helgrind "this is what all the code paths are going to do 
> about tryLock", that's impossible to predict in a complex program.
For sure, generally, an application can do a big variety of behaviours.
I suspect however that an application might (this is sane at least) try
to obey a certain "design pattern" such as: 
   in all cases, if a trylock fails, unlock all and retry all.
The command line option will then cover such cases without annotations.

> 
> Let me present another case:
> 
>    Thread 1:  lock mutex1
>               lock mutex2
>  
>    Thread 2:  lock mutex2
>               trylock mutex1
>               if that fails, unlock mutex2 and give up
> 
> This could happen for a non-critical operation that can be canceled if it 
> can't be done immediately. Again, no deadlock possible, so helgrind shouldn't 
> warn about a successful trylock being "out of order". And yet this isn't a 
> "full retry", so I don't think --trylock-logic=full-retry is the solution.
> 
> Deadlock can only happen if both threads use normal locking as their
> second operation. A trylock as the second operation doesn't deadlock.
Replace second by last. But how do we know that an operation is the last
one ? I think this is a non trivial change in the current algorithm.

Pḧilippe



------------------------------------------------------------------------------
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Re: [Valgrind-users] helgrind markup for lock-ordering

Reply via email to