On Wed, 2017-05-31 at 18:26 +0000, William Good wrote:
> So it is actually two different locks that just happen to occupy the
> same address at different times?  Usually, helgrind indicates when
> each lock was first observed but there is no mention of a second lock.
To verify this hypothesis, you might run with -v -v -v.
Each time a lock is pthread_mutex_init-ed, you should see a line
such as:
client request: code 48470103,  addr 0x5400040,  len 0
The request code corresponds to the client request enum defined
in helgrind.h: 0x103 = 256 + 3, which is
    _VG_USERREQ__HG_PTHREAD_MUTEX_INIT_POST

If you see such a line twice with the same addr, that
indicates there were 2 initialisations of a mutex at the
same addr.
And the comment quoted below makes me believe helgrind does
not handle that case very cleanly.


>   No my reproducer is fairly large
That is not a surprise :).
If the problem is indeed linked to the re-creation of
another mutex at the same addr, then I think a small
reproducer should be easy to write.

But let's first confirm that you see the 2 initialisations.


You might also try with --tool=drd, to see if drd confirms
the race condition.

Philippe

> 
> 
> 
> ______________________________________________________________________
> From: Philippe Waroquiers <philippe.waroqui...@skynet.be>
> Sent: Monday, May 29, 2017 5:20 PM
> To: William Good
> Cc: valgrind-users@lists.sourceforge.net
> Subject: Re: [Valgrind-users] Helgrind detects race with same lock 
>  
> You might have been unlucky and have a lock that was freed and then
> re-used.
> 
> See extract of mk_LockP_from_LockN comments:
>    So we check that each LockN is a member of the admin_locks double
>    linked list of all Lock structures.  That stops us prodding around
>    in potentially freed-up Lock structures.  However, it's not quite a
>    proper check: if a new Lock has been reallocated at the same
>    address as one which was previously freed, we'll wind up copying
>    the new one as the basis for the LockP, which is completely bogus
>    because it is unrelated to the previous Lock that lived there.
>    Let's hope that doesn't happen too often.
> 
> Do you have a small reproducer for the below ?
> Philippe
> 
> 
> On Mon, 2017-05-29 at 17:33 +0000, William Good wrote:
> > Hello,
> > 
> > I am trying to understand this helgrind output.  It says there is a
> > data-race on a read.  However both threads hold the same lock.  How
> > can this be a race when both threads hold the lock during the
> access?
> > 
> > 
> > ==31341==
> > ----------------------------------------------------------------
> > ==31341==
> > ==31341==  Lock at 0x5990828 was first observed
> > ==31341==    at 0x4C31A76: pthread_mutex_init (hg_intercepts.c:779)
> > ==31341==    by 0x4026AF: thread_pool_submit (threadpool.c:85)
> > ==31341==    by 0x402012: qsort_internal_parallel (quicksort.c:142)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402450: thread_work (threadpool.c:233)
> > ==31341==    by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
> > ==31341==    by 0x4E42DC4: start_thread
> > (in /usr/lib64/libpthread-2.17.so)
> > ==31341==    by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
> > ==31341==  Address 0x5990828 is 40 bytes inside a block of size 152
> > alloc'd
> > ==31341==    at 0x4C2CD95: calloc (vg_replace_malloc.c:711)
> > ==31341==    by 0x4026A1: thread_pool_submit (threadpool.c:84)
> > ==31341==    by 0x402012: qsort_internal_parallel (quicksort.c:142)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x40279F: future_get (threadpool.c:112)
> > ==31341==    by 0x402048: qsort_internal_parallel (quicksort.c:152)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402450: thread_work (threadpool.c:233)
> > ==31341==    by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
> > ==31341==    by 0x4E42DC4: start_thread
> > (in /usr/lib64/libpthread-2.17.so)
> > ==31341==    by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
> > ==31341==  Block was alloc'd by thread #3
> > ==31341==
> > ==31341== Possible data race during read of size 4 at 0x5990880 by
> > thread #2
> > ==31341== Locks held: 1, at address 0x5990828
> > ==31341==    at 0x4023A9: thread_work (threadpool.c:229)
> > ==31341==    by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
> > ==31341==    by 0x4E42DC4: start_thread
> > (in /usr/lib64/libpthread-2.17.so)
> > ==31341==    by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
> > ==31341==
> > ==31341== This conflicts with a previous write of size 4 by thread
> #3
> > ==31341== Locks held: 1, at address 0x5990828
> > ==31341==    at 0x4027B3: future_get (threadpool.c:114)
> > ==31341==    by 0x402048: qsort_internal_parallel (quicksort.c:152)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x40279F: future_get (threadpool.c:112)
> > ==31341==    by 0x402048: qsort_internal_parallel (quicksort.c:152)
> > ==31341==    by 0x40279F: future_get (threadpool.c:112)
> > ==31341==    by 0x402048: qsort_internal_parallel (quicksort.c:152)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==  Address 0x5990880 is 128 bytes inside a block of size 152
> > alloc'd
> > ==31341==    at 0x4C2CD95: calloc (vg_replace_malloc.c:711)
> > ==31341==    by 0x4026A1: thread_pool_submit (threadpool.c:84)
> > ==31341==    by 0x402012: qsort_internal_parallel (quicksort.c:142)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x40279F: future_get (threadpool.c:112)
> > ==31341==    by 0x402048: qsort_internal_parallel (quicksort.c:152)
> > ==31341==    by 0x402040: qsort_internal_parallel (quicksort.c:151)
> > ==31341==    by 0x402450: thread_work (threadpool.c:233)
> > ==31341==    by 0x4C3083E: mythread_wrapper (hg_intercepts.c:389)
> > ==31341==    by 0x4E42DC4: start_thread
> > (in /usr/lib64/libpthread-2.17.so)
> > ==31341==    by 0x5355CEC: clone (in /usr/lib64/libc-2.17.so)
> > ==31341==  Block was alloc'd by thread #3
> > ==31341==
> > ==31341==
> > ----------------------------------------------------------------
> > 



_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
