On Fri, May 22, 2020 at 12:07:36 -0400, Robert Foley wrote:
> This patch series continues the work done by Emilio Cota and others to add
> Thread Sanitizer (TSan) support to QEMU.
> 
> The starting point for this work was Emilio's branch here:
> https://github.com/cota/qemu/commits/tsan
> specifically this commit: 0be125fc0afd47218b34d2019abdd19b644f3199
> 
> The purpose of this patch is not to fix all the TSan warnings, but to enable
> the TSan support so that QEMU developers can start using the tool.  
> We found this tool useful and even ran it on our recent changes in
> the cpu-locks series.
> Clearly there is work to do here to clean up all the warnings. :)  
> We have made a start to cleaning up these warnings by getting a VM to boot 
> cleanly with no TSan warnings.  
> We have also made an effort to introduce enough of the TSan suppression
> mechanisms, so that others can continue this work.

Thank you for doing this work! Great to see this finally coming along.

What are your plans wrt the per-cpu-lock branch? Given that some of
the races reported here are fixed there, I think it would make sense to
defer those patches to the per-cpu-lock series (i.e. patches 2/19, parts
of 13/19, and 18/19) so that this series goes in first (it is a lot
less likely to break anything).

Also, I would not worry too much about rushing to bring warnings to
0; what's important is that with the warnings we now know where to
focus on, and then we can carefully fix races. In particular I think
all annotations we add must have a comment, since otherwise we are
at the risk of blindlessly silencing warnings of real races.

> Issues:
> - When running docker-test-quick under TSan there are several tests which hang
>   - The unit tests which seem to hang under TSan:
>     test-char, test-qdev-global-props, and test-qga.  
>   - If we comment out those tests, check-unit finishes, albeit with 
>     a couple of warnings. :)

I've noticed another issue (that I did not notice on my previous
machine), which is that tsan blows up when in qht we lock all
of the bucket's locks. We then get this assert from tsan, since
it has a static limit of 64 mutexes held at any given time:

FATAL: ThreadSanitizer CHECK failed: 
/build/llvm-toolchain-10-yegZYJ/llvm-toolchain-10-10.0.0/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h:67
 "((n_all_locks_)) < 
(((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" 
(0x40, 0x40)
    #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long 
long, unsigned long long) <null> (qemu-system-aarch64+0x49f9f5)
    #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long 
long, unsigned long long) <null> (qemu-system-aarch64+0x4b651f)
    #2 __sanitizer::DeadlockDetectorTLS<__sanitizer::TwoLevelBitVector<1ul, 
__sanitizer::BasicBitVector<unsigned long> > >::addLock(unsigned long, unsigned 
long, unsigned int) <null> (qemu-system-aarch64+0x4aacbc)
    #3 __sanitizer::DD::MutexAfterLock(__sanitizer::DDCallback*, 
__sanitizer::DDMutex*, bool, bool) <null> (qemu-system-aarch64+0x4aa22e)
    #4 __tsan::MutexPostLock(__tsan::ThreadState*, unsigned long, unsigned 
long, unsigned int, int) <null> (qemu-system-aarch64+0x49ded8)
    #5 __tsan_mutex_post_lock <null> (qemu-system-aarch64+0x48175a)
    #6 qemu_spin_lock /home/cota/src/qemu/include/qemu/thread.h:245:5 
(qemu-system-aarch64+0xb1283b)
    #7 qht_map_lock_buckets /home/cota/src/qemu/util/qht.c:253:9 
(qemu-system-aarch64+0xb1283b)

A workaround for now is to run qemu with TSAN_OPTIONS=detect_deadlocks=0,
although selectively disabling tsan for qht_map_lock/unlock_buckets would
be ideal (not sure if it's possible).

Another warning I'm reliably seen is:
WARNING: ThreadSanitizer: double lock of a mutex (pid=489006)
    #0 pthread_mutex_lock <null> (qemu-system-aarch64+0x457596)
    #1 qemu_mutex_lock_impl /home/cota/src/qemu/util/qemu-thread-posix.c:79:11 
(qemu-system-aarch64+0xaf7e3c)

  Location is heap block of size 328 at 0x7b4800114900 allocated by main thread:
    #0 calloc <null> (qemu-system-aarch64+0x438b80)
    #1 g_malloc0 <null> (libglib-2.0.so.0+0x57d30)

  Mutex M57 (0x7b4800114960) created at:
    #0 pthread_mutex_init <null> (qemu-system-aarch64+0x43b74d)
    #1 qemu_rec_mutex_init /home/cota/src/qemu/util/qemu-thread-posix.c:119:11 
(qemu-system-aarch64+0xaf815b)

But this one seems safe to ignore.

Thanks,
                E.

Reply via email to