Re: [Valgrind-users] Atos can find line of an address, but valgrind doesn't print it.
On 01/21/2013 01:38 PM, Nick Overdijk wrote:
> Ah of course. I didn't do that, I ran it myself manually on stage1, like this:
>
>    dsymutil stage1
>    valgrind stage1
>
> That's because the call to valgrind is a bit hidden in some scripts that are
> also used on Linux. Should my way work?

I think it should; not sure why it doesn't. But anyway, does --dsymutil=yes help?

J

------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft MVPs
and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnnow-d2d
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
Re: [Valgrind-users] Qt5 support in helgrind
> ==9070== Lock at 0xD81F6F8 was first observed
> ==9070==    at 0x4C3077F: QMutex::QMutex(QMutex::RecursionMode) (hg_intercepts.c:2186)
> ==9070==    by 0x4C307A4: QMutex::QMutex(QMutex::RecursionMode) (hg_intercepts.c:2192)
> ==9070==    by 0x585A9CE: QPostEventList::QPostEventList() (qthread_p.h:110)
> [...]
> Should I just duplicate the code of the wrappers instead? Or is there a more
> clever solution I'm missing?

One alternative approach -- which doesn't really solve the problem, but which
you might want to look at -- is to put the real code in a worker function and
call that from all entry points. For example, see how sem_wait_WRK is used in
hg_intercepts.c. That doesn't get rid of the second stack frame, but it does
at least avoid the impression that there's some kind of strange recursion
going on. Personally I quite like this scheme, in that it doesn't duplicate
code.

If you turned the __attribute__((noinline)) into __attribute__((always_inline))
then I think you'd get rid of the extra frame without having to duplicate the
code by hand.

> What I don't know is whether some implementations might/should differ from
> the Qt4 implementation...

Right. That's the real problem. I don't think there's an easy way to figure
this out, short of comparing the Qt4 and Qt5 implementations of the relevant
functions.

Once you have a patch you're satisfied with, I'd be happy to commit it.

J
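The worker-function scheme described above can be sketched in plain C like
this (the names and the counter are hypothetical, purely for illustration; the
real intercepts in hg_intercepts.c use the I_WRAP_SONAME_FNNAME_ZU macros and
VALGRIND_GET_ORIG_FN):

```c
/* Sketch of the "one worker, many entry points" scheme.  The shared
   intercept logic lives in a single static worker; each soname-matched
   entry point is a thin call into it. */
static int lock_count = 0;

/* With noinline the worker shows up as one extra stack frame in
   helgrind's output; always_inline would hide it, at the cost of the
   gcc -Wattributes warning mentioned later in the thread. */
__attribute__((noinline))
static void QMutex_lock_WRK_sketch(void)
{
    /* ...the real intercept work (pre/post-lock events) would go here... */
    lock_count++;
}

/* Entry point that would be matched against libQtCore.so* (Qt4). */
void qt4_lock_entry(void) { QMutex_lock_WRK_sketch(); }

/* Entry point that would be matched against libQt5Core.so* (Qt5). */
void qt5_lock_entry(void) { QMutex_lock_WRK_sketch(); }
```

Both entry points funnel into the same worker, so the logic exists exactly
once even though two library versions are intercepted.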
Re: [Valgrind-users] Qt5 support in helgrind
On Wednesday 23 January 2013 12:24:30 Julian Seward wrote:
> > ==9070== Lock at 0xD81F6F8 was first observed
> > ==9070==    at 0x4C3077F: QMutex::QMutex(QMutex::RecursionMode) (hg_intercepts.c:2186)
> > ==9070==    by 0x4C307A4: QMutex::QMutex(QMutex::RecursionMode) (hg_intercepts.c:2192)
> > ==9070==    by 0x585A9CE: QPostEventList::QPostEventList() (qthread_p.h:110)
> > [...]
> > Should I just duplicate the code of the wrappers instead? Or is there a
> > more clever solution I'm missing?
>
> One alternative approach -- which doesn't really solve the problem, but
> which you might want to look at -- is to put the real code in a worker
> function and call that from all entry points. For example, see how
> sem_wait_WRK is used in hg_intercepts.c. That doesn't get rid of the second
> stack frame, but it does at least avoid the impression that there's some
> kind of strange recursion going on. Personally I quite like this scheme, in
> that it doesn't duplicate code.

OK.

> If you turned the __attribute__((noinline)) into
> __attribute__((always_inline)) then I think you'd get rid of the extra
> frame without having to duplicate the code by hand.

I tried, and it works, but it makes gcc warn:

   hg_intercepts.c:2048:13: warning: always_inline function might not be inlinable [-Wattributes]

> > What I don't know is whether some implementations might/should differ
> > from the Qt4 implementation...
>
> Right. That's the real problem. I don't think there's an easy way to figure
> this out, short of comparing the Qt4 and Qt5 implementations of the
> relevant functions.

Yeah. But in fact, the QMutex implementation itself having changed doesn't
matter for helgrind's intercepts. The API is the same, so the expected
behaviour is the same, so helgrind can react the same. So I think we're fine
for now.

> Once you have a patch you're satisfied with, I'd be happy to commit it.

Please find the patch attached.
-- 
David Faure, fa...@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5

Index: hg_intercepts.c
===================================================================
--- hg_intercepts.c	(revision 13107)
+++ hg_intercepts.c	(working copy)
@@ -2033,9 +2033,15 @@
    ret_ty I_WRAP_SONAME_FNNAME_ZU(libQtCoreZdsoZa,f)(args); \
    ret_ty I_WRAP_SONAME_FNNAME_ZU(libQtCoreZdsoZa,f)(args)
 
+// soname is libQt5Core.so.4 ; match against libQt5Core.so*
+#define QT5_FUNC(ret_ty, f, args...) \
+   ret_ty I_WRAP_SONAME_FNNAME_ZU(libQt5CoreZdsoZa,f)(args); \
+   ret_ty I_WRAP_SONAME_FNNAME_ZU(libQt5CoreZdsoZa,f)(args)
+
 //-----------------------------------------------------------
 // QMutex::lock()
-QT4_FUNC(void, _ZN6QMutex4lockEv, void* self)
+//__attribute__((always_inline)) // works, but makes gcc warn...
+static void QMutex_lock_WRK(void* self)
 {
    OrigFn fn;
    VALGRIND_GET_ORIG_FN(fn);
@@ -2056,9 +2062,16 @@
    }
 }
 
+QT4_FUNC(void, _ZN6QMutex4lockEv, void* self) {
+   QMutex_lock_WRK(self);
+}
+QT5_FUNC(void, _ZN6QMutex4lockEv, void* self) {
+   QMutex_lock_WRK(self);
+}
+
 //-----------------------------------------------------------
 // QMutex::unlock()
-QT4_FUNC(void, _ZN6QMutex6unlockEv, void* self)
+static void QMutex_unlock_WRK(void* self)
 {
    OrigFn fn;
    VALGRIND_GET_ORIG_FN(fn);
@@ -2080,10 +2093,17 @@
    }
 }
 
+QT4_FUNC(void, _ZN6QMutex6unlockEv, void* self) {
+   QMutex_unlock_WRK(self);
+}
+QT5_FUNC(void, _ZN6QMutex6unlockEv, void* self) {
+   QMutex_unlock_WRK(self);
+}
+
 //-----------------------------------------------------------
 // bool QMutex::tryLock()
 // using 'long' to mimic C++ 'bool'
-QT4_FUNC(long, _ZN6QMutex7tryLockEv, void* self)
+static long QMutex_tryLock_WRK(void* self)
 {
    OrigFn fn;
    long ret;
@@ -2110,10 +2130,17 @@
    return ret;
 }
 
+QT4_FUNC(long, _ZN6QMutex7tryLockEv, void* self) {
+   return QMutex_tryLock_WRK(self);
+}
+QT5_FUNC(long, _ZN6QMutex7tryLockEv, void* self) {
+   return QMutex_tryLock_WRK(self);
+}
+
 //-----------------------------------------------------------
 // bool QMutex::tryLock(int)
 // using 'long' to mimic C++ 'bool'
-QT4_FUNC(long, _ZN6QMutex7tryLockEi, void* self, long arg2)
+static long QMutex_tryLock_int_WRK(void* self, long arg2)
 {
    OrigFn fn;
    long ret;
@@ -2141,6 +2168,12 @@
    return ret;
 }
 
+QT4_FUNC(long, _ZN6QMutex7tryLockEi, void* self, long arg2) {
+   return QMutex_tryLock_int_WRK(self, arg2);
+}
+QT5_FUNC(long, _ZN6QMutex7tryLockEi, void* self, long arg2) {
+   return QMutex_tryLock_int_WRK(self, arg2);
+}
 
 //-----------------------------------------------------------
 // It's not really very clear what the args are here. But from
@@ -2151,9 +2184,7 @@
 // is that of the mutex and the second is either zero or one,
 // probably being the recursion mode, therefore.
 // QMutex::QMutex(QMutex::RecursionMode) (C1ENS variant)
-QT4_FUNC(void*, _ZN6QMutexC1ENS_13RecursionModeE,
-         void* mutex,
-         long recmode)
+static void* QMutex_constructor_WRK(void* mutex, long recmode)
 {
Re: [Valgrind-users] helgrind and atomic operations
On Monday 27 August 2012 15:25:14 Marc Mutz wrote:
> If atomic loads and stores on x86 are implemented with a volatile cast, then
> the compiler can't reorder stuff around them, either. Not more than with a
> std::atomic, at least. QAtomic does that. For load-relaxed, Thiago thinks
> that a normal read (non-volatile) is correct and I couldn't prove him wrong.

I was talking to Julian about this again today, and he pointed me to this
writeup:
http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming

We're looking at how to silence valgrind about Qt atomic ops, but before that,
it would actually be good to be sure that what Qt does is correct, on x86.

Where does the claim that a volatile cast means the compiler can't reorder
stuff around it come from?

In the Qt source code I see qatomic_gcc.h (which is unused, unfortunately)
calling __sync_synchronize() in loadAcquire(). Shouldn't Qt's code call that,
when compiled with gcc? This does lead to different assembly, too, on x86. So
the claim that x86 doesn't need memory barriers seems wrong?

--- testcase_atomic_ops_helgrind.s.orig	2013-01-23 15:04:20.889417624 +0100
+++ testcase_atomic_ops_helgrind.s	2013-01-23 15:07:06.938422071 +0100
@@ -380,6 +380,7 @@
 _ZN10QAtomicOpsIiE11loadAcquireERKi:
 	movq	-24(%rbp), %rax
 	movl	(%rax), %eax
 	movl	%eax, -4(%rbp)
+	mfence
 	movl	-4(%rbp), %eax
 	popq	%rbp
 	.cfi_def_cfa 7, 8
@@ -403,6 +404,7 @@
 _ZN10QAtomicOpsIiE12storeReleaseERii:
 	movq	-8(%rbp), %rax
 	movl	-12(%rbp), %edx
 	movl	%edx, (%rax)
+	mfence
 	popq	%rbp
 	.cfi_def_cfa 7, 8
 	ret

-- 
David Faure, fa...@kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5
Re: [Valgrind-users] helgrind and atomic operations
On 01/23/2013 03:08 PM, David Faure wrote:
> I was talking to Julian about this again today, and he pointed me to this
> writeup:
> http://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming
>
> We're looking at how to silence valgrind about Qt atomic ops, but before
> that, it would actually be good to be sure that what Qt does is correct, on
> x86.
>
> Where does the claim that a volatile cast means the compiler can't reorder
> stuff around it come from?

I think that claim is incorrect. It may be that the compiler is not allowed to
reorder volatile stores, but for sure it can reorder volatile stores w.r.t.
non-volatile stores, which makes the volatile stores completely useless from a
multithreading perspective.

That writeup contains a nice example:

   volatile int Ready;
   int Message[100];
   void foo( int i ) {
      Message[i/10] = 42;
      Ready = 1;
   }

Imagine that 'Message' is some chunk of info that we intend to give to another
thread, and 'Ready' is a flag that we intend to use to tell the other thread
when the message is ready to be read. gcc-4.7.2 -m32 -O2 produces this
assembly (omitting the CFI junk):

   foo:
      movl    4(%esp), %ecx
      movl    $1717986919, %edx
      movl    $1, Ready               <-- Ready = 1;
      movl    %ecx, %eax
      imull   %edx
      sarl    $31, %ecx
      sarl    $2, %edx
      subl    %ecx, %edx
      movl    $42, Message(,%edx,4)   <-- Message[i / 10] = 42;
      ret

so the volatile clearly didn't have the claimed effect.

Bottom line is, as I understand it, that C++11 allows implementations
(compiler + hardware) to assume code is race free, and optimise accordingly.
On x86, we have the added guarantee that the hardware will deliver stores in
order, but the compiler is still free to reorder ordinary stores.

J
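For contrast, here is a minimal sketch (not from the thread) of the same
example rewritten with C11 atomics, where a release store on Ready forbids
both the compiler and the hardware from sinking the Message write past it:

```c
#include <stdatomic.h>

int Message[100];
atomic_int Ready;

void foo(int i)
{
    Message[i / 10] = 42;
    /* Release store: every write before this line is guaranteed to be
       visible to any thread that observes Ready == 1 via an acquire
       load.  A plain volatile store gives no such guarantee. */
    atomic_store_explicit(&Ready, 1, memory_order_release);
}
```

The reading thread would pair this with
`atomic_load_explicit(&Ready, memory_order_acquire)` before touching Message.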
Re: [Valgrind-users] helgrind and atomic operations
On Jan 23, 2013, at 8:08 AM CST, David Faure wrote:
> On Monday 27 August 2012 15:25:14 Marc Mutz wrote:
> > If atomic loads and stores on x86 are implemented with a volatile cast,
> > then the compiler can't reorder stuff around them, either. Not more than
> > with a std::atomic, at least. QAtomic does that. For load-relaxed, Thiago
> > thinks that a normal read (non-volatile) is correct and I couldn't prove
> > him wrong.
> […]
> Where does the claim that a volatile cast means the compiler can't reorder
> stuff around it come from?

Julian already addressed this in a separate post, and I agree with his
assessment re: access to volatile-qualified variables.

> In the Qt source code I see qatomic_gcc.h (which is unused, unfortunately)
> calling __sync_synchronize() in loadAcquire(). Shouldn't Qt's code call
> that, when compiled with gcc?

If you don't want to write inline assembly, this might be your best bet. But
on TSO systems like x86, you only need a compiler barrier. In x86 inline
assembly syntax, this looks like:

   __asm__ __volatile__ ("" ::: "memory")

This prevents GCC (and compilers that correctly support this syntax) from
reordering accesses across this statement or making assumptions about the
state of memory across this point.

The only case you need to worry about on x86 (assuming you are not fiddling
with special memory) is that earlier stores could be reordered after
subsequent loads. That should be the only time you need the real mfence
instructions that you have below. load-acquire and store-release don't fall
into that category.

> This does lead to different assembly, too, on x86. So the claim that x86
> doesn't need memory barriers seems wrong?
> --- testcase_atomic_ops_helgrind.s.orig	2013-01-23 15:04:20.889417624 +0100
> +++ testcase_atomic_ops_helgrind.s	2013-01-23 15:07:06.938422071 +0100
> @@ -380,6 +380,7 @@
>  _ZN10QAtomicOpsIiE11loadAcquireERKi:
>  	movq	-24(%rbp), %rax
>  	movl	(%rax), %eax
>  	movl	%eax, -4(%rbp)
> +	mfence
>  	movl	-4(%rbp), %eax
>  	popq	%rbp
>  	.cfi_def_cfa 7, 8
> @@ -403,6 +404,7 @@
>  _ZN10QAtomicOpsIiE12storeReleaseERii:
>  	movq	-8(%rbp), %rax
>  	movl	-12(%rbp), %edx
>  	movl	%edx, (%rax)
> +	mfence
>  	popq	%rbp
>  	.cfi_def_cfa 7, 8
>  	ret

-Dave
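To make the compiler-barrier point concrete, here is a hedged sketch (helper
names are hypothetical, not from Qt or valgrind) of load-acquire/store-release
built from the barrier alone. It is correct only on TSO hardware such as
x86/x86_64 and with GCC-compatible compilers:

```c
/* Compiler-only barrier: emits no instruction, but GCC may not cache
   values in registers or reorder memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int data;
static volatile int flag;

/* store-release: on TSO the hardware already delivers stores in order,
   so only the compiler must be stopped from reordering them. */
void publish(int v)
{
    data = v;
    compiler_barrier();
    flag = 1;
}

/* load-acquire: the symmetric case for loads. */
int consume(void)
{
    while (!flag)
        ;                  /* spin until the flag is observed */
    compiler_barrier();
    return data;
}
```

Note this deliberately contains no mfence: per the discussion above, a full
fence is only needed when an earlier store must not be reordered after a later
load, which acquire/release do not require.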
Re: [Valgrind-users] Atos can find line of an address, but valgrind doesn't print it.
On Jan 20, 2013, at 3:50 PM CST, Nick Overdijk wrote:
> I have a nice stacktrace from some memory error in valgrind, and it fails
> to print the source file + line number somewhere. Here's the trace:
>
> ==63113== Conditional jump or move depends on uninitialised value(s)
> […]
> ==63113== Uninitialised value was created by a stack allocation
> ==63113==    at 0x10002D7D4: run_image(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_ostream<char, std::__1::char_traits<char> >) (in /Users/nick/Documents/Code/diag/REPO/BIN/stage1)
> […]
> This is on my Mac, running OSX 10.7, running valgrind 3.8.1. I was
> wondering why atos can give me source + line number and valgrind can't.

How exactly are you building the source file that contains this particular
line? I've encountered some weirdness if I do a direct .c-to-a.out build
without first creating .o files. That is:

   gcc foo.c
   dsymutil a.out

will NOT show file/line info in valgrind, but:

   gcc -c foo.c
   gcc foo.o
   dsymutil a.out

will. This sometimes happens even if gdb can show me the correct debug info.
I never thought too much of it and assumed it was Darwin's fault, since the
.dSYM system is such a bad idea for other reasons. I suppose that it could be
a Valgrind issue though...

-Dave
Re: [Valgrind-users] helgrind and atomic operations
On 01/23/2013 04:22 PM, Dave Goodell wrote:
> If you don't want to write inline assembly, this might be your best bet.
> But on TSO systems like x86, you only need a compiler barrier. In x86
> inline assembly syntax, this looks like:
>
>    __asm__ __volatile__ ("" ::: "memory")
>
> This prevents GCC (and compilers that correctly support this syntax) from
> reordering accesses across this statement or making assumptions about the
> state of memory across this point.

Yeah, you're right. Thanks for bringing clarity to my (concurrently-) fried
brain.

To make a stab at an executive summary, then:

* In intentionally racey code, we need to worry about both the hardware and
  the compiler reordering stores. That requires, at a minimum, inserting
  suitable load and store fences.

* On x86/x86_64, we only need to worry about the compiler reordering stores,
  since the hardware promises not to. That means all we need is a compiler
  barrier, as you show -- probably on both sides of both the load and the
  store.

* Regardless of the above two points, since C++11 allows compilers to assume
  code is race free, we can't rule out the possibility that the compiler does
  some other legal transformation which makes the code not work as the
  authors intended.

Because of this last point, overall I'm still with Pat LoPresti, who
summarised the situation admirably thus: "There is no such thing as a benign
data race. Ever." (v-users, 4 Dec 2012). IMO, the idea that some races are
harmless is a dangerous myth that we should try to stamp out.

J
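The race-free alternative the summary points toward can be sketched with C11
atomics and two threads (a self-contained illustration, not code from the
thread; names are made up):

```c
#include <pthread.h>
#include <stdatomic.h>

static int message;        /* plain data, published via the flag */
static atomic_int ready;   /* synchronisation flag, starts at 0 */

static void *producer(void *arg)
{
    (void)arg;
    message = 42;
    /* release: the write to message happens-before any acquire load
       that sees ready == 1, so there is no data race on message */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* acquire load pairs with the release store above */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                  /* spin until published */
    return (void *)(long)message;
}
```

Run under helgrind, this pattern reports no race, whereas the same code with
plain int flags is exactly the "benign race" the summary warns against.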
Re: [Valgrind-users] helgrind and atomic operations
On Jan 23, 2013, at 9:55 AM CST, Thiago Macieira wrote:
> On quarta-feira, 23 de janeiro de 2013 09.22.49, Dave Goodell wrote:
> > If you don't want to write inline assembly, this might be your best bet.
> > But on TSO systems like x86, you only need a compiler barrier. In x86
> > inline assembly syntax, this looks like:
> >
> >    __asm__ __volatile__ ("" ::: "memory")
> >
> > This prevents GCC (and compilers that correctly support this syntax) from
> > reordering accesses across this statement or making assumptions about the
> > state of memory across this point.
>
> Indeed, but it also prevents proper merging of stores and loads. For
> example, if x is an atomic variable, the compiler can and should merge
> these two stores:
>
>    x = 1;
>    x = 2;
>
> It's a trade-off until we have the proper implementation with std::atomic.

Agreed. My comments refer to the current common state of the world, not a
C11/C++11 world. That's all too new for me to be able to rely on its
availability yet. I am eagerly awaiting the day that inline assembly and
atomic libraries can be ditched altogether.

> > The only case you need to worry about on x86 (assuming you are not
> > fiddling with special memory) is that earlier stores could be reordered
> > after subsequent loads. That should be the only time you need the real
> > mfence instructions that you have below. load-acquire and store-release
> > don't fall into that category.
>
> Which is why the qatomic_gcc.h implementation is a last resort for Qt.
> Since it does a full memory barrier including mfence, it's way too
> expensive.
>
> Note that x86 does have sfence and lfence instructions too. I should go ask
> some colleagues at Intel about when they should be used, because I haven't
> yet found a case where they are needed.

It does, but AIUI they are only needed for write-combining (WC) memory
regions, such as memory mapped video card frame buffers.

> Here's the listing of the current implementations of atomics in Qt and
> their drawbacks, in the order that they are preferred:
>
> 0) MSVC: applies only to MSVC, using intrinsics.
>
> 1) Arch-specific implementation: uses assembly for the fetchAndAdd,
>    testAndSet and fetchAndStore operations, but uses direct volatile
>    loads/stores for loadAcquire and storeRelease, and non-volatile
>    loads/stores for load/store, which are subject to reordering by the
>    compiler. This is the default on x86 with GCC, Clang and ICC.

The above will probably only work on Itanium machines, where compilers seem
to emit acquire/release suffixes on volatile load/store operations. You
should probably update this implementation to include compiler barriers.

> 2) std::atomic implementation, if std::atomic is supported. This appeared
>    in GCC 4.6, but was implemented using the old intrinsics with full
>    barriers. In GCC 4.7, it uses the new intrinsics that support more
>    relaxed barriers. This works fine for x86, but the implementation is
>    incredibly sub-optimal on other architectures (it uses locks on ARM).
>
> 3) GCC old intrinsics implementation, the one with full barriers. From
>    GCC's plans, the implementation in 4.8 will solve the implementation
>    issues on other architectures, so we may start using that. That means
>    Qt 5.2.

Makes sense.

> In any of the implementations, all the code is inlined, so there's nothing
> for valgrind to react to. The best we could do is insert some of valgrind's
> no-op hints, on debug or special builds. Can you recommend what hints to
> add?

Julian/Bart/etc. may have more to add here. I remember having trouble with
annotating load-acquire/store-release in the past. Here's the (only partially
helpful) thread on the topic:

> May I also suggest an out-of-line marker strategy, similar to what the
> Linux kernel does for catching processor faults, and the ABI does for
> exceptions? If we had this, we'd leave the markers enabled even in release
> builds.

I'm not sure I'm familiar with this technique. Do you have a link to any
reading (even code is fine) that I could do on the subject?

-Dave
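For reference, one concrete form those no-op hints can take is the happens-
before/after annotation pair from valgrind's helgrind.h. A sketch under that
assumption, guarded so it still compiles without valgrind installed (the
HAVE_VALGRIND_HELGRIND_H macro and the variable names are hypothetical):

```c
#if defined(HAVE_VALGRIND_HELGRIND_H)
#include <valgrind/helgrind.h>
#else
/* No-op fallbacks when building without the valgrind headers;
   under valgrind these expand to client requests helgrind observes. */
#define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int payload;
static volatile int ready_flag;

void producer(void)
{
    payload = 99;
    /* Tell helgrind an ordering edge starts at this object... */
    ANNOTATE_HAPPENS_BEFORE(&ready_flag);
    ready_flag = 1;
}

int consumer(void)
{
    while (!ready_flag)
        ;
    /* ...and ends here, so the payload access is not reported
       as a race even though no lock is involved. */
    ANNOTATE_HAPPENS_AFTER(&ready_flag);
    return payload;
}
```

Since the annotations compile to cheap magic sequences, this is also the kind
of marker that could plausibly stay enabled in release builds.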
Re: [Valgrind-users] helgrind and atomic operations
On Jan 23, 2013, at 10:33 AM CST, Dave Goodell wrote:
> Julian/Bart/etc. may have more to add here. I remember having trouble with
> annotating load-acquire/store-release in the past. Here's the (only
> partially helpful) thread on the topic:

Sorry for the extra email. I accidentally whacked the send-email key
combination instead of the paste key combination. The link that I intended to
send was:

http://thread.gmane.org/gmane.comp.debugging.valgrind.devel/11471/focus=11480

-Dave