[valgrind] [Bug 479191] vgdb is blocked after several tries
https://bugs.kde.org/show_bug.cgi?id=479191

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #2 from Philippe Waroquiers ---
What are you running on (processor, Linux version, ...)?

The difference between the first and second vgdb trace is that in the second case, the threads are blocked in a system call, and vgdb has to do more complex operations to wake up valgrind.

Also, you launch valgrind with --vgdb-error=0. This allows some gdb/vgdb operations to be done before startup. What is the reason for this in your case?

If you give two -d options, vgdb will output more debugging info. Also, you might add debugging options (-v -v -v -d -d -d) on the valgrind side to see what happens there.

--
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 473944] Handle mold linker split RW PT_LOAD segments correctly
https://bugs.kde.org/show_bug.cgi?id=473944

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
Status      |REPORTED  |RESOLVED
Resolution  |---       |FIXED

--- Comment #5 from Philippe Waroquiers ---
Fixed in c0b2c786d
[valgrind] [Bug 473944] Handle mold linker split RW PT_LOAD segments correctly
https://bugs.kde.org/show_bug.cgi?id=473944 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #3 from Philippe Waroquiers --- (In reply to Paul Floyd from comment #2) > With the patch I get one regression failure > helgrind/tests/pth_destroy_cond (stderr) > > There is missing source information, presumably due to a failure reading > debuginfo. > > The first aspacem map is > > --54635:1: aspacem <<< SHOW_SEGMENTS: Memory layout at client startup (32 > segments) > --54635:1: aspacem 3 segment names in 3 slots > --54635:1: aspacem freelist is empty > --54635:1: aspacem (0,4,7) > /usr/home/paulf/scratch/valgrind/none/none-amd64-freebsd > --54635:1: aspacem (1,65,6) > /usr/home/paulf/scratch/valgrind/helgrind/tests/pth_destroy_cond > --54635:1: aspacem (2,134,7) /libexec/ld-elf.so.1 > --54635:1: aspacem 0: RSVN 00-1f 2097152 - SmFixed > --54635:1: aspacem 1: file 20-200fff4096 r > d=0x6d8ca7de696e301b i=2440208 o=0 (1,65) > --54635:1: aspacem 2: file 201000-201fff4096 r-x-- > d=0x6d8ca7de696e301b i=2440208 o=0 (1,65) > --54635:1: aspacem 3: file 202000-203fff8192 rw--- > d=0x6d8ca7de696e301b i=2440208 o=0 (1,65) > --54635:1: aspacem 4: RSVN 204000-0003ff 61m - SmFixed > --54635:1: aspacem 5: file 000400-0004006fff 28672 r > d=0x28a8dde4190bc5c i=1049059 o=0 (2,134) > --54635:1: aspacem 6: file 0004007000-000401cfff 90112 r-x-- > d=0x28a8dde4190bc5c i=1049059 o=24576 (2,134) > --54635:1: aspacem 7: file 000401d000-000401dfff4096 rw--- > d=0x28a8dde4190bc5c i=1049059 o=110592 (2,134) > --54635:1: aspacem 8: file 000401e000-000401efff4096 rw--- > d=0x28a8dde4190bc5c i=1049059 o=110592 (2,134) > --54635:1: aspacem 9: anon 000401f000-0004014096 rw--- > --54635:1: aspacem 10: anon 000402-0004020fff4096 rwx-- > --54635:1: aspacem 11: RSVN 0004021000-000481 8384512 - SmLower > --54635:1: aspacem 12: 000482-0037ff823m > --54635:1: aspacem 13: FILE 003800-00380abfff 704512 r > d=0x696e301b i=1844040 o=0 (0,4) > --54635:1: aspacem 
14: FILE 00380ac000-0038141fff 614400 r-x--
> d=0x696e301b i=1844040 o=700416 (0,4)
> --54635:1: aspacem 15: file 0038142000-0038142fff4096 r-x--
> d=0x696e301b i=1844040 o=1314816 (0,4)
> --54635:1: aspacem 16: FILE 0038143000-003821bfff 32 r-x--
> d=0x696e301b i=1844040 o=1318912 (0,4)
> --54635:1: aspacem 17: FILE 003821c000-003821cfff4096 rw---
> d=0x696e301b i=1844040 o=2203648 (0,4)
>
> objdump:
> paulf> objdump -p pth_destroy_cond
>
>
> pth_destroy_cond: file format elf64-x86-64-freebsd
>
> Program Header:
> PHDR off0x0040 vaddr 0x00200040 paddr
> 0x00200040 align 2**3
> filesz 0x0268 memsz 0x0268 flags r--
> INTERP off0x02a8 vaddr 0x002002a8 paddr
> 0x002002a8 align 2**0
> filesz 0x0015 memsz 0x0015 flags r--
> LOAD off0x vaddr 0x0020 paddr
> 0x0020 align 2**12
> filesz 0x08ec memsz 0x08ec flags r--
> LOAD off0x08f0 vaddr 0x002018f0 paddr
> 0x002018f0 align 2**12
> filesz 0x0590 memsz 0x0590 flags r-x
> LOAD off0x0e80 vaddr 0x00202e80 paddr
> 0x00202e80 align 2**12
> filesz 0x0180 memsz 0x0180 flags rw-
> LOAD off0x1000 vaddr 0x00203000 paddr
> 0x00203000 align 2**12
> filesz 0x0098 memsz 0x00c8 flags rw-
>
> And the trace when running with a couple more I added
> --56884-- ++*rw_load_count to 2 for
> /usr/home/paulf/scratch/valgrind/helgrind/tests/pth_destroy_cond
> --56884-- offset 1000 offset roundup 1000
> --56884-- prev + size 203e80 addr 203000
>
> If I change the condition to
>
>    if (previous_rw_a_phdr.p_memsz > 0 &&
>        ehdr_m.e_type == ET_EXEC &&
>        previous_rw_a_phdr.p_vaddr + previous_rw_a_phdr.p_filesz == a_phdr.p_vaddr)
>
> then it works.

Thanks for the testing.

The above condition also works for the executables linked by mold 1.5.1 in my setup (RHEL 8.6) (in my case, the condition has to ensure the decrement is not done as the 2 segments are not merged).

I will finalize the patch (e.g. to put some traces corresponding to what you added) and push.

Thanks
Philippe
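The condition quoted above can be tried outside valgrind. The following is a minimal standalone sketch, assuming a Linux-style `<elf.h>`; the function name `rw_loads_are_contiguous` is hypothetical and is not the name used in the valgrind sources. It only restates the quoted test: the previous RW PT_LOAD has a non-zero memsz, the object is ET_EXEC, and the file-backed part of the previous segment ends exactly where the next segment starts.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not valgrind's own function): returns true when the
   condition quoted in the thread holds for two consecutive RW PT_LOAD
   program headers, i.e. when they would be treated as one merged mapping. */
bool rw_loads_are_contiguous(const Elf64_Ehdr *ehdr,
                             const Elf64_Phdr *prev,
                             const Elf64_Phdr *cur)
{
    assert(ehdr != NULL && prev != NULL && cur != NULL);
    return prev->p_memsz > 0
        && ehdr->e_type == ET_EXEC
        && prev->p_vaddr + prev->p_filesz == cur->p_vaddr;
}
```

Taking the last two LOAD entries of the objdump output as printed (vaddr 0x202e80 with filesz 0x180, followed by vaddr 0x203000), the check holds because 0x202e80 + 0x180 equals 0x203000. For the mold-linked executables described in the reply, the same check is expected to be false, so that the two segments are not counted as merged.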
[valgrind] [Bug 473944] New: Handle mold linker split RW PT_LOAD segments correctly
https://bugs.kde.org/show_bug.cgi?id=473944

Bug ID: 473944
Summary: Handle mold linker split RW PT_LOAD segments correctly
Classification: Developer tools
Product: valgrind
Version: 3.21.0
Platform: Other
OS: Linux
Status: REPORTED
Severity: normal
Priority: NOR
Component: general
Assignee: jsew...@acm.org
Reporter: philippe.waroqui...@skynet.be
Target Milestone: ---

Created attachment 161282
  --> https://bugs.kde.org/attachment.cgi?id=161282&action=edit
change the condition to detect 2 PT_LOADs will be merged

This is a follow-up to, and a similar problem as, the one reported and fixed in bug 452802: valgrind could not load the debug info of the main executable.

The problem is the same as in 452802. The fix for 452802 does not work here, because the logic that detects whether the 2 PT_LOAD segments are different currently wrongly concludes that the 2 segments can be combined.

Here are the details (with some remarks/comments prefixed with #).

The attached patch solves the problem on my setup (RHEL 8.6, ld-2.28, mold 1.5.1), but as it completely changes the condition used to consider the segments mergeable, it would be nice to (re-)validate this, e.g. with lld or other setups where split PT_LOADs are produced.

ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2's complement, little endian Version: 1 (current) OS/ABI:UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x25e000 Start of program headers: 64 (bytes into file) Start of section headers: 214499808 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 13 Size of section headers: 64 (bytes) Number of section headers: 64 Section header string table index: 49 Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0 0 0 [ 1] .interp PROGBITS 00200318 0318 001c A 0 0 1 [ 2] .note.gnu.build-i NOTE 00200334 0334 0024 A 0 0 4 [ 3] .note.ABI-tag NOTE 00200358 0358 0020 A 0 0 4 [ 4] .hash HASH 00200378 0378 7f60 0004 A 6 0 4 [ 5] .gnu.hash GNU_HASH 002082d8 82d8 02b8 A 6 0 8 [ 6] .dynsym DYNSYM 00208590 8590 00017e08 0018 A 7 1 8 [ 7] .dynstr STRTAB 00220398 00020398 0001999a A 0 0 1 [ 8] .gnu.version VERSYM 00239d32 00039d32 1fd6 0002 A 6 0 2 [ 9] .gnu.version_rVERNEED 0023bd08 0003bd08 02c0 A 7 9 8 [10] .rela.dyn RELA 0023bfc8 0003bfc8 1188 0018 A 6 0 8 [11] .rela.plt RELA 0023d150 0003d150 00013110 0018 A 635 8 [12] .plt PROGBITS 00251000 00051000 cb80 AX 0 0 16 [13] .plt.got PROGBITS 0025db80 0005db80 0230 AX 0 0 16 [14] .fini PROGBITS 0025ddb0 0005ddb0 000d AX 0 0 4 [15] .init PROGBITS 0025ddc0 0005ddc0 0020 AX 0 0 4 [16] .text PROGBITS 0025e000 0005e000 07bce0d4 AX 0 0 4096 [17] google_malloc PROGBITS 07e2c100 07c2c100 1f1a AX 0 0 64 [18] malloc_hook PROGBITS 07e2e020 07c2e020 0685 AX
[valgrind] [Bug 444488] Use glibc.pthread.stack_cache_size tunable
https://bugs.kde.org/show_bug.cgi?id=444488

--- Comment #7 from Philippe Waroquiers ---
(In reply to Paul Floyd from comment #6)
> > I think we should check and use the existing hint. Current users of the hint
> > will/should have the same behaviour whatever the glibc version.
>
> There is gnu_get_libc_version(). Now, how to call that without breaking musl.

I suspect it might be problematic to call a glibc function (as a host function) during client image initialisation.

So, we might have to always set the tunable env variable as part of the client initialisation, and call gnu_get_libc_version before producing an error message that the old hack did not work.
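As an illustration of the version check discussed above, here is a minimal sketch that only parses a "major.minor" string such as the one returned by glibc's gnu_get_libc_version(). The helper name `libc_version_at_least` is hypothetical, and the threshold is left to the caller because the thread does not state which glibc release changed the stack cache. Since it works on a plain string, it does not itself depend on glibc, which sidesteps the musl question for the parsing part only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true when a "major.minor" version string
   (e.g. "2.36") is at least want_major.want_minor. Returns false when the
   string cannot be parsed as two dot-separated integers. */
bool libc_version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    assert(version != NULL);
    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return false;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

On a glibc system the string would come from gnu_get_libc_version(), declared in `<gnu/libc-version.h>`. Whether that call is safe at the point where valgrind would need it is exactly the open question raised in the comment.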
[valgrind] [Bug 444488] Use glibc.pthread.stack_cache_size tunable
https://bugs.kde.org/show_bug.cgi?id=444488

--- Comment #5 from Philippe Waroquiers ---
(In reply to Paul Floyd from comment #4)
> Thanks for adding me to the CC Philippe.
>
> If I do this:
> export GLIBC_TUNABLES="glibc.pthread.stack_cache_size=0"
>
> Then helgrind/tests/tls_threads fails with just
> +--21937:0: sched WARNING: pthread stack cache cannot be disabled!
>
> Without the env var there are a load of
>
> +Possible data race during write of size 8 at 0x by thread #x
>
> errors
>
> Do we have a way of knowing that GLIBC_TUNABLES did something so that we
> don't need to twiddle with stack_cache_actsize?

If we can detect the glibc version, then we can avoid using the old stack_cache_actsize hack.

> Also --sim-hints=no-nptl-pthread-stackcache isn't turned on by default. Do
> we want to check for it in setup_client_env () and only put GLIBC_TUNABLES
> in the environment if it is used? Or perhaps add a new simhint.

I think we should check and use the existing hint. Current users of the hint will/should have the same behaviour whatever the glibc version.

>
> Bonus points for handling GLIBC_TUNABLES already set by the tuner, and add
> or replace glibc.pthread.stack_cache_size
>
> This doesn't seem to help the example being discussed in the valgrind-users
> list.
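The "add or replace" handling of an already-set GLIBC_TUNABLES can be sketched as follows. This is a standalone illustration written against the C library, not valgrind core code (which does not use libc), and the function name `set_stack_cache_tunable` is hypothetical. It relies only on the documented format of the variable: a colon-separated list of name=value entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_CACHE_TUNABLE "glibc.pthread.stack_cache_size"

/* Appends len bytes of s to out at *pos, keeping out NUL-terminated.
   Returns 0 on success, -1 when out is too small. */
static int append(char *out, size_t out_size, size_t *pos,
                  const char *s, size_t len)
{
    if (*pos + len + 1 > out_size)
        return -1;
    memcpy(out + *pos, s, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/* Hypothetical helper: builds in out a GLIBC_TUNABLES value that keeps every
   entry of existing (may be NULL) except a previous stack_cache_size entry,
   and ends with stack_cache_size=0. Returns 0 on success, -1 on overflow. */
int set_stack_cache_tunable(const char *existing, char *out, size_t out_size)
{
    const char *setting = STACK_CACHE_TUNABLE "=0";
    const size_t name_len = strlen(STACK_CACHE_TUNABLE);
    size_t pos = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    while (existing != NULL && *existing != '\0') {
        size_t len = strcspn(existing, ":");   /* length of this entry */
        int is_ours = len > name_len
                   && strncmp(existing, STACK_CACHE_TUNABLE, name_len) == 0
                   && existing[name_len] == '=';

        if (len > 0 && !is_ours) {
            if (pos > 0 && append(out, out_size, &pos, ":", 1) != 0)
                return -1;
            if (append(out, out_size, &pos, existing, len) != 0)
                return -1;
        }
        existing += len;
        if (*existing == ':')
            existing++;                        /* skip the separator */
    }

    if (pos > 0 && append(out, out_size, &pos, ":", 1) != 0)
        return -1;
    return append(out, out_size, &pos, setting, strlen(setting));
}
```

Placing the new entry last and dropping any earlier one avoids depending on how duplicate entries are resolved.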
[valgrind] [Bug 444488] Use glibc.pthread.stack_cache_size tunable
https://bugs.kde.org/show_bug.cgi?id=444488

--- Comment #3 from Philippe Waroquiers ---
(In reply to Mark Wielaard from comment #2)
> (In reply to Philippe Waroquiers from comment #1)
> > In the discussion on valgrind-users mailing list,
> > Paul reported that:
> > 'It looks like "stack_cache_actsize" in libc moved to be
> > _dl_stack_cache_actsize in ld-linux-x86-64.so.2'
> >
> > Is there a way to modify the glibc glibc.pthread.stack_cache_size tunable
> > from valgrind ?
>
> tunables are set by the GLIBC_TUNABLES environment variable
> https://www.gnu.org/software/libc/manual/html_node/Tunables.html
>
> We can set/add to that GLIBC_TUNABLES environment variable in
> coregrind/m_initimg/initimg-linux.c setup_client_env () where we also set
> the LD_PRELOAD environment variable.

When running with a newer glibc, we should also avoid producing an error message that the old way to disable the stack cache is not working. This likely implies detecting the version of glibc.
[valgrind] [Bug 444488] Use glibc.pthread.stack_cache_size tunable
https://bugs.kde.org/show_bug.cgi?id=444488

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |pjfl...@wanadoo.fr
[valgrind] [Bug 444488] Use glibc.pthread.stack_cache_size tunable
https://bugs.kde.org/show_bug.cgi?id=444488

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
In the discussion on the valgrind-users mailing list, Paul reported that:
'It looks like "stack_cache_actsize" in libc moved to be _dl_stack_cache_actsize in ld-linux-x86-64.so.2'

Is there a way to modify the glibc glibc.pthread.stack_cache_size tunable from valgrind ?
Or do we assume that the user has to tune this ?
Or do we do an alternate implementation of the current valgrind hack using _dl_stack_cache_actsize in ld-linux-x86-64.so.2 ?
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code when vgdb is attached
https://bugs.kde.org/show_bug.cgi?id=458915

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
Resolution  |---       |FIXED
Status      |REPORTED  |RESOLVED

--- Comment #27 from Philippe Waroquiers ---
(In reply to David Vasek from comment #26)
> I work on this issue together with Libor. If you checkout this branch:
> https://gitlab.nic.cz/knot/knot-dns/-/commits/valgrind_vgdb_bug , it
> includes a few changes that should help you with the debugging.

Thanks for the above. With this, I was able to identify the likely culprit for the bug.

I have pushed a fix for this as 348775f34.

The valgrind regression tests were run on debian/amd64, centos/ppc64 and ubuntu/arm64. Also, the ctl/valgrind knot test has run 80 times without encountering the abort (while before the fix, each run of ctl/valgrind was triggering the bug).

It would be nice if you could check with the latest state of the git repository whether everything works as expected now.

Thanks for the clear and detailed instructions to reproduce the bug, which were critically needed to reproduce this race condition.
[valgrind] [Bug 460142] Auxiliary stack traces
https://bugs.kde.org/show_bug.cgi?id=460142

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
(In reply to Simon Richter from comment #0)
> The three stack traces I get (allocation, deallocation and use) show what is
> happening, but it is difficult for me to find the point where the string is
> given to Python -- the function to intern the string is called quite often,
> so I can't just easily break there.

Until valgrind provides some support for recording auxiliary stack traces as you suggest, what you might do is capture this stack trace yourself and print it together with the pointer to the memory that was allocated and given to Python.

Then, when valgrind reports a 'use after free' error, you can search the program output for the last stack trace that mentions this pointer.
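One way to apply the suggestion above is sketched below, assuming glibc's `<execinfo.h>` is available (it is not part of musl). The helper name `log_handoff` and the output format are hypothetical; the idea is just to print the pointer next to the current call stack at the point where the memory is handed over, so that the pointer value can later be searched for in the program output.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>

#define MAX_FRAMES 32

/* Hypothetical helper: logs ptr together with the current call stack on
   stderr and returns the number of stack frames captured. */
int log_handoff(const void *ptr)
{
    void *frames[MAX_FRAMES];
    int n;

    assert(ptr != NULL);
    n = backtrace(frames, MAX_FRAMES);
    fprintf(stderr, "handoff ptr=%p frames=%d\n", (void *)ptr, n);
    fflush(stderr);
    /* Writes one line per frame directly to fd 2, without calling malloc. */
    backtrace_symbols_fd(frames, n, 2);
    return n;
}
```

A call such as `log_handoff(str);` would go just before the string is given to Python. Function names in the output are typically only resolved for exported symbols, so linking with `-rdynamic` may be needed to get readable frames.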
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code when vgdb is attached
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #25 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #24)
> You will probably need to run the test several time until it reproduces. It
> may also happen on some machines that it does never reproduce. For me, it
> easily reproduces on my laptop, but hardly on a powerful server.

Thanks for the detailed instructions. I was able to reproduce the bug; it happens in roughly 1 out of 5 trials.

I will work on this further, time permitting (valgrind is a week-end activity, so only a few hours from time to time).

One question: the test writes the valgrind messages to the file 'valgrind' and the valgrind debug output to stderr. This setup is very unusual; I would like to have both outputs in the same file. I have not found the piece of code that actually launches valgrind and produces these 2 separate files (I do not even really understand how this is done, as valgrind messages and debug output are normally both written to the same fd).

Is there an easy change to the test framework that would ensure all the output ends up either in stderr or in the valgrind file ?

Thanks
Philippe
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code when vgdb is attached
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #23 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #22)
> (In reply to Philippe Waroquiers from comment #21)
> > Valgrind should stop by itself when it finds an error (when using
> > --vgdb-error argument)
>
> The error mentioned by me is an error in application logic. Valgrind has no
> way to detect it, no reason to stop in that case and it does not.
>
> > If yes, as a bypass for this bug, you might try to have valgrind invoking
> > gdbserver and then launch gdb/vgdb, rather than having gdb/vgdb
> > 'interrupting' a (still) running process.
>
> Yes, we can use a workaround that we don't invoke vgdb at all in such cases.
> But in any case, it would be nice to fix what's going wrong in Valgrind.
>
> > But in any case, what you are doing should not cause a problem. When I have
> > a little bit of time, I will dig again in the vgdb logic
> > and see if/where it could create such wrong interaction.
>
> Thank you for continuous effort in this issue! I tried to create a tiny small
> program that does only listens to incoming UDP packets while using poll or
> epoll_wait syscalls, and frequently attach vgdb to it. Unfortunately, the
> issue did not reproduce in this simple scenario.

You might need a mixture of threads executing some cpu instructions, some threads blocked in syscalls, and some signals such as SIGALRM.

> The only reproducer we have
> so far (at least pretty reliable) is the whole Knot DNS and its test-case.
> If you wish, we can give you instructions how to build it and launch the
> test-case, but it'll be several steps to do.

If the instructions are not too long to describe, it will do no harm to have them. If they are simple enough and building Knot DNS does not need too many dependencies etc., I can give it a try.
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code when vgdb is attached
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #21 from Philippe Waroquiers ---
(In reply to Libor Peltan from comment #20)
> Thank you for your observations! Based on this, we actually found out that
> the issue happens exactly (sometimes!) when we attach vgdb to the running
> process, like this:
>
> ```
> /usr/bin/gdb -ex "set confirm off" -ex "target remote | /usr/bin/vgdb
> --pid=5944" -ex "info threads" -ex "thread apply all bt full" -ex q
> /home/peltan/master_knot/src/knotd
> ```
>
> I apologize that we overlooked this important fact earlier. (Our test
> environment performs this automatically when a routine error occurs.)
>
> We will continue working on minimizing the reproducer in following days.

That starts to clarify where the problem could originate from.

Valgrind should stop by itself when it finds an error (when using the --vgdb-error argument) and invoke its gdbserver, waiting for gdb/vgdb to connect.

Do you mean that the above command is launched (somewhat asynchronously) when an error is detected by other means ?

If yes, as a bypass for this bug, you might try to have valgrind invoke gdbserver and then launch gdb/vgdb, rather than having gdb/vgdb 'interrupt' a (still) running process.

But in any case, what you are doing should not cause a problem. When I have a little bit of time, I will dig into the vgdb logic again and see if/where it could create such a wrong interaction.
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #19 from Philippe Waroquiers ---
I took a look at the attached logs.

A first observation:
* We have 2 groups of 3 threads that get the 0xe8 syscall return.
* For each of these 2 groups, we see, a little bit before these 0xe8 returns, that there is a connection to the embedded gdbserver of Valgrind.

Here are the line numbers and occurrences of the 0xe8 syscall return:

6 matches for "0xe8" in buffer: valgrind
26734:SYSCALL[61639,20](232) ... [async] --> Success(0xe8)
26774:SYSCALL[61639,17](232) ... [async] --> Success(0xe8)
26789:SYSCALL[61639,15](232) ... [async] --> Success(0xe8)
31141:SYSCALL[61639,14](232) ... [async] --> Success(0xe8)
31176:SYSCALL[61639,16](232) ... [async] --> Success(0xe8)
31206:SYSCALL[61639,13](232) ... [async] --> Success(0xe8)

And here are the 3 matches for the gdbserver:

3 matches for "TO DEBUG" in buffer: valgrind
286:==61639== TO DEBUG THIS PROCESS USING GDB: start GDB like this
26668:==61639== TO DEBUG THIS PROCESS USING GDB: start GDB like this
31121:==61639== TO DEBUG THIS PROCESS USING GDB: start GDB like this

where the first one is the message produced at startup.

Maybe this is a modified executable that triggers a call to vgdb/gdb when it encounters this syscall problem ? Or is there something that attaches to the valgrind gdbserver or sends a command to it ?

Because in this last case, we could possibly have an interaction between vgdb and many threads blocked in syscalls.

We see in the stderr trace the following:

--61639:2: gdbsrv stored register 0 size 8 name rax value 0007 tid 1 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 15 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 17 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 20 status VgTs_WaitSys
--61639:1: gdbsrv stop_pc 0x4CAC04E changed to be resume_pc 0x4C9CD7F: poll (poll.c:29)
--61639:2: gdbsrv stored register 0 size 8 name rax value 0007 tid 1 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 13 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 14 status VgTs_WaitSys
--61639:2: gdbsrv stored register 0 size 8 name rax value 00e8 tid 16 status VgTs_WaitSys
--61639:1: gdbsrv VG core calling VG_(gdbserver_report_signal) vki_nr 15 SIGTERM gdb_nr 15 SIGTERM tid 1

So, for the 2 groups of 3 threads that got the 0xe8 syscall return, we see that the valgrind gdbserver was instructed to put 0xe8 in the rax register.

It is however difficult to relate the stderr output to the valgrind output. If you could redo the trace with the none tool, but keep the stderr and the valgrind output together (i.e. let valgrind write its output to stderr together with its debug log) and add --time-stamp=yes, that might help to see what happens in which order.

I have to say that at this stage, I do not have much of an idea of what else to look at. To investigate further and possibly find the bug, we will likely need an (easy to run) reproducer.
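The kind of log search described above can be automated with a small filter. The sketch below is only an illustration: the function name `syscall_returned_own_number` is hypothetical, and the line format is taken from the six SYSCALL lines shown in this comment. It flags a record whose Success value equals its syscall number (232 is 0xe8).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true when line contains a record of the form
   "SYSCALL[pid,tid](nr) ... --> Success(0xV)" whose result V equals nr.
   Any prefix before "SYSCALL[" (such as a grep line number) is ignored. */
bool syscall_returned_own_number(const char *line)
{
    int pid, tid;
    unsigned long nr, result;
    const char *rec, *res;

    assert(line != NULL);
    rec = strstr(line, "SYSCALL[");
    if (rec == NULL
        || sscanf(rec, "SYSCALL[%d,%d](%lu)", &pid, &tid, &nr) != 3)
        return false;
    res = strstr(rec, "Success(");
    /* %lx accepts the optional 0x prefix printed in the trace. */
    if (res == NULL || sscanf(res, "Success(%lx", &result) != 1)
        return false;
    return result == nr;
}
```

This is a heuristic: a syscall can legitimately return a value equal to its own number, so a match is a hint to look at the surrounding trace, not a proof of the bug.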
[valgrind] [Bug 459477] XERROR messages lacks ending '\n' in vgdb
https://bugs.kde.org/show_bug.cgi?id=459477

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
Status      |REPORTED  |RESOLVED
Resolution  |---       |FIXED
CC          |          |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
Fixed in 3c5720453 (also fixes some occurrences of missing \n in ERROR calls)

Thanks for the report and the patch.
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #16 from Philippe Waroquiers ---
In one of the traces, I see the output below. It looks like a SIGALRM signal is delivered to the thread that encounters the futex 202 result.

--24048-- async signal handler: signal=14, vgtid=24051, tid=4, si_code=-6, exitreason VgSrc_None
--24048-- interrupted_syscall: tid=4, ip=0x580e687e, restart=False, sres.isErr=False, sres.val=202
--24048-- at syscall instr: returning EINTR
--24048-- delivering signal 14 (SIGALRM):-6 to thread 4
--24048-- push_signal_frame (thread 4): signal 14
==24048==    at 0x4C2C340: futex_wait (futex-internal.h:146)
==24048==    by 0x4C2C340: __lll_lock_wait (lowlevellock.c:49)
==24048==    by 0x4C32322: __pthread_mutex_cond_lock (pthread_mutex_lock.c:93)
==24048==    by 0x4C2E9B3: __pthread_cond_wait_common (pthread_cond_wait.c:616)
==24048==    by 0x4C2E9B3: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:627)
==24048==    by 0x14184D: worker_main (pool.c:70)
==24048==    by 0x1395B2: thread_ep (dthreads.c:146)
==24048==    by 0x4C2FB42: start_thread (pthread_create.c:442)
==24048==    by 0x4CC0BB3: clone (clone.S:100)

So, this is another indication that the problem is likely linked to VG_(fixup_guest_state_after_syscall_interrupted).

But it is not very clear what is special in your application.

Can you also reproduce the problem with --tool=none ? Or does it happen only with memcheck ?

Can you check if the problem goes away when using --vex-iropt-register-updates=allregs-at-each-insn ?

If the problem cannot be reproduced with this setting, can you see if it reproduces with --vex-iropt-register-updates=allregs-at-mem-access ?
[valgrind] [Bug 459031] Documentation of --error-exitcode is incomplete.
https://bugs.kde.org/show_bug.cgi?id=459031

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
Status      |REPORTED  |RESOLVED
Resolution  |---       |FIXED
CC          |          |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
Fixed in e489f3197
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code
https://bugs.kde.org/show_bug.cgi?id=458915

--- Comment #8 from Philippe Waroquiers ---
I took a look at both logs.

First the epoll log. (tid is the thread id number used internally in valgrind)

What we see is that the tid 14 is just getting the result of a previous epoll syscall, and then starts a new epoll syscall:

--111235-- SCHED[14]: acquired lock (VG_(client_syscall)[async])   <<<<< tid 14 acquires the valgrind lock as the epoll syscall ended
SYSCALL[111235,14](232) ... [async] --> Success(0x0)   <<<<< and the tid 14 reports the success of this syscall
--111235-- SCHED[14]: TRC: SYSCALL   <<<<< it launches a new epoll syscall
SYSCALL[111235,14](232) sys_epoll_wait ( 18, 0x2610f740, 1, 1000 ) --> [async] ...
--111235-- SCHED[14]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys   <<<<< and releases the lock waiting for the result
--111235-- SCHED[35]: acquired lock (VG_(scheduler):timeslice)   <<<<< and the tid 35 acquires the lock

Then later on, the tid 18 calls tgkill, sending a (fatal) signal to itself (I believe it is to itself; the valgrind tracing of the link between the tid and the linux thread id is not very clear). As this signal is fatal, all threads are being killed by valgrind.

We see that a little bit before the tgkill, tid 18 does a write on fd 2. Possibly that is an indication that an error/problem is being reported.

The problem with the futex has a similar pattern: the tid 6 starts a futex syscall and releases the valgrind lock. Then sometime later, the tid 11 does an mmap, and slightly after that calls tgkill. And similarly, this tid 11 does a write on fd 2 a little bit before.

The processing of a fatal signal in valgrind is quite tricky: complex code, with race conditions, see e.g. bug 409367. This fatal signal has to get all the threads out of their syscalls. For this, a kind of internal signal "vgkill" is sent by the valgrind scheduler to all threads. When the signal is received, valgrind detects that the thread was in a syscall and that the thread has to "interrupt" the syscall. For this, valgrind calls VG_(post_syscall). But this post_syscall assumes that the guest state is correctly "fixed", and I do not see where this is done.

So, a hypothesis about what happens:
* the application encounters an error condition (in tid 18 in the epoll case, in tid 11 in the futex case)
* this application thread decides to call abort, generating a fatal signal
* valgrind handling of a fatal signal is for sure complex and might still be racy, and might not properly reset the guest state when the fatal signal has to be handled by a thread doing e.g. an epoll or futex syscall

As the guest state is not properly restored, when this thread "resumes" and/or due to race conditions, instead of just dying, it continues and then itself reports a strange state, as the guest thread state was not properly restored using a call to VG_(fixup_guest_state_after_syscall_interrupted).

To validate this hypothesis, maybe the following could be done:
* check what this "write on fd 2" is doing (maybe with strace?)
* in case the application encounters a problem, instead of calling abort, which sends a fatal signal, you might rather do e.g. sleep(10). If the hypothesis is correct, then the threads doing epoll or futex should just stay blocked in their syscalls, and the thread detecting the problem will sleep in this state. It might then be possible to attach using gdb+vgdb and investigate the state of the application and/or the valgrind threads.

There is a way to tell valgrind to launch gdbserver in case of an abnormal valgrind exit using the option:

    --vgdb-stop-at=event1,event2,... invoke gdbserver for given events [none]
         where event is one of:
           startup exit valgrindabexit all none

Looks like we might add abexit to ask valgrind to call gdbserver when a client thread/process does an abnormal exit.
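The second validation step (sleeping instead of aborting) could look like the sketch below. It is only an illustration under stated assumptions: the function name `pause_instead_of_abort` is hypothetical, and it would temporarily replace the application's abort() call at the point where the problem is detected.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical replacement for an abort() call site: reports msg on fd 2,
   then sleeps so that gdb+vgdb can be attached while the other threads stay
   blocked in their syscalls. Returns the number of seconds left if sleep()
   was interrupted by a signal, 0 otherwise. */
unsigned pause_instead_of_abort(const char *msg, unsigned seconds)
{
    ssize_t written;

    assert(msg != NULL);
    written = write(2, msg, strlen(msg));
    (void)written;  /* best effort: nothing sensible to do on failure */
    return sleep(seconds);
}
```

With a call such as `pause_instead_of_abort("inconsistent state detected\n", 10);` no fatal signal is generated, so if the hypothesis is right, the threads in epoll or futex should simply remain blocked during the sleep.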
[valgrind] [Bug 458915] syscall sometimes returns its number instead of return code
https://bugs.kde.org/show_bug.cgi?id=458915

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #2 from Philippe Waroquiers ---
Does strace show that there are some signals being processed close (in time) to the system call wrongly returning its syscall number ?

If a signal happens while syscall-amd64-linux.S is being executed, then VG_(fixup_guest_state_after_syscall_interrupted) has some complex logic that interacts with the partially executed asm code.

Alternatively, having more valgrind tracing might give some hints. You could try

  valgrind -v -v -v -d -d -d --trace-syscalls=yes --trace-signals=yes your_app

and if your application is multi-threaded (I guess it is), you might also use --trace-sched=yes

With regards to "What intrigues me that both the syscall number and the return value appear in the RAX register at some point."

If you speak about the "physical RAX register", then I think this is normal. To execute a syscall, the RAX register must be set to the syscall number before the syscall instruction, and on return from the syscall instruction, the RAX register contains the syscall return value.

When this syscall instruction is finished, the syscall return code (stored by the kernel in the physical register RAX) must be moved to the guest register RAX.
[valgrind] [Bug 441069] Process terminating with default action of signal 4 (SIGILL) Illegal opcode at address 0x580A3C2C at 0x4000B00: ??? (in /lib/ld-2.26.so)
https://bugs.kde.org/show_bug.cgi?id=441069

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
As far as I can see, your test program is trivial. If valgrind does not work at all on such a trivial program, it might be due to your specific installation/version of the OS (your program crashes in the dynamic loader).

So, the first thing to do is to try with the latest valgrind version (3.19.0 or the git version)
[valgrind] [Bug 458118] Track deletions of objects from unloaded shared libraries
https://bugs.kde.org/show_bug.cgi?id=458118

Philippe Waroquiers changed:

What        |Removed   |Added
-------------------------------------------------------
CC          |          |philippe.waroquiers@skynet.be

--- Comment #3 from Philippe Waroquiers ---
Valgrind keeps recently freed blocks in a list, which allows it to report where a freed block was allocated. If the size of this list (controlled by the --freelist-vol parameter) is big enough and you use --keep-debuginfo=yes, then I think valgrind should be able to tell you the stack trace that allocated the referenced freed block.

Now, if the segmentation violation happens because the destructor code has been unloaded and this destructor code is no longer found via a pointer in the dispatch table, this will not be detected: valgrind does not track executable code and/or dispatch tables.
[valgrind] [Bug 457898] Multiple threads: Assertion 'found' failed.
https://bugs.kde.org/show_bug.cgi?id=457898 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Works for me on debian 11.4 Can you re-run your test program with debug info under a valgrind itself compiled with debug info and using more trace (such as -d -d -d --trace-sched=yes) and attach the resulting log file ? Thanks Philippe -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 457619] Instructions are not consistently executed after returning from a SIGSEGV signal handler
https://bugs.kde.org/show_bug.cgi?id=457619 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #3 from Philippe Waroquiers --- Using the below should solve the problem: valgrind --vex-iropt-register-updates=allregs-at-mem-access ./test See https://valgrind.org/docs/manual/manual-core.html#manual-core.signals for more info. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 455826] Running Valgrind memcheck on a live process without exiting it reports LDL but on graceful exit it does not.
https://bugs.kde.org/show_bug.cgi?id=455826 --- Comment #7 from Philippe Waroquiers --- On Fri, 2022-06-24 at 10:06 +, shapath wrote: > https://bugs.kde.org/show_bug.cgi?id=455826 > > --- Comment #5 from shapath --- > (In reply to Philippe Waroquiers from comment #4) > > (In reply to shapath from comment #3) > > > > > > Valgrind report:- > > > == > > > (gdb) monitor leak_check full reachable any > > When compiling with gcc -g -O0 and doing the leak search, > > I do not get any definitely or possibly leaked block. Leak search reports > > 2 still reachable blocks. > > > > You can use the following to see why a block is still reachable: > > (where 0x4a330a0 is the addess of the strdup-ed "hello" > > (gdb) mo w 0x4a330a0 > > ==8392== Searching for pointers to 0x4a330a0 > > ==8392== tid 1 register R8 pointing at 0x4a330a0 > > (gdb) > > > > > > As you can see, in my case, the address of the just allocated name still > > happens > > to be in a register. > > > > When I force main to return, then name is reported as definitely leaked > > (as the register pointing to name is likely used for something else) > > Tried the suggestion to compile with -O0. Also modified program to print the > address for strdup-ed "hello" before the program hits the infinite while loop. > i see it reported as a definite leak. > > I tried "mo w 0x52050a0" which does not return any reference register where > 0x52050a0 is the address. Depending on the code generated by the compiler and the moment at which a leak search is done, a pointer might still be present (or not) in one register. > > (sjohri/coding)$ gcc -g -O0 valgrind.c -o val_exmple > (sjohri/coding)$ valgrind --log-file=/var/tmp/_valgrind_%p > --xml-file=/var/tmp/_valgrind_xml_%p ./val_exmple > > "The strdup-ed address is 0x52050a0" > > :(sjohri/coding)$ gdb val_exmple > GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.0.1.el7 > Copyright (C) 2013 Free Software Foundation, Inc. 
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu". > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>... > Reading symbols from /home/sjohri/coding/val_exmple...done. > (gdb) target remote | vgdb > Remote debugging using | vgdb > relaying data between gdb and process 86547 > Reading symbols from > /usr/libexec/valgrind/vgpreload_core-amd64-linux.so...done. > Loaded symbols for /usr/libexec/valgrind/vgpreload_core-amd64-linux.so > Reading symbols from > /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so...done. > Loaded symbols for /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so > Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done. > Loaded symbols for /lib64/libc.so.6 > Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols > found)...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > 0x04efc9e0 in __nanosleep_nocancel () from /lib64/libc.so.6 > Missing separate debuginfos, use: debuginfo-install > glibc-2.17-326.0.1.el7_9.x86_64 > (gdb) monitor leak_check full reachable any > ==86547== 6 bytes in 1 blocks are definitely lost in loss record 1 of 2 > ==86547==at 0x4C29F73: malloc (vg_replace_malloc.c:309) > ==86547==by 0x4EC3B89: strdup (in /usr/lib64/libc-2.17.so) > ==86547==by 0x4006B2: my_malloc (valgrind.c:29) > ==86547==by 0x40070E: main (valgrind.c:39) As the struct component "name" is not aligned, the content of "char *name" is not considered as a pointer, and so the strdup-ed string is considered as definitely lost in your case. Philippe -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 455826] Running Valgrind memcheck on a live process without exiting it reports LDL but on graceful exit it does not.
https://bugs.kde.org/show_bug.cgi?id=455826 --- Comment #4 from Philippe Waroquiers --- (In reply to shapath from comment #3) > > Valgrind report:- > == > (gdb) monitor leak_check full reachable any When compiling with gcc -g -O0 and doing the leak search, I do not get any definitely or possibly leaked block. Leak search reports 2 still reachable blocks. You can use the following to see why a block is still reachable (where 0x4a330a0 is the address of the strdup-ed "hello"): (gdb) mo w 0x4a330a0 ==8392== Searching for pointers to 0x4a330a0 ==8392== tid 1 register R8 pointing at 0x4a330a0 (gdb) As you can see, in my case, the address of the just allocated name still happens to be in a register. When I force main to return, then name is reported as definitely leaked (as the register pointing to name is likely used for something else) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 455826] Running Valgrind memcheck on a live process without exiting it reports LDL but on graceful exit it does not.
https://bugs.kde.org/show_bug.cgi?id=455826 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #2 from Philippe Waroquiers --- Note that the leak search algorithm scans the memory starting from the "root" memory zones (stacks, global variables, registers, ...). During this scanning, any aligned piece of memory which happens to point at a block will be considered as a pointer. So, for example, if an integer variable happens to have the same bit representation as the address of an allocated (but lost) block, the leak search will not detect the lost block as a leak, because it has found a "pointer" to this block. So, possibly, depending on what the process does before exit, it might create some bit patterns that look like a pointer. The leak search algorithm might thus have false negatives: some real leaks might not be detected. I do not see how the leak search algorithm could create a false positive lost block (ignoring the possibility that the algorithm is buggy of course). Note also that monitor leak_check just launches the same leak search algorithm as the one used by client requests and at exit. As Paul said, more info (e.g. what does the leak stack trace look like ? Is such a leak report plausible when it is detected ?) might clarify. -- You are receiving this mail because: You are watching all bug changes.
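The conservative scan described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not memcheck's implementation: the word size, the block table and the function name find_reachable are hypothetical, and every pointer into a block is simply treated as "reachable" (memcheck itself distinguishes more cases).

```python
import struct

WORD = 8  # assumed pointer size in bytes (64-bit, little-endian)


def find_reachable(root, blocks):
    """Return the names of blocks pointed at by some ALIGNED word of `root`.

    root   -- bytes standing in for a root memory zone (stack, globals, ...)
    blocks -- dict name -> (start_address, size); a hypothetical block table
    """
    reachable = set()
    # Only word-aligned offsets are inspected, as in the description above.
    for off in range(0, len(root) - WORD + 1, WORD):
        (value,) = struct.unpack_from("<Q", root, off)
        for name, (start, size) in blocks.items():
            if start <= value < start + size:
                reachable.add(name)
    return reachable


blocks = {"A": (0x4A330A0, 6), "B": (0x4A33100, 16)}

# An aligned integer that merely has the same bits as the address of A:
# the scan takes it for a pointer, so a real leak of A would be missed.
aligned_root = struct.pack("<Q", 0x4A330A0) + struct.pack("<Q", 0)

# A genuine pointer to B stored at offset 4 (unaligned): the scan does not
# see it, so B would be reported as lost.
unaligned_root = b"\x00" * 4 + struct.pack("<Q", 0x4A33100) + b"\x00" * 4

print(find_reachable(aligned_root, blocks))
print(find_reachable(unaligned_root, blocks))
```

The second case corresponds to the unaligned struct member discussed in comment #7 of this bug.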
[valgrind] [Bug 452058] Generated suppressions contain a mix of mangled (physical) and demangled (inline) frames
https://bugs.kde.org/show_bug.cgi?id=452058 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #3 from Philippe Waroquiers --- Note that even when not using inline information, I think that suppression entries will similarly not match when the compiler makes different inlining decisions. If we tell valgrind to not use the inline information (--read-inline-info=no), then generated suppression entries will only contain the non inlined calls. But if the compiler decides to inline more (or less), then there will be fewer (or more) entries in the suppression. So, as suggested by Mark, assuming we can always use the mangled name and then always use the inline info, we can then expect to have a "more stable" number of stack frames put in the suppression entries (and so not depend anymore on the inline decisions). If mangled names are not found in the debug info, then it looks like we will/might need to make suppression info matching even more sophisticated (another name for complex :)). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 440765] Feature request: when a dynamically allocated variable is last read/written
https://bugs.kde.org/show_bug.cgi?id=440765 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Keeping a stack trace of all accesses is a very heavy functionality. This can be implemented: e.g. helgrind --history=full provides such a history of past accesses (with the amount of history to keep controlled by --conflict-cache-size=N). An alternative might be to use valgrind+gdb/vgdb and use the gdb command watch to watch accesses to a piece of memory. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 435493] Cannot create 'R_TempDir' under valgrind-3.17.0.GIT-lbmacos
https://bugs.kde.org/show_bug.cgi?id=435493 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Syscall 475 on darwin is the mkdirat syscall. I guess that when this syscall fails, R then reports the fatal error that it cannot create R_TempDir. I do not have access to a macos system so cannot fix this, but you might try the following: it might be an easy change in the valgrind file m_syswrap/priv_syswrap-darwin.h, to make something similar to e.g. the line readlinkat a few lines above, and then do a PRE(sys_mkdirat) in syswrap-darwin.c somewhat similar to the PRE(sys_mkdirat) in syswrap-linux.c (I guess only the syscall convention will have to be updated) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434035] vgdb might crash if valgrind is killed
https://bugs.kde.org/show_bug.cgi?id=434035 --- Comment #1 from Philippe Waroquiers --- Patch looks ok to me. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 338633] gdbserver_tests/nlcontrolc.vgtest hangs on arm64
https://bugs.kde.org/show_bug.cgi?id=338633 Philippe Waroquiers changed: What|Removed |Added Resolution|--- |FIXED Status|REPORTED|RESOLVED --- Comment #6 from Philippe Waroquiers --- Fixed in c79180a3 -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432870] gdbserver_tests:nlcontrolc hangs with newest glibc2.33 x86-64
https://bugs.kde.org/show_bug.cgi?id=432870 Philippe Waroquiers changed: What|Removed |Added Resolution|--- |FIXED Status|CONFIRMED |RESOLVED --- Comment #9 from Philippe Waroquiers --- Fixed in c79180a3 -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432870] gdbserver_tests:nlcontrolc hangs with newest glibc2.33 x86-64
https://bugs.kde.org/show_bug.cgi?id=432870 --- Comment #7 from Philippe Waroquiers --- Created attachment 136473 --> https://bugs.kde.org/attachment.cgi?id=136473&action=edit fix nlcontrolc.vgtest blocking on arm64 or newer glibc The attached patch should fix the blockage. Tested on debian 10/amd64 and on an arm64 platform. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434057] Add stdio mode to valgrind's gdbserver
https://bugs.kde.org/show_bug.cgi?id=434057 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Note that valgrind gdbserver is very special: it is not a 'separate' process that e.g. does ptrace system calls to debug an inferior. The valgrind gdbserver is embedded in valgrind itself. Due to that, when the valgrind guest process is blocked in a syscall, the valgrind gdbserver cannot 'read' packets from gdb. The intermediate vgdb process reads the packets from gdb, and then wakes up (in a very special way) the valgrind runtime to get the valgrind guest process out of the blocking system call. As such, a direct connection between gdb and the valgrind gdbserver will not work properly. A possible implementation might be a new vgdb option such as: --valgrind [VALGRIND_OPTIONS ...] PROGRAM_TO_RUN_UNDER_VALGRIND [PROGRAM_ARGS ...] When vgdb gets this argument, it would launch valgrind itself and connect to it. That would avoid having to launch valgrind in one window and gdb/vgdb in another window. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432510] RFE: ENOMEM fault injection mode
https://bugs.kde.org/show_bug.cgi?id=432510 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #4 from Philippe Waroquiers --- To have a flexible way to specify when/where a memory allocation should fail, we might use something that re-uses (part of) the suppression infrastructure: The user would give a file with 'suppression-like' entries, but instead of suppressing errors, these entries would put a limit (in nr of allocated blocks and/or nr of allocated bytes) after which a malloc would return NULL. That should be relatively cpu-cheap to implement, as the matching between the alloc stacktrace and the 'heap-limit supp entries' has to be done only the first time a new stacktrace is stored in the list of stack traces. -- You are receiving this mail because: You are watching all bug changes.
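To make the idea above more concrete, here is a minimal sketch of how such 'heap-limit' entries might behave. Nothing like this exists in valgrind as far as this thread shows: the entry format (a tuple of frame patterns plus a block limit), the class name HeapLimiter and the glob-style matching are all assumptions made for illustration. The point it tries to show is that the matching is done once per distinct stack trace and then cached.

```python
from fnmatch import fnmatchcase


class HeapLimiter:
    """Toy model of 'suppression-like' entries that limit allocations."""

    def __init__(self, entries):
        # entries: list of (frame_patterns, max_blocks); hypothetical format.
        self.entries = entries
        self.counts = [0] * len(entries)
        self.cache = {}        # stack trace -> index of matching entry, or None
        self.match_calls = 0   # how often the (costly) matching was executed

    def _match(self, stack):
        self.match_calls += 1
        for i, (patterns, _limit) in enumerate(self.entries):
            if len(stack) >= len(patterns) and all(
                fnmatchcase(frame, pat) for frame, pat in zip(stack, patterns)
            ):
                return i
        return None

    def malloc_allowed(self, stack):
        """Return False when the simulated malloc should return NULL."""
        stack = tuple(stack)
        if stack not in self.cache:          # only for a new stack trace
            self.cache[stack] = self._match(stack)
        idx = self.cache[stack]
        if idx is None:
            return True                      # no entry limits this trace
        if self.counts[idx] >= self.entries[idx][1]:
            return False                     # limit reached
        self.counts[idx] += 1
        return True


limiter = HeapLimiter([(("malloc", "load_*"), 2)])
trace = ("malloc", "load_config", "main")
results = [limiter.malloc_allowed(trace) for _ in range(4)]
print(results, limiter.match_calls)
```

A limit in bytes rather than in blocks could be handled the same way, by passing the requested size and accumulating it instead of a counter.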
[valgrind] [Bug 427510] Use of uninitialized value in callgrind_annotate.
https://bugs.kde.org/show_bug.cgi?id=427510 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Seems fixed in recent git version. Can you try with the last 3.16 version (or the last GIT version), instead of a 3.16 GIT version ? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 426853] vagrind+massif analyze ceph-osd memory OOM
https://bugs.kde.org/show_bug.cgi?id=426853 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- The message "Error: can not open xtree output file" is produced when the open system call fails. Such an "open" failure is likely caused by some protection problem at operating system level. I see e.g. that the program gets arguments such as --setuser ceph. Are you sure the user ceph can write in /root ? You could use the valgrind argument --massif-out-file=... to change the location where the file is produced. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 424656] Uninitialised value was created by a heap allocation
https://bugs.kde.org/show_bug.cgi?id=424656 Philippe Waroquiers changed: What|Removed |Added Resolution|--- |NOT A BUG Status|REPORTED|RESOLVED --- Comment #3 from Philippe Waroquiers --- Yes, you can suppress errors. See user manual for more info: https://www.valgrind.org/docs/manual/manual-core.html#manual-core.suppress More generally it is highly recommended to read or at least scan the user manual. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 424656] Uninitialised value was created by a heap allocation
https://bugs.kde.org/show_bug.cgi?id=424656 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- That looks like a real bug that valgrind detects. The malloc allocates 32 bytes, the strcpy initialises 16 bytes but the printf loop prints the 32 bytes, so effectively prints data not initialised. -- You are receiving this mail because: You are watching all bug changes.
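The situation described above can be mimicked with a byte-level toy model of "initialised" tracking. This is only an illustration with a hypothetical ShadowBuffer class; the real tool tracks definedness much more finely and reports when undefined values are actually used, which this sketch does not attempt to reproduce.

```python
class ShadowBuffer:
    """A buffer with one 'initialised' flag per byte (toy model)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.defined = [False] * size      # freshly allocated: nothing defined

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        for i in range(offset, offset + len(payload)):
            self.defined[i] = True

    def undefined_in(self, offset, length):
        """Return the offsets in [offset, offset+length) never written."""
        return [i for i in range(offset, offset + length) if not self.defined[i]]


buf = ShadowBuffer(32)                     # the malloc of 32 bytes
buf.write(0, b"123456789012345\x00")       # a strcpy of 15 chars + NUL = 16 bytes
bad = buf.undefined_in(0, 32)              # the loop reads all 32 bytes
print(len(bad), bad[0], bad[-1])
```

In this model the last 16 bytes are the ones the print loop should not have touched.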
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |INTENTIONAL --- Comment #12 from Philippe Waroquiers --- After much discussion, we decided to keep the 3.15 behaviour rather than rollback to 3.14 behaviour. But in valgrind 3.16, the incompatible supp entry will be detected, and a warning message will be given. That has been pushed today as d9e714812 This is not really fixing the bug, so closing it as RESOLVED INTENTIONAL. Sorry for the backward incompatible change pushed in 3.15, we will try harder in the future to avoid such incompatible changes. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #11 from Philippe Waroquiers --- Updated the warning message to be: ==3170== WARNING: preadv(vector[...]) is an obsolete suppression line not supported anymore since valgrind 3.15. ==3170== You should replace [...] by a specific index such as [0] or [1] or [2] or similar -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #10 from Philippe Waroquiers --- Some further notes: I should re-update the warning to replace the final 'or ...' by 'or [2]'. And I sincerely hope that nobody is using preadv and pwritev wrongly with huge vectors, as otherwise they might need to type a lot of supp entries. (that is in fact a main reason to avoid variable extra error lines ...) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #9 from Philippe Waroquiers --- Created attachment 127809 --> https://bugs.kde.org/attachment.cgi?id=127809&action=edit not a fix, but detects the incompatible supp entry and produce a warning The commit log explains in detail what we envisaged, and why the decision is to keep what 3.15 accepts instead of rolling back the change in 3.15 or making the error matching logic even more complex -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 419562] PR_SET_PTRACER error with Ubuntu on WSL
https://bugs.kde.org/show_bug.cgi?id=419562 --- Comment #3 from Philippe Waroquiers --- (In reply to Evan Hunter from comment #2) > Thanks for the pointers on testing it on vgdb. > It looks like it still hangs vgdb :-( > I too am not sure what the prctl(PR_SET_PTRACER, 1, 0, 0, 0) call is trying > to achieve. It seems to succeed but still leaves vgdb blocked. Using the debug options -d -d -d of vgdb and -v -v -v -d -d -d of valgrind, it should be possible to have an idea about what does not work. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 419562] PR_SET_PTRACER error with Ubuntu on WSL
https://bugs.kde.org/show_bug.cgi?id=419562 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- Thanks for the proposed patch. I do not remember the reason for the line ret = VG_(prctl) (PR_SET_PTRACER, 1, 0, 0, 0); prctl PR_SET_PTRACER documentation indicates that the second argument is either PR_SET_PTRACER_ANY (to allow any process to ptrace the caller), or a pid (to allow pid to ptrace the caller) or 0 (to not allow anymore a process to ptrace the caller). So, the reason for the call with the pid 1 is not clear (anymore) to me. I must have had a good reason at the time, but did not comment it :(. That being said: Does calling set_ptracer with value 1 effectively allow vgdb to get a blocked valgrind process out of the syscall ? In other words, before your patch: valgrind sleep 100 in another window: vgdb help and vgdb should block or give error msg or similar (you might use vgdb -d -d -d -d help to get more info about what is going on) and after your patch, vgdb -d -d -d -d help should be able to wake up valgrind and produce the help text. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 418840] SIG_IGN doesn't clear pending signal if SIG_IGN is already the handler
https://bugs.kde.org/show_bug.cgi?id=418840 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- I think the bug originates from the lazy translation of the scss to skss, in function static void handle_SCSS_change ( Bool force_update ). When force_update is false (which is the case when handling a guest sigaction call), and the old setup is equal to the new setup of the kernel signal state, then no kernel sigaction is executed. So, when a signal is blocked+ignored then raised, the signal is queued by the kernel till it is unblocked (and then it will be ignored). In this state, if a blocked signal is re-ignored, the kernel clears it. But in the same circumstance, valgrind signal handling does not call sigaction: valgrind does not know that there is a blocked ignored queued signal in the kernel, it just sees that the signal handling is not changed, and then it does not call the kernel to tell to re-ignore the signal. Changing the call of handle_SCSS_change in sys_sigaction to always use force_update solves the problem, but that means the lazy update of the signal handling is removed. It would be possible to detect this case in handle_SCSS_change: if the current setup is blocked + ignored, then this function could check if there is a signal pending, and then it should still call the kernel to clear the blocked signal. That means one more syscall to get the pending signals in case one or more signals are blocked + ignored. -- You are receiving this mail because: You are watching all bug changes.
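The sequence described above can be replayed with a small simulation. This is a toy model only: the classes Kernel and LazyValgrind and the function run are hypothetical, the kernel behaviour is reduced to what the comment states (a blocked signal stays queued even when ignored, and re-installing SIG_IGN clears it), and the "lazy" check stands in for the comparison done when force_update is false.

```python
class Kernel:
    """Minimal model of the kernel state for one signal."""

    def __init__(self):
        self.handler = "DFL"
        self.blocked = False
        self.pending = False
        self.sigaction_calls = 0

    def sigaction(self, handler):
        self.sigaction_calls += 1
        self.handler = handler
        if handler == "IGN":
            self.pending = False   # (re-)ignoring clears a queued signal

    def raise_signal(self):
        if self.blocked:
            self.pending = True    # queued until unblocked, even if ignored


class LazyValgrind:
    """Forwards guest sigaction calls, skipping the ones that change nothing."""

    def __init__(self, kernel, force_update=False):
        self.kernel = kernel
        self.force_update = force_update
        self.skss = kernel.handler  # what is believed to be set in the kernel

    def guest_sigaction(self, handler):
        if not self.force_update and handler == self.skss:
            return                  # lazy path: no kernel sigaction executed
        self.kernel.sigaction(handler)
        self.skss = handler


def run(force_update):
    """Ignore, block, raise, re-ignore; return whether a signal stays queued."""
    kernel = Kernel()
    vg = LazyValgrind(kernel, force_update)
    vg.guest_sigaction("IGN")
    kernel.blocked = True
    kernel.raise_signal()
    vg.guest_sigaction("IGN")       # the guest re-ignores the signal
    return kernel.pending


print(run(force_update=False), run(force_update=True))
```

With the lazy path the queued signal survives the second SIG_IGN; with force_update it is cleared, at the price of one kernel call per guest sigaction.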
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #7 from Philippe Waroquiers --- (In reply to Mark Wielaard from comment #5) > This is unfortunate and an unforeseen consequence of making the the error > message more useful (it is useful to know which vector contained > uninitialised bytes). Yes, having more precise info in this area can be very useful. To give more information about an error without making them 'different', it might be more appropriate to put more information in the 'extra' part of the error. The logic to compare this "extra" part is in the tool, and so the tool can consider that this additional info is not to be compared in the error equality logic. And of course, as usual, if you have an error, you can use gdb+vgdb to get as much details as possible about the error being reported. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 Philippe Waroquiers changed: What|Removed |Added CC||jsew...@acm.org -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #6 from Philippe Waroquiers --- (In reply to Mark Wielaard from comment #5) > This is unfortunate and an unforeseen consequence of making the the error > message more useful (it is useful to know which vector contained > uninitialised bytes). > > Sadly we have had releases with both the old and new variant of the error > message. So we would indeed break some existing suppressions picking either > the old or new variant. Yes, unfortunate. I will add Julian in the cc list to have some more opinions ... > > I wonder if we can make [...] special so that it either matches the literal > string [...] or []. This can for sure be coded, but the error matching code/logic is IMO quite complex already, and moreover, this logic will be tool and error dependent, as the extra message matching is done by the tool, not by the m_errormgr.c common core module. > > Note: we do use the [...] variant also in [process_vm_](readv|writev) and > vmsplice. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 Philippe Waroquiers changed: What|Removed |Added CC||ahajk...@redhat.com -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 Philippe Waroquiers changed: What|Removed |Added CC||m...@klomp.org -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 --- Comment #4 from Philippe Waroquiers --- (In reply to Leonid Yuriev from comment #3) > > What does `valgrind --gen-suppressions=all ...` show ? > { > >Memcheck:Param >pwritev(vector[0]) >fun:pwritev >fun:mdbx_pwritev >fun:mdbx_flush_iov >fun:mdbx_page_flush >fun:mdbx_txn_commit >fun:_ZN8testcase16breakable_commitEv >fun:_ZN8testcase39db_open__begin__table_create_open_cleanERj >fun:_ZN15testcase_nested5setupEv >fun:_Z12test_executeRK12actor_config >fun:_Z16osal_actor_startRK12actor_configRi >fun:main > } > > { > >Memcheck:Param >pwritev(vector[1]) > > ... >pwritev(vector[2]) > ... >pwritev(vector[3]) and so on. > > > > Does it suppress if you use the suppression exactly as generated by > > valgrind ? > Suppressions works for an every explicitly `vector[N]`, but not for the > `vector[...]`. As I understand, you are expecting vector[...] in the line following Memcheck:Param to match vector[1] or vector[2] or ... There is no such logic. This part of the error must match exactly. Your suppression entry was working with 3.13 because in 3.13, the error was generated with vector[...] Such errors for pwritev was changed from producing vector[...] to vector[0], vector[1], etc as part of the commit: commit b0861063a8d2a55bb7423e90d26806bab0f78a12 Author: Alexandra Hájková AuthorDate: Tue Jun 4 13:47:14 2019 +0200 Commit: Mark Wielaard CommitDate: Wed Jul 3 00:19:16 2019 +0200 As far as I can see, the fact that the new error message after this commit contains a varying offset between brackets is what causes the problem: this looks to me to be a backward incompatible change (as shown by your supp that stopped working between 3.13 and 3.15) and does not match the 'idea' of error parameters. Here are some comments extracted from the description of void VG_(maybe_record_error): Note that `ekind' and `s' are also used to generate a suppression. 
`s' should therefore not contain data depending on the specific execution (such as addresses, values) but should rather contain e.g. a system call parameter symbolic name. (where 's' is this vector[1] etc string). Wondering how to fix this ... If we go back to the behaviour before 3.15, we break the suppression entries working for 3.15, and if we do not go back, we break the suppression entries working for 3.14 and before. -- You are receiving this mail because: You are watching all bug changes.
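As a small illustration of the exact-match behaviour described above, the sketch below compares the line following Memcheck:Param literally and shows one way a user could expand an old entry. Both function names are hypothetical helpers written for this example; they are not part of valgrind, and the real matching code is more involved.

```python
def param_line_matches(supp_line, error_string):
    """Exact comparison: '[...]' is ordinary text, not a wildcard."""
    return supp_line == error_string


def expand_obsolete(supp_line, count):
    """Hypothetical helper: rewrite one '[...]' line as `count` indexed lines."""
    if "[...]" not in supp_line:
        return [supp_line]
    return [supp_line.replace("[...]", "[%d]" % i) for i in range(count)]


old = "pwritev(vector[...])"
print(param_line_matches(old, "pwritev(vector[1])"))
print(expand_obsolete(old, 3))
```

Each expanded line would still need its own suppression entry, which is the typing burden mentioned in comment #10.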
[valgrind] [Bug 417075] pwritev(vector[...]) suppression ignored
https://bugs.kde.org/show_bug.cgi?id=417075 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #2 from Philippe Waroquiers --- What does valgrind --gen-suppressions=all ... show ? Does it suppress if you use the suppression exactly as generated by valgrind ? Thanks -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413603] callgrind_annotate/cg_annotate truncate function names at '#'
https://bugs.kde.org/show_bug.cgi?id=413603 Philippe Waroquiers changed: What|Removed |Added Resolution|--- |FIXED Status|CONFIRMED |RESOLVED --- Comment #6 from Philippe Waroquiers --- Thanks for the analysis and patch, pushed in aaf64922a -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413603] callgrind_annotate/cg_annotate truncate function names at '#'
https://bugs.kde.org/show_bug.cgi?id=413603 Philippe Waroquiers changed: What|Removed |Added Summary|callgrind_annotate |callgrind_annotate/cg_annot |truncates function names at |ate truncate function names |'#' |at '#' -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413603] callgrind_annotate truncates function names at '#'
https://bugs.kde.org/show_bug.cgi?id=413603 --- Comment #5 from Philippe Waroquiers --- (In reply to Andreas Arnez from comment #3) > (In reply to Philippe Waroquiers from comment #2) > > * I am wondering if we should not allow comment lines starting with > > 0 or more space characters (like empty lines?) followed by # ? > I wondered about that, too. Actually my first version allowed whitespace > before the comment marker. But then I realized that callgrind_annotate > doesn't skip spaces at the beginning of a line in other cases, either. Thus > I adjusted my patch to be consistent with the rest of the logic. > Do you know of a case where it would be necessary to allow whitespace before > the comment marker? No, I had no specific example in mind, but as the callgrind format is quite general, I was afraid we might have a tool not in the valgrind tree producing such comment lines with spaces before #. I however just tried with kcachegrind, and it reports a warning for such # lines that have spaces before the #. So, I think that what you have done is ok. -- You are receiving this mail because: You are watching all bug changes.
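The rule agreed on above (a comment is a line whose very first character is #, and a # elsewhere is ordinary text) can be sketched as follows. This is not the code of callgrind_annotate, which is a Perl script; the two helpers are hypothetical and only handle a plain 'fn=NAME' line, ignoring the rest of the callgrind file format.

```python
def is_comment(line):
    """Only a '#' in the first column starts a comment; no leading spaces."""
    return line.startswith("#")


def parse_fn_line(line):
    """Return the function name of a plain 'fn=NAME' line, else None.

    The name is kept whole: a '#' inside it does not truncate it.
    """
    if is_comment(line) or not line.startswith("fn="):
        return None
    return line[len("fn="):].rstrip("\n")


print(parse_fn_line("fn=Foo#bar(int)\n"))
print(is_comment("# a comment"), is_comment("  # indented"))
```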
[valgrind] [Bug 413603] callgrind_annotate truncates function names at '#'
https://bugs.kde.org/show_bug.cgi?id=413603 --- Comment #2 from Philippe Waroquiers --- Thanks for the patch. Two small comments: * I am wondering if we should not allow comment lines starting with 0 or more space characters (like empty lines?) followed by # ? * cg_annotate seems to suffer from the same bug. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 410924] massif crashes running jetstream2 benchmark with webkit
https://bugs.kde.org/show_bug.cgi?id=410924 --- Comment #3 from Philippe Waroquiers --- Some feedback ? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 411134] Allow the user to change a set of command line options during execution.
https://bugs.kde.org/show_bug.cgi?id=411134 Philippe Waroquiers changed: What|Removed |Added Resolution|--- |FIXED Status|REPORTED|RESOLVED --- Comment #3 from Philippe Waroquiers --- Pushed as 3a803036f -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 411134] Allow the user to change a set of command line options during execution.
https://bugs.kde.org/show_bug.cgi?id=411134 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- For background info, see discussion with MariaDB developers https://sourceforge.net/p/valgrind/mailman/message/36738630/ and a similar discussion on StackOverflow https://stackoverflow.com/questions/57245062/suppress-leak-check-in-a-specific-forked-child -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 411134] New: Allow the user to change a set of command line options during execution.
https://bugs.kde.org/show_bug.cgi?id=411134 Bug ID: 411134 Summary: Allow the user to change a set of command line options during execution. Product: valgrind Version: unspecified Platform: Other OS: Linux Status: REPORTED Severity: wishlist Priority: NOR Component: general Assignee: jsew...@acm.org Reporter: philippe.waroqui...@skynet.be Target Milestone: --- Created attachment 122275 --> https://bugs.kde.org/attachment.cgi?id=122275&action=edit patch to implement dynamically changeable options The attached patch changes the command line option framework and parsing code to allow changing (some) command line options dynamically. Here is a summary of the new functionality (extracted from NEWS): * It is now possible to dynamically change the value of many command line options while your program (or its children) are running under Valgrind. To have the list of dynamically changeable options, run valgrind --help-dyn-options You can change the options from the shell by using vgdb to launch the monitor command "v.clo ...". The same monitor command can be used from a gdb connected to the valgrind gdbserver. Your program can also change the dynamically changeable options using the client request VALGRIND_CLO_CHANGE(option). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 400593] In Coregrind, use statx for some internal syscalls if [f]stat[64] fail
https://bugs.kde.org/show_bug.cgi?id=400593 --- Comment #5 from Philippe Waroquiers --- (In reply to Petar Jovanovic from comment #4) > This version looks better, thanks. > I have just pushed it [1] after some testing, but I will leave the issue > open so we can see over the weekend whether there have been regressions on > other platforms or not. > > [1] > https://sourceware.org/git/?p=valgrind.git;a=commit; > h=c6a6cf929f3e2a9bf5d7f09f334ed4d67f2d6e18 Would be good to add the bug nr in NEWS fixed list. Thanks -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 410924] massif crashes running jetstream2 benchmark with webkit
https://bugs.kde.org/show_bug.cgi?id=410924

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
Does it also crash with other tools, in particular memcheck and none?

Just to be sure, does it also crash if you specify --smc-check=all?

In the attached log, we see that thread 1 is busy executing some code, but thread 2 is blocked in a madvise syscall. The remaining threads are in a timed wait call. It is slightly strange to have thread 2 blocked in a madvise syscall. Maybe the application is doing very special actions on mapped pages?

You could also see if adding --px-default=allregs-at-mem-access or --px-default=allregs-at-each-insn changes the behaviour (possibly, the javascript engine is changing the protection of some pages, and expects 'precise exception handling' when handling signals).
[valgrind] [Bug 410599] Non-deterministic behaviour of pth_self_kill_15_other test
https://bugs.kde.org/show_bug.cgi?id=410599 --- Comment #5 from Philippe Waroquiers --- (In reply to Stefan Maksimovic from comment #4) > Created attachment 122077 [details] > pth_self_kill.patch v2 > > Thanks Philippe, validating the test through memcheck slipped my mind. > > I've updated the patch by initializing the variables reported by memcheck, > it should be fine now. The patch looks ok to me and can be pushed. Thanks for the work -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 410599] Non-deterministic behaviour of pth_self_kill_15_other test
https://bugs.kde.org/show_bug.cgi?id=410599

--- Comment #3 from Philippe Waroquiers ---
(In reply to Stefan Maksimovic from comment #2)
> If it's not too much trouble, I suggest you test it yourself just to make
> sure.

I tested, and the modified test still reproduces the bug with the old release (and works fine with the new release).

However, the test itself is now using uninitialised variables:

==32132== Syscall param rt_sigaction(act->sa_mask) points to uninitialised byte(s)
==32132==    at 0x487D800: __libc_sigaction (sigaction.c:58)
==32132==    by 0x109266: main (pth_self_kill.c:39)
==32132==  Address 0x1ffefffd08 is on thread 1's stack
==32132==  in frame #0, created by __libc_sigaction (sigaction.c:43)
==32132==
==32132== Syscall param rt_sigaction(act->sa_flags) points to uninitialised byte(s)
==32132==    at 0x487D800: __libc_sigaction (sigaction.c:58)
==32132==    by 0x109266: main (pth_self_kill.c:39)
==32132==  Address 0x1ffefffcf8 is on thread 1's stack
==32132==  in frame #0, created by __libc_sigaction (sigaction.c:43)

That had better be fixed to ensure we have a deterministic test :).
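The two memcheck reports above point at sa_mask and sa_flags being handed to rt_sigaction without having been set. A minimal sketch of initialising every field of struct sigaction before installing a handler follows. It is not the code from the attached patch: the names on_sigterm and install_sigterm_handler, and the choice of SIGTERM, are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Hypothetical handler, used only for this sketch. */
static void on_sigterm(int signo)
{
    (void)signo;
}

/* Install a SIGTERM handler with every field of struct sigaction
   explicitly initialised, so that the rt_sigaction syscall never
   reads uninitialised stack bytes.
   Returns 0 on success, -1 on error (errno is set by sigaction). */
int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);   /* zeroes sa_flags and any padding */
    sigemptyset(&sa.sa_mask);    /* portable way to define sa_mask  */
    sa.sa_handler = on_sigterm;

    return sigaction(SIGTERM, &sa, NULL);
}
```

Calling install_sigterm_handler() once at the start of main() should be enough for a test of this kind; the memset is what silences the sa_flags report, and sigemptyset covers sa_mask.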
[valgrind] [Bug 410599] Non-deterministic behaviour of pth_self_kill_15_other test
https://bugs.kde.org/show_bug.cgi?id=410599 --- Comment #1 from Philippe Waroquiers --- (In reply to Stefan Maksimovic from comment #0) > A recent commit > https://sourceware.org/git/?p=valgrind.git;a=commit; > h=63a9f0793113fd5d828ea7b6183812ad71f924f1 > has introduced a test which exhibits different behaviour on some platforms. > > Namely, running the pth_self_kill_15_other test on these can end in either > of the following: > 1) the spawned thread finishes first > 2) the main thread finishes first > > Running the test multiple times in succession we observed that on x86 the > test finishes as described in the 2) case > whereas on others either of the two cases can be present. > We have seen this behaviour on different arm and mips platforms. > > In the 2) case the output we get corresponds with the .exp file while in the > 1) case we get an extra 'Terminated' string from the kernel on stderr. > > Moreover, we ran the test on arm/mips without the functionality the rest of > that patch provides, to test whether it really hangs/loops on arm/mips or > not. > Interestingly the pth_self_kill_9 test behaves the same on arm/mips and x86 > whereas the pth_self_kill_15_other does finish on arm/mips > (it prints the 'Terminated' message - the spawned thread finishes first). > > A possible solution would be to make the test deterministic; one way would > consist of inserting a pthread_join call. > That would alter the test in terms of the output produced but we believe > that the nature of the test itself would remain intact. > Reading the commit message which introduced the tests, we gather that the > purpose was to test two scenarios(loop/hang) which the > commit was created to solve. > In case the above suggested change would not disrupt the intended > functionality of the test, would it be applicable? > > What course of action would you recommend? Thanks for looking at this (this part of the code and the related tests are very tricky). 
I suggest you reproduce the bug by using the test program and the previous version of Valgrind. Then modify the test as you want to make it deterministic, but verify that the test still triggers the bug with the old version of Valgrind. Thanks -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409678] improvement suggestion for dhat
https://bugs.kde.org/show_bug.cgi?id=409678 --- Comment #4 from Philippe Waroquiers --- (In reply to plasmahh from comment #3) > > Seems there is no doc changed, no test changed. > I wasn't aware of that its necessary to do all this, you can then close this > ticket, I was just sharing our changes in the hope it might be useful for > other people that want to know the same information. Yes, of course, any contribution is welcome. A change has however more chances to be integrated if it is clear what it does, and the more complete the patch is, e.g. with doc updated, with a test, ... -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409367] exit_group() after signal arriving to thread waiting in futex() causes hangs
https://bugs.kde.org/show_bug.cgi?id=409367

--- Comment #6 from Philippe Waroquiers ---
(In reply to Allison Karlitskaya from comment #5)
> (In reply to Philippe Waroquiers from comment #4)
> > Pushed as 63a9f0793
>
> Thanks very much, Philippe.
>
> A few questions, if you don't mind:
>
> 1) is there any workaround to this problem that you can imagine (in terms of
> commandline flags, etc.) that void avoid the problem other than to update
> valgrind to a version that includes this patch? Our current workaround is
> to add a sleep on the main thread before exit, and I'd like to remove that
> ASAP.

I do not see a workaround at the valgrind command line level.

>
> 2) when is this patch likely to appear in a release? when is it likely to
> appear in a stable release?

We only have stable releases :). Typically, there is a release every 6 months or so. The last release was on the 12th of April.

>
> 3) do you think this patch is suitable for backporting/vendor-patching for
> distro packages?

The patch can certainly be backported, if some distro wants to do it.
[valgrind] [Bug 409678] improvement suggestion for dhat
https://bugs.kde.org/show_bug.cgi?id=409678 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- The attached patch is in an unusual format, e.g. contains various control characters. Also, what is the idea that the patch is implementing ? Seems there is no doc changed, no test changed. So, that makes it not easy to see what you propose to add. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409367] exit_group() after signal arriving to thread waiting in futex() causes hangs
https://bugs.kde.org/show_bug.cgi?id=409367 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Philippe Waroquiers --- Pushed as 63a9f0793 -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED --- Comment #13 from Philippe Waroquiers --- Thanks for the review. Pushed as 63a9f0793 -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409429] False positives at unexpected location due to failure to recognize cmpeq as a dependency breaking idiom
https://bugs.kde.org/show_bug.cgi?id=409429 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #1 from Philippe Waroquiers --- For info, reproduced on Ubuntu 19.04 with g++ 8.3.0 and valgrind trunk. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141 --- Comment #11 from Philippe Waroquiers --- A patch fixing this problem (and also bug 409367) has been attached to bug 409367. If no remarks on the approach, I will push in a few days. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409367] exit_group() after signal arriving to thread waiting in futex() causes hangs
https://bugs.kde.org/show_bug.cgi?id=409367

--- Comment #3 from Philippe Waroquiers ---
Created attachment 121295 --> https://bugs.kde.org/attachment.cgi?id=121295=edit
fix hangs and loops when process sends signal to itself

I have tested with the attached reproducer, and it works. The test added by the patch is similar to this test.

If there are no remarks on the approach, I will push in a few days ...
[valgrind] [Bug 409367] exit_group() after signal arriving to thread waiting in futex() causes hangs
https://bugs.kde.org/show_bug.cgi?id=409367 Philippe Waroquiers changed: What|Removed |Added Assignee|jsew...@acm.org |philippe.waroquiers@skynet. ||be -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409367] exit_group() after signal arriving to thread waiting in futex() causes hangs
https://bugs.kde.org/show_bug.cgi?id=409367 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #2 from Philippe Waroquiers --- This looks very similar to a loop reproduced with 409141. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141 Philippe Waroquiers changed: What|Removed |Added Assignee|jsew...@acm.org |philippe.waroquiers@skynet. ||be -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 255603] exp-sgcheck Assertion '!already_present' failed
https://bugs.kde.org/show_bug.cgi?id=255603 Philippe Waroquiers changed: What|Removed |Added CC||bugzilla@poradnik-webmaster ||a.com --- Comment #10 from Philippe Waroquiers --- *** Bug 409162 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409162] exp-sgcheck: sg_main.c:559 (add_blocks_to_StackTree): Assertion '!already_present' failed.
https://bugs.kde.org/show_bug.cgi?id=409162 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED CC||philippe.waroquiers@skynet. ||be Resolution|--- |DUPLICATE --- Comment #1 from Philippe Waroquiers --- See bug 255603. This is (supposed to be) solved in valgrind >= 3.14. Please upgrade your valgrind (it is easy to compile from source) and verify it is working for you. Thanks *** This bug has been marked as a duplicate of bug 255603 *** -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141 --- Comment #9 from Philippe Waroquiers --- See also bug https://bugs.kde.org/show_bug.cgi?id=372600 This bug seems somewhat related/similar to the above. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141

--- Comment #8 from Philippe Waroquiers ---
Thanks for the small reproducer.

This small test case reveals a bunch of problems related to the termination of a process when it kills itself, and some problems in the gdbserver debug tracing. The latter was easily fixable (commit 90d831171).

Otherwise, it looks like there are at least 2 problems:
* the hang reported here
* a similar hang in case the main thread is sending the signal to the other thread. We then seem to have a race condition between the main thread that exits, and the other thread that tries to kill the process.
[valgrind] [Bug 409141] Valgrind hangs when SIGKILLed
https://bugs.kde.org/show_bug.cgi?id=409141

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #5 from Philippe Waroquiers ---
IMO, this is supposed to work: if the application sends SIGKILL to itself, the syscall is intercepted, and some special handling is supposed to happen to ensure the process dies. See e.g. m_signals.c async_signalhandler and/or syswrap-generic.c ML_(do_sigkill).

So, if it does not work, this looks like a real bug. Do you have a small compilable reproducer?
[valgrind] [Bug 406434] valgrind is unable to intercept the malloc calls in statically linked executables
https://bugs.kde.org/show_bug.cgi?id=406434 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #3 from Philippe Waroquiers --- (In reply to Mark Wielaard from comment #2) > (In reply to Tom Hughes from comment #1) > > This is a fundamental, and I believe well documented, limitation. > > > > Because valgrind relies on preloading a shared object to do function > > interception it can't work for a program that doesn't use the dynamic > > linker, and intercepting malloc calls (among other things) relies on the > > function interception system. > > I don't believe that this is well documented. > It would at least be helpful if valgrind clearly warned about this. > For example if there is no PT_INTERP for the executable, but the wrapper > does use LD_PRELOAD for a vgpreload library. > > valgrind doesn't have to do normal ELF symbol interposition. It can > intercept anything with a symbol name/address. The documentation of > --soname-synonyms even implies that this would work for statically linked > code/executables: > >· Replacements in a statically linked library are done by using >the NONE pattern. For example, if you link with libtcmalloc.a, >and only want to intercept the malloc related functions in the >executable (and standard libraries) themselves, but not any >other shared libraries, you can give the option >--soname-synonyms=somalloc=NONE. Note that a NONE pattern will >match the main executable and any shared library having no >soname. > > But this doesn't work in this case because even if it can see and find the > malloc related functions in the executable the vgpreload libraries with the > replacement/interception functions isn't loaded. > > In theory we can get the replacements wired in differently like we do with > add_hardwired_spec () for ld.so. But that would require some way to compile > in the replacement functions into the tools themselves instead of relying on > LD_PRELOAD. 
Some years ago, I looked at having the replacement functions in the tool, rather than loaded via LD_PRELOAD. I think the (only?) (main?) reason why the replacement functions are in a .so is that they must run in guest mode, and there is some protection that forbids running 'valgrind tool' code in guest mode. IIRC, if we could separate the 'pages' where we have the 'real valgrind' code from the 'tool code that must run in client mode', we could have the same protection without depending on LD_PRELOAD.
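Comment #2, quoted above, suggests warning when the executable has no PT_INTERP program header. A rough sketch of such a check, written as a standalone helper outside of Valgrind, could look like the following. It is only an illustration and not Valgrind code: has_pt_interp is a hypothetical name, the helper handles only 64-bit ELF files in the host byte order, and it relies on the Linux <elf.h> header.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration.
   Returns 1 if the 64-bit ELF file at 'path' has a PT_INTERP program
   header (i.e. it requests a dynamic linker), 0 if it has none, and
   -1 if the file cannot be read or is not a 64-bit ELF file.
   Assumes the file uses the same byte order as the host. */
int has_pt_interp(const char *path)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int result = 0;

    if (f == NULL)
        return -1;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr)) {
        fclose(f);
        return -1;
    }

    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);

        if (fseek(f, off, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_INTERP) {
            result = 1;
            break;
        }
    }

    fclose(f);
    return result;
}
```

A wrapper script or launcher could call has_pt_interp on the target binary and print a warning when it returns 0, since in that case a vgpreload library set via LD_PRELOAD would presumably never be loaded.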
[valgrind] [Bug 399355] Add callgrind_diff
https://bugs.kde.org/show_bug.cgi?id=399355 --- Comment #14 from Philippe Waroquiers --- Note that at work, I am busy discussing to have someone working on this bug. So, some progress might happen in the coming weeks (but not for 3.15). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 406260] valgrind memcheck receive SIGBUS on octeon II CPU
https://bugs.kde.org/show_bug.cgi?id=406260

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
(In reply to Shouhua Yu from comment #0)
> gcc version is 4.7 glibc is 2.16 kernel version is 3.10 valgrind code is
> 3.14.0

The message below shows that the valgrind version is 3.12.0. It would be good to try with the latest release (3.14.0) or even the GIT version.

> ==7907== Memcheck, a memory error detector
> ==7907== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==7907== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782

--- Comment #11 from Philippe Waroquiers ---
(In reply to wavexx from comment #10)
> Do you still think the buffer sizes should be hard-coded though?
>
> I know you can recompile and all, and theoretically this should never
> happen, but I do expect debugging tools to never fail on crappy input ;)

There are advantages and disadvantages to the current approach:

As I understand it, in terms of software layers, the VEX lib does not have any dependency on the valgrind memory management layer/address space manager. Having memory sized at startup would break this.

Also, when these maximums are exceeded, this is really an (efficiency) bug. It might also slightly impact the performance.

But for sure, it is not nice to have valgrind crashing on valid programs.
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Philippe Waroquiers --- (In reply to wavexx from comment #7) > Created attachment 119159 [details] > valgrind trace (current master) Thanks for the quick return. Looking at the difference, the nr of front end temporaries has been divided by 3 (from 1854 to 594). After instrumentation, divided by >4: 6503 -> 1495 and later on, the generated code is also much smaller. So, Julian did a very good job :). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782 --- Comment #6 from Philippe Waroquiers --- (In reply to wavexx from comment #5) > Indeed, the current master can run it through without any tweak. That is good news. > Is there anything you want me to try? I think the problem should be properly solved. But to grasp a little bit better how much this was improved, if you are courageous, it would be nice to redo the tracing with master of the block that was giving the crash, so that we can evaluate the code improvement. As the new version might not use exactly the same SB nr as the 3.14, you should find the line that looks like: SB 97263 (evchecks 68367534) [tid 1] 0x541a124 (anonymous namespace)::wxPNGImageData::DoLoadPNGFile(wxImage*, (anonymous namespace)::wxPNGInfoStruct&) [clone .constprop.45]+2228 /usr/local/stow/wxWidgets-3.1.2/lib/libwx_gtk3u_core-3.1.so.2.0.0+0x519124 and then do the trace with --trace-notbelow=X --trace-notabove=Y and use X and Y to have 1 or 2 SB before/after the [clone .constprop.45]+2228 address giving the problem. Thanks Philippe -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782

--- Comment #4 from Philippe Waroquiers ---
I have taken a quick look at the trace, and indeed, the generated code is huge. The code looks related to xmm/ymm registers and instructions.

In 3.15, Julian has made a bunch of improvements to the code generation in this area. See e.g.
git log 3af8e12b0d49dc87cd26258131ebd60c9b587c74..3b2f8bf69ea11f13357468d28cebc88d41be9199

Could you try to compile the latest GIT version and see if it works better?

Thanks
Philippe
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782 Philippe Waroquiers changed: What|Removed |Added CC||jsew...@acm.org -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 405782] "VEX temporary storage exhausted" when attempting to debug slic3r-pe
https://bugs.kde.org/show_bug.cgi?id=405782

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
Thanks for the bug. Could you attach the VEX debug trace obtained doing the below?

Thanks

---
Use the unpatched valgrind (so as to reproduce the problem/crash).

Run a first time:
valgrind --trace-flags=
This will output a bunch of lines such as:
...
SB 1789 (evchecks 8650) [tid 1] 0x4f833a7 free_mem+231 UNKNOWN_OBJECT+0x0
SB 1790 (evchecks 8651) [tid 1] 0x4f832ae free_slotinfo+110 UNKNOWN_OBJECT+0x0
...

Then rerun with
valgrind --trace-flags= --trace-notbelow=X
where X is one or two numbers before the SB that causes the crash.
[valgrind] [Bug 405516] Memcheck with Ruby produces numerous outputs
https://bugs.kde.org/show_bug.cgi?id=405516 Philippe Waroquiers changed: What|Removed |Added CC||philippe.waroquiers@skynet. ||be --- Comment #3 from Philippe Waroquiers --- Looking at the trace file: As explained by Tom, when you launch ruby, several forks and at least one exec happens. Right at the beginning, you see that valgrind is reading /usr/bin/ruby and then goes on to read /usr/bin/bash. So, it looks like /usr/bin/ruby is a bash script which then launches various things via fork and some exec. Completely at the end, it does an exec of /usr/bin/ruby-mri I will assume that this is the real ruby interpreter or whatever. The conclusion: if you want to analyse what the ruby interpreter does, you have to give --trace-children=yes otherwise what you are valgrind-ing will just be the wrapper around the real ruby thing. You might eliminate some of the output by playing with --trace-children-skip=patt1,patt2,... and/or --trace-children-skip-by-arg=patt1,patt2,... and/or --child-silent-after-fork=no|yes but --trace-children=yes seems mandatory. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 404638] Add VG_(replaceIndexXA)
https://bugs.kde.org/show_bug.cgi?id=404638

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |philippe.waroquiers@skynet.be
             Status|REPORTED                    |RESOLVED

--- Comment #6 from Philippe Waroquiers ---
Slightly modified version of the patch pushed as 081c34ea477. (I have removed some of the asserts and the call to ensureSpace, as the replace operation can never make the array grow.)

Thanks for the patch
[valgrind] [Bug 402833] memcheck/tests/overlap testcase fails, memcpy seen as memmove
https://bugs.kde.org/show_bug.cgi?id=402833

--- Comment #3 from Philippe Waroquiers ---
(In reply to Julian Seward from comment #2)
> Is there any progress here? How important will it be to fix this for 3.15.0?

I believe this will be a non-negligible change in the REDIR mechanism, as the REDIR will have to be done at ifunc level.

As far as I can see, not fixing this means some false negatives for memcpy (overlapping args not being detected). So, not that critical IMO.
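For readers unfamiliar with the overlap check discussed in this bug, the condition that matters for memcpy can be illustrated with a small predicate. This is a sketch of the idea only, not memcheck's actual implementation; ranges_overlap is a hypothetical name chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration.
   Returns 1 when the byte ranges [dst, dst+n) and [src, src+n) share
   at least one byte, 0 otherwise. memcpy has undefined behaviour in
   the overlapping case, whereas memmove handles it. */
int ranges_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (n == 0)
        return 0;   /* empty ranges never overlap */

    /* The ranges overlap when the distance between the two start
       addresses is smaller than the length being copied. */
    return d < s ? (s - d < n) : (d - s < n);
}
```

When the call is redirected to a memmove replacement instead of the memcpy one, a check of this kind is never performed, which matches the false negatives described above.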
[valgrind] [Bug 401454] Add a --show-percs option to cg_annotate and callgrind_annotate.
https://bugs.kde.org/show_bug.cgi?id=401454 Philippe Waroquiers changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED CC||philippe.waroquiers@skynet. ||be -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 402369] Overhaul DHAT
https://bugs.kde.org/show_bug.cgi?id=402369 Philippe Waroquiers changed: What|Removed |Added CC||jsew...@acm.org --- Comment #8 from Philippe Waroquiers --- Note that if no effort is available to look at what is suggested in comment 3 and 6, then maybe better to push the patch as is. (for sure, I do not want to block a clear functional improvement for that reason). Julian can maybe comment. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 402369] Overhaul DHAT
https://bugs.kde.org/show_bug.cgi?id=402369 Philippe Waroquiers changed: What|Removed |Added Component|callgrind |dhat Assignee|josef.weidendor...@gmx.de |jsew...@acm.org -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 402369] Overhaul DHAT
https://bugs.kde.org/show_bug.cgi?id=402369

--- Comment #6 from Philippe Waroquiers ---
(In reply to Nick Nethercote from comment #5)
> > It might be interesting to replace the wordFM by an xtree,
>
> It may. Nonetheless, I'd rather land the code as-is, because it's a major
> improvement over the existing DHAT. We can consider optimizations to the
> implementation later :)

Optimization (in terms of speed) is one (non-major) aspect (I would guess that it is unlikely that dhat spends a lot of cpu in this data structure).

The main aspect is to avoid adding another dhat-specific data structure, instead of reusing (extending) the 'common coregrind data structure' used by massif/memcheck/helgrind. In other words: avoid growing the code base unnecessarily.

But I have not looked in much detail at how easy it would be to extend xtree to make it support dhat and output json.
[valgrind] [Bug 402833] memcheck/tests/overlap testcase fails, memcpy seen as memmove
https://bugs.kde.org/show_bug.cgi?id=402833

Philippe Waroquiers changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |philippe.waroquiers@skynet.be

--- Comment #1 from Philippe Waroquiers ---
It also started to fail for me some months ago.
Debian GLIBC 2.24-11+deb9u3
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

See https://sourceforge.net/p/valgrind/mailman/message/36034448/

At that time, IIRC, I vaguely contemplated that this could be fixed by changing the REDIR mechanism, making it 'better ifunc aware' and doing the redirection at an earlier stage.