Re: [Valgrind-users] "Error accepting connection: Function not implemented", is this expected?

2023-06-24 Thread ISHIKAWA,chiaki

On 2023/06/25 2:09, mamsds wrote:

Hi John,

I am using Debian on RaspberryPi and everything is from the official
apt package manager.

Hardware architecture: armv7l GNU/Linux
OS version:Raspbian GNU/Linux 11 (bullseye)
Libmicrohttpd: stable, 0.9.72-2 armhf
Valgrind:  valgrind-3.7.0

I can also share the code that I was testing:
https://github.com/alex-lt-kong/public-address-client



Exact invocation and surrounding output:

# valgrind --leak-check=yes --log-file=/tmp/valgrind.rpt $HOME/bin/public-address-client/pac.out
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
Error accepting connection: Function not implemented
[a lot more identical rows...]



strace output:

3454  cacheflush(0x42714b70, 0x42714ca0, 0) = 0
3454  cacheflush(0x42714ca0, 0x42714d20, 0) = 0
3454  cacheflush(0x42714d20, 0x42714dec, 0) = 0
3454  cacheflush(0x42714df0, 0x42714f28, 0) = 0
3454  cacheflush(0x42714f28, 0x427150e0, 0) = 0
3454  getpid()  = 3454
3454  write(1026, "==3454== \n", 10)= 10
3454  getpid()  = 3454
3454  getpid()  = 3454
3454  getpid()  = 3454
3454  getpid()  = 3454
3454  write(1026, "==3454== HEAP SUMMARY:\n==3454== "..., 170) = 170
3454  rt_sigprocmask(SIG_SETMASK, NULL, ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
3454  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
3454  rt_sigprocmask(SIG_SETMASK, NULL, ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
3454  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
3454  rt_sigprocmask(SIG_SETMASK, NULL, ~[ILL TRAP BUS FPE KILL SEGV STOP], 8) = 0
3454  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP], NULL, 8) = 0
[The last row repeats a lot of times]



I am not very sure on how to count twenty syscalls since the beginning
of the error though.

Please let me know if you notice any issues. I will leave the mail for
a while then I will file a report to bugs.kde.org.

Thanks,
Alex

On Fri, 2023-06-23 at 12:54 -0700, John Reiser wrote:

... each time a client makes a [HTTP] request, Valgrind complains
"Error accepting connection: Function not implemented" and my program
fails to handle the request as a result.

Which versions of each of these pieces are you running:
Libmicrohttpd, valgrind, hardware architecture, OS?

Please give the exact copy+paste of the invocation of valgrind that fails,
together with the output from Terminal that surrounds the complaint
"Error accepting connection: Function not implemented".

Please run under /usr/bin/strace, and report the twenty system calls
that are run shortly before the complaint:
  strace -f -o strace.out valgrind ./my_app args...
You may wish to compare versus the output from strace on the same command
but without using 'valgrind'.

Then the best way to gain attention of valgrind *developers*
is to put all that info into a bug report at:
https://bugs.kde.org/ ,
and post here in the mailing list the URL of the bug report that you created.

I think it may be wiser to discuss the details in the web bug report 
mechanism.


But for now, I have a question.

- Does your program run without valgrind? (i.e. does it accept 
connection from outside?)


  If so, please capture the syscalls in that scenario, in particular 
the syscalls made when it accepts the connection.


  Compare those syscalls with the syscalls made when your program is run 
under valgrind.


Then you will see where your program's behavior under valgrind deviates 
from the normal flow.
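
A side note that may help when reading the two traces: "Function not implemented" is the C library's message text for the errno value ENOSYS, which is typically what a syscall layer returns when it has no handler for the requested call. The sketch below only demonstrates that mapping; the helper name message_is_enosys is made up for illustration, and nothing here identifies which call is failing in this report.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: returns 1 if an error message, such as the one
 * printed by the failing server, is the C library's text for ENOSYS. */
int message_is_enosys(const char *msg)
{
    return strcmp(msg, strerror(ENOSYS)) == 0;
}
```

If that holds, a syscall that returns a valid result in the plain strace log but -1 ENOSYS in the valgrind log would be the first thing to look for.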


I *THINK* the issue is related to a possible timeout in the library 
that does not usually occur and may not be handled very well.
I have seen cases where the slowdown under valgrind is around 20x, and 
because of this the ordinary program execution is disrupted so much that the 
program bails out due to a timeout. This occurs quite often when testing the 
Thunderbird mail client, for example.


Judicious use of larger timeout values has often fixed similar issues 
for me in the past.
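
One way to make timeouts adjustable for a slow run is to scale them from the environment, so the same binary can be tested with and without valgrind. This is only a sketch under assumptions: the function name and the idea of an integer factor are illustrative, and neither libmicrohttpd nor valgrind reads such a variable themselves.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: scale a base timeout (milliseconds) by an integer
 * factor read from the environment variable named by env_name.
 * An unset, malformed, or out-of-range value leaves the timeout unchanged. */
long scaled_timeout_ms(long base_ms, const char *env_name)
{
    const char *text = getenv(env_name);
    if (text == NULL || *text == '\0')
        return base_ms;

    char *end = NULL;
    long factor = strtol(text, &end, 10);
    if (*end != '\0' || factor < 1 || factor > 1000)
        return base_ms;

    /* Refuse to scale if the multiplication would overflow. */
    if (base_ms <= 0 || base_ms > LONG_MAX / factor)
        return base_ms;

    return base_ms * factor;
}
```

A test script could then export, say, a factor of 20 only for the valgrind run and leave normal runs untouched.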


Given that you need to post so much contextual information, I think 
filing the bug in the KDE bug reporting system would be wiser.



Chiaki




___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] bug report: bugzilla refused to accept this saying it contains spam. But it doesn't.

2022-10-12 Thread ISHIKAWA,chiaki

Still No Go.

If anyone reading this e-mail is involved in bugs.kde.org bugzilla 
management, please let the "sysadmin" know that there is a developer 
perplexed as to why on earth the following is deemed spam and blocked 
from the bugzilla web.


Oh, yes, please also ask whoever is responsible to list the sysadmin 
address so that it would be much easier for someone to reach them.

That way, I don't have to bother the mailing list.

I have removed the stack trace with timestamp into a separate file in 
the hope that submission now works. It does not.



I got the following message when I tried to submit the bugzilla entry.

Your comment has been automatically blocked as it is believed to contain 
spam. Please contact Sysadmin if you believe this to be incorrect.


Please press Back and try again.

I tried a few times, removing a few strings, changing @ to AT and 
removing the URL reference to the e-mail exchange at the sourceforge 
archive.

To no avail...

--- begin quote ---

SUMMARY

valgrind 3.20GIT crashes due to SIGFPE while trying to run Thunderbird 
mail client.


I wrote this message initially on September 22.
So "one week ago", etc. are relative to that epoch origin.

I am trying to run thunderbird mail client (TB for short) under valgrind.
TB is created from the so-called comm-central source tree.
My local source was synced with the public source tree about a week ago.

Well, I could run TB under valgrind on August 15.
Also, I believe I could run it about 10 days ago.
However, in the last few days, when I tried to run TB under valgrind, 
valgrind crashed.


I suspect something has changed in the TB's binary toolchain in the last 
10 days or so.


Valgrind version
Valgrind-3.20.0.GIT-90763ca763-20220522X and LibVEX

OS:
Linux version 5.18.0-4-amd64 (debian-kerne AT lists.debian.org) (gcc-11 
(Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 
2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)


It seems the binary toolchain for TB has generated debug 
information that valgrind could not grok (?)


I am uploading the relevant part of the log from the test run when the 
fatal error occurred.


When valgrind failed, TB was created using ordinary -g flag of gcc-10.

On a hunch, I re-created TB using the -gsplit-dwarf flag to gcc.
Then, with the newly created version of TB, valgrind did not crash 
although it did print

"### unhandled dwarf2 ..." warnings.

So something in the debug information in libxul.so (about 120MB) is 
not quite right when ordinary non-split dwarf information is in the object.


STEPS TO REPRODUCE
1. run valgrind to check the memory usage of thunderbird mail client 
when it runs a test.

    The command line is dumped in the attached log.
2. Wait for the completion of a test run.
3. valgrind crashes.

OBSERVED RESULT
SIGFPE crash.
VALGRIND INTERNAL ERROR: Valgrind received a signal 8 (SIGFPE) - exiting
See the attachment.

EXPECTED RESULT
valgrind ought to finish the execution of TB running its test successfully.

SOFTWARE/OS VERSIONS
Debian GNU/Linux.
Linux version 5.18.0-4-amd64 (debian-ker...@lists.debian.org) (gcc-11 
(Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 
2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)


Linux/KDE Plasma:
(available in About System) <--- not sure about this. My Debian XFCE4 
desktop does not have "About System" anywhere.
I don't believe the GUI middleware has anything to do with the bug 
reported here.

KDE Plasma Version:
KDE Frameworks Version:
Qt Version:

ADDITIONAL INFORMATION

As I noted above, if I recompile TB using -gsplit-dwarf flag to gcc-10, 
then valgrind prints warnings about

unknown dwarf2 symbol, but it runs TB running its test to its completion.
I pass "-gdwarf-4 " to gcc.
So if something is generating dwarf2 info, it is not gcc, I think.
Maybe rust compiler being used?
 rustc --version
 rustc 1.63.0 (4b91a6ea7 2022-08-08)

In a discussion about this issue on the valgrind-users mailing list, John 
Reiser suggested the following simple work-around:
(But I think we need to do something about the "unhandled dwarf2 abbrev 
code 0x25" in the long run.)


--- begin quote
>  I think valgrind experienced a division by zero.
>
> readdwarf.c:
>
>    if (op_code >= info.li_opcode_base) {
>   op_code -= info.li_opcode_base;
>   Word adv = (op_code / info.li_line_range) <--- line 831
> * info.li_min_insn_length;
>   Int advAddr = adv;
>   state_machine_regs.address += adv;

If you can re-build valgrind, then a quick-and-dirty work-around
might be

>   Word adv = (op_code / (info.li_line_range ?: 1))
> * info.li_min_insn_length;

where "x ?: y" is a deprecated-but-useful slang for "x ? x : y".
--- end quote


TIA

PS: BTW, initially, I could not post this bug report to the kde bugzilla 
due to the following error message.


==
Your comment has been automatically blocked as it is believed to contain 
spam. 

Re: [Valgrind-users] bug report: bugzilla refused to accept this saying it contains spam. But it doesn't.

2022-10-12 Thread ISHIKAWA,chiaki

On 2022/09/23 9:11, John Reiser wrote:

I think valgrind experienced a division by zero.

readdwarf.c:

   if (op_code >= info.li_opcode_base) {
  op_code -= info.li_opcode_base;
  Word adv = (op_code / info.li_line_range)    <--- line 831
    * info.li_min_insn_length;
  Int advAddr = adv;
  state_machine_regs.address += adv;


If you can re-build valgrind, then a quick-and-dirty work-around
might be

>   Word adv = (op_code / (info.li_line_range ?: 1))
> * info.li_min_insn_length;

where "x ?: y" is a deprecated-but-useful slang for "x ? x : y".

Also, one probable reason for the bug reporting system rejecting
your first submission is the many consecutive lines that begin with
"00:08.54 GECKO(319869) ".  The work-around for this is to put
the text of the valgrind complaint into an attachment, and say
"See the attachment for the full text of the valgrind complaint."




Thank you for the comment.


I will try to re-submit the bugzilla entry.

I didn't realize that the lines starting with a timestamp string such as 
"00:08.54 GECKO(319869) " were causing the problem.
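
For anyone who wants to read or quote such a log without the harness prefix, here is a small sketch that skips a leading "MM:SS.hh GECKO(pid) " stamp on a line. The function names are hypothetical, the prefix layout is inferred only from the lines quoted in this thread, and whether the bugzilla filter would accept the stripped text is unknown; attaching the log remains the suggested route.

```c
#include <ctype.h>
#include <string.h>

/* Skip a run of decimal digits; returns NULL if there is none. */
static const char *skip_digits(const char *p)
{
    const char *start = p;
    while (isdigit((unsigned char)*p))
        p++;
    return p == start ? NULL : p;
}

/* Hypothetical helper: return a pointer past a leading
 * "MM:SS.hh GECKO(pid) " prefix, or the line itself if absent. */
const char *strip_gecko_prefix(const char *line)
{
    const char *p = skip_digits(line);            /* minutes */
    if (!p || *p++ != ':') return line;
    p = skip_digits(p);                           /* seconds */
    if (!p || *p++ != '.') return line;
    p = skip_digits(p);                           /* hundredths */
    if (!p || strncmp(p, " GECKO(", 7) != 0) return line;
    p = skip_digits(p + 7);                       /* process id */
    if (!p || *p++ != ')') return line;
    if (*p == ' ')
        p++;
    return p;
}
```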

I thought there were no URLs, and so no spam. :-)

Sorry, somehow I overlooked your e-mail and thus this very tardy response.

TIA

Chiaki






Re: [Valgrind-users] bug report: bugzilla refused to accept this saying it contains spam. But it doesn't.

2022-09-22 Thread ISHIKAWA,chiaki

Hi,

I thought SIGFPE was a bit odd.
But now I know why.
I think valgrind experienced a division by zero.

readdwarf.c:

  if (op_code >= info.li_opcode_base) {
 op_code -= info.li_opcode_base;
 Word adv = (op_code / info.li_line_range)    <--- line 831
   * info.li_min_insn_length;
 Int advAddr = adv;
 state_machine_regs.address += adv;
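
For readers unfamiliar with this part of readdwarf.c: the quoted statement is the "special opcode" step of a DWARF line-number program, where the adjusted opcode is split by li_line_range into an address advance (the quotient) and a line advance (the remainder). A header whose li_line_range is zero therefore divides by zero. What follows is a minimal sketch of that arithmetic with an explicit check, assuming a cut-down header struct; struct line_info and decode_special_opcode are hypothetical names and this is not valgrind's code.

```c
#include <stdint.h>

/* Subset of the line-number program header used below. Field names
 * follow the readdwarf.c snippet; the struct is a stand-in only. */
struct line_info {
    uint8_t li_min_insn_length;
    int8_t  li_line_base;
    uint8_t li_line_range;
    uint8_t li_opcode_base;
};

/* Decode one special opcode into address and line advances.
 * Returns 0 on success, -1 if the opcode is not a special opcode
 * or the header would force a division by zero. */
int decode_special_opcode(const struct line_info *info, uint8_t op_code,
                          long *addr_advance, long *line_advance)
{
    if (op_code < info->li_opcode_base)
        return -1;                 /* standard or extended opcode */
    if (info->li_line_range == 0)
        return -1;                 /* malformed header: refuse to divide */

    unsigned adjusted = (unsigned)op_code - info->li_opcode_base;
    *addr_advance = (long)(adjusted / info->li_line_range)
                    * info->li_min_insn_length;
    *line_advance = info->li_line_base
                    + (long)(adjusted % info->li_line_range);
    return 0;
}
```

With header values such as min_insn_length = 1, line_base = -5, line_range = 14 and opcode_base = 13, opcode 75 advances the address by 4 and the line by 1. Whether a reader should reject such a header or tolerate it is a design decision for the valgrind developers.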


[Valgrind-users] bug report: bugzilla refused to accept this saying it contains spam. But it doesn't.

2022-09-21 Thread ISHIKAWA,chiaki

Hi,
I could not post this bug report to the kde bugzilla due to the 
following error message.


==
Your comment has been automatically blocked as it is believed to contain 
spam. Please contact Sysadmin if you believe this to be incorrect.

==

Well, it does not, and the bugzilla web does not list the sysadmin address.

So here it is.

I can send the full log on request.
Also, libxul.so when zipped is about 120MB.
I can send it via webtransfer or something to the interested party.

I forgot to mention, I recreated TB each day after a minor fix, and so 
the change in the binary toolchain is likely the culprit.


Thank you for making the great package available to the programming community.

Chiaki



===
valgrind 3.20GIT crashes while trying to run Thunderbird mail client.


SUMMARY

I am trying to run thunderbird mail client (TB for short) under valgrind.
TB is created from the so-called comm-central source tree.
My local source was synced with the public source tree about a week
ago.

Well, I could run TB under valgrind on August 15.
Also, I believe I could run it about 10 days ago.
However, in the last few days, when I tried to run TB under valgrind,
valgrind crashed.

valgrind said:
00:08.54 GECKO(319869) --319873-- WARNING: Serious error when reading debug info
00:08.54 GECKO(319869) --319873-- When reading debug info from /NEW-SSD/moz-obj-dir/objdir-tb3/toolkit/library/build/libxul.so:
00:08.54 GECKO(319869) --319873-- Only DWARF version 2, 3, 4 and 5 line info is currently supported.

00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x1b
00:08.54 GECKO(319869) --319873-- VALGRIND INTERNAL ERROR: Valgrind received a signal 8 (SIGFPE) - exiting
00:08.54 GECKO(319869) --319873-- si_code=1;  Faulting address: 0x580C4519;  sp: 0x100307c820

00:08.54 GECKO(319869) valgrind: the 'impossible' happened:
00:08.54 GECKO(319869)    Killed by fatal signal
00:08.54 GECKO(319869) host stacktrace:
00:08.54 GECKO(319869) ==319873==    at 0x580C4519: read_dwarf2_lineblock (readdwarf.c:831)
00:08.54 GECKO(319869) ==319873==    by 0x580C770F: vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:1380)
00:08.54 GECKO(319869) ==319873==    by 0x58081053: vgModuleLocal_read_elf_debug_info (readelf.c:3489)
00:08.54 GECKO(319869) ==319873==    by 0x5806F0DB: di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:969)
00:08.54 GECKO(319869) ==319873==    by 0x5806F0DB: vgPlain_di_notify_mmap (debuginfo.c:1326)
00:08.54 GECKO(319869) ==319873==    by 0x5809EDBF: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2466)
00:08.54 GECKO(319869) ==319873==    by 0x580AA5FF: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:413)
00:08.54 GECKO(319869) ==319873==    by 0x5809B275: vgPlain_client_syscall (syswrap-main.c:2234)
00:08.54 GECKO(319869) ==319873==    by 0x58096D5A: handle_syscall (scheduler.c:1211)
00:08.54 GECKO(319869) ==319873==    by 0x58098D67: vgPlain_scheduler (scheduler.c:1529)
00:08.54 GECKO(319869) ==319873==    by 0x580E170C: thread_wrapper (syswrap-linux.c:101)
00:08.54 GECKO(319869) ==319873==    by 0x580E170C: run_a_thread_NORETURN (syswrap-linux.c:154)

00:08.54 GECKO(319869) sched status:
00:08.54 GECKO(319869)   running_tid=1
00:08.54 GECKO(319869) Thread 1: status = VgTs_Runnable syscall 9 (lwpid 319873)

00:08.54 GECKO(319869) ==319873==    at 0x4020B82: __mmap64 (mmap64.c:59)
00:08.54 GECKO(319869) ==319873==    by 0x4020B82: mmap (mmap64.c:47)
00:08.54 GECKO(319869) ==319873==    by 0x400615B: _dl_map_segments (dl-map-segments.h:94)
00:08.54 GECKO(319869) ==319873==    by 0x400615B: _dl_map_object_from_fd (dl-load.c:1250)
00:08.55 GECKO(319869) ==319873==    by 0x40074E6: _dl_map_object (dl-load.c:2301)
00:08.55 GECKO(319869) ==319873==    by 0x400BB57: dl_open_worker_begin (dl-open.c:533)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x400B325: dl_open_worker (dl-open.c:777)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x400B70A: _dl_open (dl-open.c:878)
00:08.55 GECKO(319869) ==319873==    by 0x4B32D77: dlopen_doit (dlopen.c:56)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF15E: _dl_catch_error (dl-error-skeleton.c:227)
00:08.55 GECKO(319869) ==319873==    by 0x4B32855: _dlerror_run (dlerror.c:138)
00:08.55 GECKO(319869) ==319873==    by 0x4B32E30: dlopen_implementation (dlopen.c:71)
00:08.55 GECKO(319869) ==319873==    by 0x4B32E30: dlopen@@GLIBC_2.34 (dlopen.c:81)
00:08.55 GECKO(319869) ==319873==    by 0x118474: GetLibHandle(char 

Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-23 Thread ISHIKAWA,chiaki

John, Paul, and Mark

Thank you for the information.

Eventually, I obtained the valgrind git code.
(Debian is a bit slow in updating tools. It is very conservative.)

It contained the following.

#if defined(VGO_linux)
 STRNCMP(VG_Z_LIBC_SONAME, strncmp)
 STRNCMP(VG_Z_LIBC_SONAME, __GI_strncmp)
 STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse2)
 STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse42)
 STRNCMP(VG_Z_LD_LINUX_SO_2, strncmp)                    <---
 STRNCMP(VG_Z_LD_LINUX_X86_64_SO_2, strncmp)  <---

#elif defined(VGO_freebsd)

For now, with this version, I no longer get the warning for strncmp.

As I looked for the false-positive warning in the new log, 
I caught a real issue in my patch for the thunderbird mail client.
It was exposed by the slowdown under valgrind.
This was not quite intentional, but it is surely helpful for simulating 
abnormal conditions that trigger unforeseen, uncaught error situations.

The test is still running.
Hopefully no more error related issues.

BTW, it seems the timing of valgrind has changed since 3.18.0.
$  valgrind --version
valgrind-3.20.0.GIT

I mean valgrind may take a bit longer (?) to simulate the program 
execution. (I am not sure. All I can say is that 
the elapsed time seems different. It may be related to the fact that 
the false-positive trace dump for strncmp is not printed any more, etc.)

I probably need to tweak timeout values during the test.
It could be that many smaller tests were not actually being run due to 
timeouts, but I did not realize it because I was focused on real memory 
errors reported by valgrind.

Thank you again.

Chiaki


On 2022/05/21 18:42, John Reiser wrote:
I sent a log of redirect information to both Paul and John since the 
log was too large for the mailing list.


I wonder what would be the preferred public sharing site for such a 
purpose these days.


The preferred way is to create a bug report, attach the large file to 
the bug report, then post the URL of the bug report in a message to the 
mailing list.

Begin at  https://valgrind.org/ .  In the left nav, click on "Bug 
Reports", and follow the directions on the resulting page.



143:39.43 GECKO(115765) ==115769== Invalid read of size 8
143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp 
(strcmp.S:175)
143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst 
(dl-load.c:214) 


This indicates that 'strncmp' should be re-directed from 
ld-linux-x86-64.so.2:

=
diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
index 3b42b3a87..8272a3ae7 100644
--- a/shared/vg_replace_strmem.c
+++ b/shared/vg_replace_strmem.c
@@ -710,6 +710,7 @@ static inline void my_exit ( int x )
  STRNCMP(VG_Z_LIBC_SONAME, __GI_strncmp)
  STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse2)
  STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse42)
+ STRNCMP("ld-linux*.so*", strncmp)

 #elif defined(VGO_freebsd)
  STRNCMP(VG_Z_LIBC_SONAME, strncmp)
=
For instance, such a change is relevant to glibc-2.33-21.fc34.x86_64:
$ readelf --all /lib64/ld-linux-x86-64.so.2 | grep strncmp
  1706: 00022d30  6233 FUNC    LOCAL  DEFAULT   13 strncmp
$
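
The pattern "ld-linux*.so*" in the diff is a wildcard over shared-object names, so one entry covers both ld-linux.so.2 and ld-linux-x86-64.so.2. As a rough illustration of how such a pattern selects sonames, here is a sketch using POSIX fnmatch(); valgrind has its own matcher, so fnmatch() is swapped in purely for demonstration and soname_matches is a hypothetical name.

```c
#include <fnmatch.h>

/* Returns 1 if the shared-object name matches the glob pattern.
 * fnmatch() stands in for valgrind's own pattern matcher here. */
int soname_matches(const char *pattern, const char *soname)
{
    return fnmatch(pattern, soname, 0) == 0;
}
```

Under this stand-in, "ld-linux*.so*" matches the dynamic loader on both 32-bit and 64-bit x86 but not libc.so.6, which is why a loader-local strncmp needs an entry of its own.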




Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Hi,

I sent a log of redirect information to both Paul and John since the log 
was too large for the mailing list.


I wonder what would be the preferred public sharing site for such a 
purpose these days.


TIA

Chiaki

On 2022/05/21 0:57, John Reiser wrote:
(Wait, I see  "279:13.65 GECKO(392456) ==392459==    by 0x488D2D3: 
dlopen@@GLIBC_2.2.5 (dlopen.c:87)"

Version 2.2.5 is not the same as the version reported for glibc. Hmm? )


The "@@GLIBC_2.2.5" is the linking symbol version assigned by glibc.
This effectively is an ABI version, and the ABI for dlopen
has not changed for many years, even though other parts of
glibc have changed; one recent release is glibc-2.33.
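
To make the naming concrete: in a string such as dlopen@@GLIBC_2.2.5, the part before the at-signs is the symbol name, the part after is the version tag, and a double "@@" marks the default version that new links bind to (a single "@" marks a non-default one). The sketch below splits such a string; the struct and function names are hypothetical, and the fixed 64-byte buffers are an arbitrary choice for the example.

```c
#include <string.h>

struct versioned_symbol {
    char name[64];
    char version[64];   /* empty if the symbol carries no version */
    int  is_default;    /* 1 for "@@", 0 for "@" or no version */
};

/* Split "name@@VERSION" or "name@VERSION"; returns 0 on success,
 * -1 if a part is empty or does not fit the buffers. */
int split_versioned_symbol(const char *text, struct versioned_symbol *out)
{
    const char *at = strchr(text, '@');
    size_t name_len = at ? (size_t)(at - text) : strlen(text);
    if (name_len == 0 || name_len >= sizeof out->name)
        return -1;

    memcpy(out->name, text, name_len);
    out->name[name_len] = '\0';
    out->version[0] = '\0';
    out->is_default = 0;
    if (at == NULL)
        return 0;

    if (at[1] == '@') {
        out->is_default = 1;
        at++;
    }
    at++;                               /* first character of the version */
    if (strlen(at) >= sizeof out->version)
        return -1;
    strcpy(out->version, at);
    return 0;
}
```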

The real key to Chiaki's problem is:

279:13.65 GECKO(392456) ==392459== Invalid read of size 8
279:13.65 GECKO(392456) ==392459==    at 0x4021BF4: strncmp 
(strcmp.S:175) 

which says that this 'strncmp' was not re-directed by valgrind.
Re-running valgrind with the additional command-line parameter
"--trace-redir=yes" will help provide more information.
Probably the run can be stopped after the first actual dlopen,
because that should be enough to trigger all the redirections
that matter here.




Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Dear Paul,


On 2022/05/20 16:58, Floyd, Paul wrote:

Hi Chiaki

Debugging redirection issues isn't normally too slow. Redirection is 
done when Valgrind loads the guest executable and libraries.


Run Valgrind with --trace-redir=yes and you should see Valgrind 
printing what it finds in


 * ld.so, the link loader
 * the client executable
 * the valgrind tool
 * the valgrind shared lib preloads (core and tool)
 * any client shared libraries

libc falls under the last category, though there are a small number of 
C functions in the link loader (memcpy, strcmp etc).


You should see things like

--830--  ld-linux-x86-64.so.2 strcmp RL-> (2016.0) 0x040343b0
--830--  libc.so* __strcmp_sse42 RL-> (2016.0) 0x04034370
--830--  libc.so* __strcmp_sse2  RL-> (2016.0) 0x04034330
--830--  libc.so* __GI_strcmp    RL-> (2016.0) 0x040342f0


If you don't see any symbols being redirected then you have a problem.


A+

Paul



I collected the version number info and have been running TB test suite 
under valgrind since this morning.

That was before I read this e-mail.

I will give the version number below first and see if I can run valgrind 
to obtain the redirection information.
(The thing is the already running valgrind+thunderbird is stretching my 
16GB memory linux image and I am not sure if I can start another 
instance of valgrind+thunderbird, or I need to bite the bullet and 
cancel the current run. I am afraid that the test takes close to a full 
day...)
Anyway, let me first send this version info, and I will check to see if 
I can obtain the redirection info easily.



Obviously, I don't seem to have the redirected symbol for strncmp in the 
trace.  That is for sure.

I do see redirection for malloc.
279:13.66 GECKO(392456) ==392459==    at 0x483F7B5: malloc 
(vg_replace_malloc.c:381)


--- version info ---

Hi,

Before I can figure out how to create a short reproducer, here is the 
version info I collected.

[] Debian Version
ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ uname -a
Linux ip030 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) 
x86_64 GNU/Linux


[gcc-10] Used compiler. I just re-compiled the source tree using this 
compiler and still get the same error (trace attached at the end.)


Maybe I should use a newer version, but thunderbird mail client heavily 
relies on mozilla source code, and a newer version may encounter compiler 
issues (warnings or worse).

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ gcc-10 --version
gcc-10 (Debian 10.3.0-15) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[glibc-a] As for glibc: I was not sure how to check for the version, but 
here it is.
Using ldd --version and running libc.so as a program were things I never 
realized we could do (!)


ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird

    linux-vdso.so.1 (0x7fffa31ae000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f4403b64000)
    /lib64/ld-linux-x86-64.so.2 (0x7f4403d5d000)

[glibc-b]  ldd --version reports:

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd --version
ldd (Debian GLIBC 2.33-7) 2.33
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

[glibc-c] I did not realize that we can "run" the GLIBC libc.so file this 
way to obtain the glibc version number.

The above info all points to Debian GLIBC 2.33-7
ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ /lib/x86_64-linux-gnu/libc.so.6

GNU C Library (Debian GLIBC 2.33-7) release release version 2.33.
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 10.3.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
.
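
The same version string can also be read from inside a program, which can be handy in a reproducer that should record the glibc it actually ran against. A minimal sketch, assuming a glibc system: gnu_get_libc_version() is glibc-specific and not available with other C libraries, and running_glibc_is is a hypothetical helper.

```c
#include <gnu/libc-version.h>
#include <string.h>

/* Returns 1 if the running glibc reports exactly the given version
 * string (for example "2.33"), 0 otherwise. glibc-only. */
int running_glibc_is(const char *expected)
{
    return strcmp(gnu_get_libc_version(), expected) == 0;
}
```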

[] Version of valgrind:

   valgrind --version
   valgrind-3.18.1

(Well, I was quite upset when I initially realized I was using 
valgrind-3.18.0.GIT which I installed last September, 
but I then verified that the bug appears with the current release, too.)

[Source code] mozilla comm-central source version is:
I have added a few local mods but they don't touch the affected
version.

changeset:   35764:90328ce5bee2
tag: qparent
fxtree:  comm
user:    John Bieling 
date:    Wed May 18 13:13:33 2022 +0300
summary: Bug 1732554 - Make GenericSendMessage async. r=mkmelin

changeset:   35763:74a4091d1c27

[Source code] mozilla mozilla-central source version is:
Again, I have 

Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Dear Paul,

Thank you for your e-mail and the lucid explanation.

I am sorry that I could not write to you earlier.
There was something wrong with my PC hardware and it took me quite a 
while to re-install many software products I regularly use.


I will try to create a short  sample. (The whole thunderbird software is 
a gigantic problem.) But it may be difficult
since the source code is large and if the compiler's code generation is 
history-sensitive, the problem may not be easy to re-create.


I will also check on the versions of tools that were used when the 
problem was noticed.

Let me have a couple of hours to check the versions.

BTW, now I vaguely recall that there was an issue, many years ago, with
the DL library released by Debian regarding the symbols for strcpy and
friends. I can't recall the details now, but in that instance, the lack
of proper debug symbols made the redirection difficult(?). If my hazy
memory is correct, today's case may be influenced by a similar issue,
but I had better collect the versions so that someone in the know can
experiment on their end. Back then, I think I created a wrapper that
introduces the symbols for strcmp and friends. But that was many years ago.


TIA

Chiaki

PS: For those curious enough to know about the hardware issue: I wanted to
replace my Ryzen 1700 CPU, which has 16MB of L3 cache, with a Ryzen 3700x,
which has 32MB of cache, solely because I learned that the larger the cache,
the better valgrind fares when running a big program like the thunderbird
mail client.
After a few years of using the 1700, I suspect the CPU is the limiting
factor. I like it, though: it uses much less power than many other modern
CPUs, so it runs cool, and the PC is very quiet without noisy fans.
Unfortunately, when I replaced the CPU after carefully checking the BIOS
version, etc. to make sure the new CPU would run on the motherboard (yes, it
does; it runs linux without any issue at all), somehow the Windows 10 Pro
installation hosting my virtualbox running linux did not boot any more
after the replacement, and trashed my boot environment. Aargh.

In the end, I figured it was a faulty AMD SATA driver, which got installed
maybe in the last couple of years when I installed AMD's chip driver.
It did not cause a problem with the Ryzen 1700 for the last few years, but
with the 3700x, the boot fails because of it.

After the boot failure, even the safe mode fails to boot. Ugh.
I had to re-install windows and so had to re-install many applications
and such that I use for work and hobby. Oh, such is life.
But I am a happy camper now with the second-hand Ryzen 3700x and hope to
run valgrind and find more of these issues in TB soon. The whole build
time from scratch got shortened from about 90+ minutes to 60+ minutes. Not bad.
I have yet to figure out how much TB's test suite execution time is
shortened. I am hampered by strange errors that I did not notice a few
months ago. Maybe these are newly introduced errors, including the
one I reported, and I am analyzing whether I can simply suppress them or
should investigate them in detail.



On 2022/05/11 16:54, Floyd, Paul wrote:

Hi

Can you give us

- the source of the small reproducer
- the versions of Valgrind, Debian, GCC and glibc?

As you mention, functions like strncmp are often optimized to work on
multiple bytes at a time and to take advantage of the fact that memory
will always be allocated in a multiple of, say, 8 or 16 bytes. And what
sometimes happens is that a function like strncmp will be replaced by
the compiler with something like __strncmp_avx128. If Valgrind doesn't
recognize this, it can't redirect it and do error checking on it.


I would expect the error message to contain the name of the Valgrind
redirect, for instance


==22489==    at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)

So it seems to me that you have a redirection problem. For some reason
Valgrind is not seeing your strncmp when the client libc gets loaded
into memory.



A+

Paul




___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users







[Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-10 Thread ISHIKAWA,chiaki

Hi,

I have been analyzing the thunderbird mail client under valgrind for some time.
memcheck has been very useful to me for finding memory-related errors.
Thank you for releasing this great tool.

Recently, I noticed an invalid read of 8 bytes warning, which should be 
familiar to all of us.


Interestingly, the initial part of the stack trace is found in a report 
in Qt bug database.

It comes from dynamic loading library support.
https://bugreports.qt.io/browse/QTBUG-90374
It was filed last year.

My system is Debian GNU/Linux and I used gcc to compile thunderbird.
The report was done by someone who uses clang.

I believe the issue lies in a certain version of dl-library, glibc OR 
valgrind? The reason I say valgrind might be to blame, too, is as follows.
(Debian is known to release toolchains very conservatively. I think that 
is why I did not see this issue last year.)


Actually, mine has line numbers that are slightly off, due to version
differences I suspect.


143:39.43 GECKO(115765) ==115769== Invalid read of size 8
143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp (strcmp.S:175)
143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst (dl-load.c:214)
143:39.64 GECKO(115765) ==115769==    by 0x4007666: _dl_dst_count (dl-load.c:251)
143:39.64 GECKO(115765) ==115769==    by 0x4007857: expand_dynamic_string_token (dl-load.c:393)
143:39.64 GECKO(115765) ==115769==    by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465)
143:39.68 GECKO(115765) ==115769==    by 0x4007CC2: decompose_rpath (dl-load.c:636)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)

      ... [omitted] ...

My local valgrind dump tells me where the address was allocated.

143:40.60 GECKO(115765) ==115769==  Address 0x27ba3819 is 9 bytes inside a block of size 15 alloc'd
143:40.65 GECKO(115765) ==115769==    at 0x483CF9B: malloc (vg_replace_malloc.c:380)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: malloc (rtld-malloc.h:56)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: strdup (strdup.c:42)
143:40.65 GECKO(115765) ==115769==    by 0x4007C54: decompose_rpath (dl-load.c:611)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: _dl_map_object (dl-load.c:2174)
143:40.65 GECKO(115765) ==115769==    by 0x400E4B0: openaux (dl-deps.c:64)
  ... [omission] ...

I *think* this is a valid error case: a large-sized READ in strncmp
reads beyond the allocated memory boundary. (strcmp.S shows that 8 octets
are read at a time instead of one octet at a time.)
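
To put numbers on this: the report further up says the 8-byte read starts
at an address 9 bytes inside a block of size 15. The following C sketch
(the helper name bytes_past_end is invented for illustration) works out how
far such a load reaches past the block, assuming the load really starts at
the reported address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes of a load that fall beyond the end
 * of a heap block, given where the load starts inside the block. */
size_t bytes_past_end(size_t offset, size_t read_size, size_t block_size)
{
    assert(offset <= block_size);      /* the load starts inside the block */
    size_t end = offset + read_size;   /* one past the last byte read */
    return end > block_size ? end - block_size : 0;
}
```

With the values from the report, bytes_past_end(9, 8, 15) is 2: offsets 9
to 14 are inside the block, offsets 15 and 16 are not.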


I think this combination of strdup and str{n}cmp abounds in C source
code.

So I thought maybe valgrind was reporting something different.
Otherwise, many application programs would have to create suppressions
for this type of issue.

That is what I thought initially.

The different type of error I had in mind was, for example, that "9 bytes
inside a block of size 15" might mean that the string area somehow contains
uninitialized data at that position.  However, come to think of it, if
that were so, strdup would have triggered a valgrind warning before this.
There is no warning from valgrind for strdup.

Also, I created a test program and realized that in that case, valgrind 
prints


==120076== Conditional jump or move depends on uninitialised value(s)
==120076==    at 0x4843172: strncmp (vg_replace_strmem.c:663)
==120076==    by 0x108778: main (in /home/ishikawa/Dropbox/TB-DIR/a.out)

So the original problem must be the read beyond malloc'ed area boundary.
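
The call pattern in the two traces (strdup() producing the block, strncmp()
later reading from the middle of it) can be written down in a few lines of
C. This is only a sketch of the pattern, not the dl-load.c code: the helper
name and the path used with it are made up. Note also that in an ordinary
program memcheck would normally redirect strncmp to its own replacement, as
in the vg_replace_strmem.c frame just above, so this by itself may well not
reproduce the warning.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a path with strdup(), then compare a token
 * against the copy starting at `offset`.
 * Returns 1 on a match, 0 on no match, -1 if strdup() fails. */
int token_matches_at(const char *path, size_t offset, const char *token)
{
    assert(path != NULL && token != NULL);
    assert(offset <= strlen(path));

    char *copy = strdup(path);        /* block of strlen(path) + 1 bytes */
    if (copy == NULL)
        return -1;

    int match = strncmp(copy + offset, token, strlen(token)) == 0;
    free(copy);
    return match;
}
```

For example, token_matches_at("/opt/app/lib64", 9, "lib") duplicates a
14-character path into a 15-byte block and compares from offset 9, the same
sizes as in the report.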

Now, is dl-library to blame?
I think dl-library is used literally hundreds of millions of times or
more daily, and it is hard to think that there is a bug there. (Famous
last words.)

Dl-library does not have control over how long each path string is (I
think it is trying to record the path components of a loading path),
and thus cannot avoid the valgrind messages generated when an 8-char read
goes beyond the end of the malloc'ed memory. (So people probably have to
create a suppression after all, if the particular version has this
issue.)

As for valgrind, can valgrind be somehow more intelligent in this
case?  Maybe creating a substitute strcmp? (I know single char
comparison at a time would be slower than comparing 8 characters at a
time when appropriate).  But at least, this type of surprise warning
would be reduced.
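
A byte-at-a-time comparison of the kind suggested here could be sketched in
C as follows. This is an illustration only, not Valgrind's code; the
vg_replace_strmem.c frame in the test-program output above indicates that
memcheck already carries its own strncmp replacement, and the name
bytewise_strncmp is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Compare at most n bytes, one byte per step, so no load ever touches
 * memory beyond the terminating NUL of either string. */
int bytewise_strncmp(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;   /* both strings ended before n bytes */
    }
    return 0;           /* first n bytes are equal */
}
```

The trade-off is the one already mentioned: every byte costs a load and a
compare, where the optimized routine handles 8 bytes per step.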

However, we may have a problem here for glibc.  If this read beyond
the malloc'ed region is for real, we have a problem.  I have no idea how
this behavior is constrained or sanctioned by the C standard, the C library
standard or the POSIX standard, but the 8-octet reads in strcmp.S could lead
to a real issue unless malloc() uniformly allocates memory chunks
in units of 8 or more. Unless glibc makes sure that there is a
guard area between malloc area and the 

Re: [Valgrind-users] does Valgrind-3.19.0.GIT support Clang14 dwarf 5?

2022-02-17 Thread ISHIKAWA,chiaki

Thank you for your clarification.

It certainly seems the inlined function is not handled well, come to 
think of it.


I will look at the video you kindly referred to.

Thank you again.

Chiaki

On 2022/02/16 20:24, Mark Wielaard wrote:

On Wed, Feb 16, 2022 at 12:46:32PM +0900, ISHIKAWA,chiaki wrote:

> This is tangential to the original question, but this part:
>
> > clang uses rnglistx and strx form codes, which are normally (by gcc)
> > only used with split-dwarf (which valgrind doesn't support yet)
>
> valgrind does support the bulk of the data generated by --split-dwarf
> (gcc), doesn't it?

No, it doesn't. valgrind won't read .dwo files/sections at all which
split-dwarf uses.

> Does the "doesn't support" above refer to the rnglistx and strx form codes?

And addrx forms. These are normally only used (by gcc) for split-dwarf
and involve an extra indirection through an index to reach the value
of the attribute. These forms are technically valid in non-split-dwarf
(with DWARF5) but not supported by valgrind.

> I am asking this because I have checked mozilla thunderbird code
> which is compiled using --split-dwarf of GCC and valgrind reports
> reasonably sane diagnosis so far.

valgrind can report valid diagnostics with DWARF debuginfo, but it
might be less accurate, for example it won't report on inlined
functions.

I happened to give a talk on that recently:
https://fosdem.org/2022/schedule/event/valgrind_debuginfo/
(That talk doesn't cover split-dwarf though)

Cheers,

Mark







Re: [Valgrind-users] does Valgrind-3.19.0.GIT support Clang14 dwarf 5?

2022-02-15 Thread ISHIKAWA,chiaki

This is tangential to the original question, but regarding this part:

> clang uses rnglistx and strx form codes, which are normally (by gcc)
> only used with split-dwarf (which valgrind doesn't support yet)

valgrind does support the bulk of the data generated by --split-dwarf (gcc),
doesn't it?

Does the "doesn't support" above refer to the rnglistx and strx form codes?

I am asking this because I have checked mozilla thunderbird code, which
is compiled using --split-dwarf of GCC, and valgrind reports reasonably
sane diagnosis so far.
Well, that is what I thought.
Maybe I was mistaken.

Will you kindly clarify the above?

TIA

Chiaki Ishikawa

On 2022/02/14 17:02, Mark Wielaard wrote:

Hi Thomas,

On Sun, Feb 13, 2022 at 08:18:48PM +, Thomas Wollenzin wrote:

This is just a quick question, as I haven't found a sufficient answer in the
archives.
Does Valgrind-3.19.0.GIT support Clang14's dwarf5 yet?

Compiling my application with '-gfull -gdwarf-4 -gdwarf64' allows valgrind to
function as expected.
When using `-gfull -gdwarf-5 -gdwarf64` I get this failure report.

==139537== Memcheck, a memory error detector
==139537== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==139537== Using Valgrind-3.19.0.GIT and LibVEX; rerun with -h for copyright info
==139537== Command: ./TheAppName
==139537== Parent PID: 139504
==139537==
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x25
### unhandled dwarf2 abbrev form code 0x23
==139537== Valgrind: debuginfo reader: ensure_valid failed:
==139537== Valgrind:   during call to ML_(img_get)
==139537== Valgrind:   request for range [11350107, +4) exceeds
==139537== Valgrind:   valid image size of 1584416 for image:
==139537== Valgrind:   "/path/to/TheAppName"
==139537==
==139537== Valgrind: debuginfo reader: Possibly corrupted debuginfo file.
==139537== Valgrind: I can't recover.  Giving up.  Sorry.
==139537==

As you can see above, valgrind doesn't. clang uses rnglistx and strx
form codes, which are normally (by gcc) only used with split-dwarf
(which valgrind doesn't support yet). Best is to simply use -gdwarf-4
with clang.
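
For reference, the two codes in the "unhandled dwarf2 abbrev form code"
lines can be looked up in the DWARF 5 form-code table: 0x25 is
DW_FORM_strx1 and 0x23 is DW_FORM_rnglistx, which fits the explanation
above. A small C lookup sketch follows; the function name dwarf5_form_name
is made up for illustration, and only the index-style forms are listed.

```c
#include <assert.h>
#include <stddef.h>

/* Names of the index-style attribute forms added in DWARF 5.
 * Returns NULL for any code this sketch does not cover. */
const char *dwarf5_form_name(unsigned code)
{
    switch (code) {
    case 0x1a: return "DW_FORM_strx";
    case 0x1b: return "DW_FORM_addrx";
    case 0x22: return "DW_FORM_loclistx";
    case 0x23: return "DW_FORM_rnglistx";
    case 0x25: return "DW_FORM_strx1";
    case 0x26: return "DW_FORM_strx2";
    case 0x27: return "DW_FORM_strx3";
    case 0x28: return "DW_FORM_strx4";
    case 0x29: return "DW_FORM_addrx1";
    case 0x2a: return "DW_FORM_addrx2";
    case 0x2b: return "DW_FORM_addrx3";
    case 0x2c: return "DW_FORM_addrx4";
    default:   return NULL;
    }
}
```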

Cheers,

Mark





Re: [Valgrind-users] Valgrind 3.13.0 tarball hosted at sourceware.org - legit or not?

2017-06-18 Thread ISHIKAWA,chiaki

On 2017/06/16 22:55, John Reiser wrote:

On 06/16/2017 06:31 AM, Zhiming Wang wrote:

By the way, just a suggestion, maybe you could publish the
SHA-256 checksums of release tarballs instead of MD5?


Please also publish the exact length in bytes.
This is worth _more_ than expanding the width of the checksum,
because it is easier (much easier) to produce checksum collisions
by extending the length.
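
Checking the length is cheap to do alongside the checksum. As a minimal
sketch in C, assuming the project published a byte count next to each
tarball (the helper names, and any file name or length passed to them, are
hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Returns the length of the file at `path` in bytes, or -1 if it cannot be
 * read. Relies on fseek(SEEK_END) on a binary stream, which works on
 * POSIX systems but is not guaranteed by the C standard itself. */
long file_length_bytes(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long len = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        len = ftell(fp);
    fclose(fp);
    return len;
}

/* Returns 1 if the file has exactly the published length, else 0. */
int length_matches(const char *path, long published_length)
{
    return published_length >= 0 &&
           file_length_bytes(path) == published_length;
}
```

The checksum itself would still be verified separately, for example with
sha256sum.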




It's not signed (by PGP/GPG, for example), is it? I realized that it is
not (!).

(I saw no trace of signature files for verification on my local PC.)

I know all the pitfalls of signing with public keys, but a signature still
adds a layer of confidence, and is much better than a single checksum as
noted above.


Thank you again for sharing a great piece of software.

TIA








Re: [Valgrind-users] Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

2017-02-18 Thread ISHIKAWA,chiaki
Hi,

Thank you again.

I will hopefully upload the requested info next week.
Here is what I can write down today.

What would be the appropriate upload service? [The data would be too 
large for e-mail to the list.]

On 2017/02/19 7:32, John Reiser wrote:
> How many failures occur in 10 runs of thunderbird under valgrind?

10 times, i.e., all the time under the Debian's stock newer kernel.

> How many failures occur in 10 runs if you reboot just before each run?

It never occurred to me to reboot the system before retrying.
I will check this next week. (Given the tests I did over the last few
months, SWITCHING kernel versions by rebooting into a different revision,
I would say 10 times, i.e. all the time, but again let me check.)

>
> Thunderbird is a user mail agent that uses interactive graphics.
> How many failures occur before the display window appears, and how many after?

There is one issue: I am seeing the failure of valgrind when I try to run
the thunderbird test suite. The complicating factor here is that, aside from
the user interaction available through the GUI under X windows, during the
execution of the |make mozmill| test suite there is a daemon that runs test
scripts and talks to the main TB binary via a COM interface. [I stay away
from the KB and mouse during tests to avoid interfering with the test
suite run. I do this by invoking a virtual X desktop using Xephyr: the
test suite run using Valgrind is done in that virtual desktop. If I
wanted to, I COULD interact with thunderbird's GUI via the mouse explicitly.
I did this a few times when a bug in thunderbird or the test scripts made the
execution hang waiting for confirmation of a modal dialog, etc.]

From what I have seen, the crash occurs before the display window of the
tested thunderbird appears, all the time [that is, every time valgrind
printed the mysterious Segmentation error under the newer Debian kernel].

> Are the symptoms and frequency the same for a Radeon card as for NVidia?
> On the open-source NVidia driver versus the proprietary driver?
> In "dumb framebuffer" mode ("no" acceleration)?
> Please tell us which cards: "lspci -nn | grep VGA" or similar.

I am using Debian GNU/Linux as a platform to develop and test
thunderbird patches. It is installed as the guest OS inside
VirtualBox, which runs under Windows 10.

So the video graphics driver relevant here is the VirtualBox video
driver, I think, correct? (But there was a puzzling message in X.0.log.
I will mention it in the answer to your second to last question.)

Under 3.19.5 kernel where the valgrind + thunderbird test suite works:

$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter [80ee:beef]
ishikawa@ip030:/KERNEL-SRC/kernel/linux-source-4.9$

(InnoTek is the name of original virtualbox developer.)

I am not sure if I can remove the above virtualbox graphics adaptor and 
revert to the plain VGA adaptor emulation done by VirtualBox, but let me 
try.

> Are the symptoms and frequency the same for Firefox as for thunderbird?
I am not developing or creating patches for  Firefox. Sorry.

> Are the symptoms and frequency the same for Chrome as for thunderbird?
Ditto.

Oh, do you mean to ask whether I can run a very simple
valgrind firefox-binary (without any test harness involvement) under the
new kernel and see whether it works?

Then I can test it.
But Chrome: I have never even installed it.

> Please present a histogram of the {mapped file, pc offset, instruction stream}
> when the SIGSEGV happens.  [You should have at least 70 runs by now: 10 each
> for thunderbird plain, with reboot, other graphics card, other NVidia driver,
> dumb framebuffer, Firefox, Chrome.]

OK, I will gather the data (I am not sure what you mean by "histogram", but I
will gather what I think is relevant.)

   10 each

   for thunderbird plain,
   with reboot [I will certainly reboot before the test run.]
   x 10 times with the above InnoTek driver (built-in for VirtualBox).
   [I am not sure if SIGSEGV happens under this setup.]

   for thunderbird + test suite hookup.
   I am quite certain that SIGSEGV happens under this setup.

   BTW, DOES ANYONE HAVE A GOOD IDEA ABOUT HOW TO CAPTURE the mapped
file, etc. WHEN SIGSEGV happens? It is very dynamic, and by the time I am
ready to type in shell commands, the child binary that experienced it
may be gone. Yes, I have not been able to figure out exactly which
process under the test suite setup started by thunderbird (under
valgrind) is experiencing the difficulty.
I guess some clever hacking via gdb would get me started there?
BTW, valgrind's --gdb-* options are meant to debug the target under
valgrind, NOT a segfault of valgrind itself, correct?
[And the fact that the whole thing, including valgrind, works under kernel
3.19.5 and not under later kernels drives me crazy.]

   > other graphics card, other NVidia driver, These won't apply.
   for thunderbird plain,
   dumb framebuffer [IF 

Re: [Valgrind-users] Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

2017-02-16 Thread ISHIKAWA,chiaki
On 2017/02/16 1:50, ISHIKAWA,chiaki wrote:
> On 2017/02/15 23:32, Tom Hughes wrote:
>> On 15/02/17 13:34, ISHIKAWA,chiaki wrote:
>>
>>> When I tried to run mozilla thunderbird mail client, which I create
>>> under Debian GNU/Linux 64-bit,
>>> under valgrind, valgrind mysteriously crashed and gdb was not much help.
>>
>> Well valgrind almost never "mysteriously crashes".
>>
>> In fact it is usually very verbose when anything goes wrong.
>>
>
> Hi,
>
> Thank you for your comment.
>
> The above was what I thought back in 2015 and actually I exchanged a few
> e-mails with Julian Seward about the issue back then. But we gave up on it.
> Because the system printed out "Segmentation error" without a good trace
> of anything at all (!) (which was quite surprising): We traced signals,
> and stuff. Everything we could think of using various options passed to
> valgrind (and even traced the system calls valgrind was issuing using
> strace.).
>
>
>> So the first thing you should do is to tell us in detail exactly what it
>> said when it stopped.
>
> Since gdb and various traces invoked by the options passed to valgrind
> are useless (as in the case back in 2015),
> I traced the system calls issued by valgrind.
>
> There was a MMAP call before something went wrong and signal 11 was
> issued and then
> I saw SIGSEGV passed a dozen times or so, and voila. Segmentation error
> back at the shell level.
> gdb does not print anything useful at all...
>
>>
>>> This happened under the latest 4.8.x kernel which Debian distributed as
>>> part of its testing repository.
>>>
>>> I tried a few things but subsequently reverted to kernel 3.19.5.
>>> Now thunderbird under valgrind works (!).
>>
>> So most likely this is just a new system call that valgrind doesn't
>> handle or something, in which case valgrind will have reported all the
>> details needed to fix it when it stopped.
>
> That was what I (and Julian Seward) hoped back in 2015, but valgrind did
> not. From the debugging I did over the last few months, I figured the
> problem I face is indeed as perplexing as the case back in 2015 and I
> took the easy course now: I decided that trying to find out if there is
> ANYBODY who is using valgrind and running big program under it using
> Debian GNU/Linux official kernel is easier (which I doubt based on my
> experience). Also, Julian Seward back in 2015 mentioned valgrind could
> grok thunderbird under Fedora and thus I thought it would be easier to
> figure out if someone is running 64-bit thunderbird under CentOS or
> Fedora 64-bit and compare the config to figure out what is causing the
> problem under Debian's kernel.
>
> BTW, the following is what I found back in 2015.
>
>
> +--------------------------+--------------------------------+
> | Kernel version           | valgrind + C-C TB works or not |
> +--------------------------+--------------------------------+
> | Debian 3.2.0...          | works [1]                      |
> | self-compiled 3.9.0...   | works                          |
> | self-compiled 3.12.40    | works                          |
> | self-compiled 3.13.11    | works                          |
> | self-compiled 3.14.38    | ??? [2]                        |
> | self-compiled 3.15.9     | ??? [3]                        |
> | Debian backport 3.16 ... | Segmentation fault! [4]        |
> | Vanilla 3.19.5           | works [5]                      |
> +--------------------------+--------------------------------+
>
> [1] base debian version for wheezy
> [2] pristine kernel hit the problem mentioned in the following patch
>     and panicked. open source is wonderful when it works, but when it
>     does not
>     http://lkml.iu.edu/hypermail/linux/kernel/1407.3/04296.html
> [3] vanilla kernel could not bring up X, probably because of the same
>     reason above. X did not start in a few minutes, and so I gave up.
>     I did not see the kernel panic, though.
> [4] Why? I have no idea.
> [5] worked back in 2015 and now I have to revert to it...
>
> This time around, I tried to figure out if I could do something similar
> using the latest kernel 4.9.x (vanilla version), hoping it might make
> valgrind run thunderbird under it without segmentation error. But the
> very late kernel caused a problem of VirtualBox utility, such as
> graphics driver that supports dynamic resizing, not supporting the
> latest kernel as guest at all, and I had to give it up.
> (Yes, I am running Debian GNU/Linux inside Virtu

Re: [Valgrind-users] Problem of running mozilla thunderbird under valgrind on Debian GNU/Linux 4.x kernel.

2017-02-15 Thread ISHIKAWA,chiaki
On 2017/02/15 23:32, Tom Hughes wrote:
> On 15/02/17 13:34, ISHIKAWA,chiaki wrote:
>
>> When I tried to run mozilla thunderbird mail client, which I create
>> under Debian GNU/Linux 64-bit,
>> under valgrind, valgrind mysteriously crashed and gdb was not much help.
>
> Well valgrind almost never "mysteriously crashes".
>
> In fact it is usually very verbose when anything goes wrong.
>

Hi,

Thank you for your comment.

The above was what I thought back in 2015 and actually I exchanged a few 
e-mails with Julian Seward about the issue back then. But we gave up on it.
Because the system printed out "Segmentation error" without a good trace 
of anything at all (!) (which was quite surprising): We traced signals, 
and stuff. Everything we could think of using various options passed to 
valgrind (and even traced the system calls valgrind was issuing using 
strace.).


> So the first thing you should do is to tell us in detail exactly what it
> said when it stopped.

Since gdb and various traces invoked by the options passed to valgrind 
are useless (as in the case back in 2015),
I traced the system calls issued by valgrind.

There was a MMAP call before something went wrong and signal 11 was 
issued and then
I saw SIGSEGV passed a dozen times or so, and voila. Segmentation error 
back at the shell level.
gdb does not print anything useful at all...

>
>> This happened under the latest 4.8.x kernel which Debian distributed as
>> part of its testing repository.
>>
>> I tried a few things but subsequently reverted to kernel 3.19.5.
>> Now thunderbird under valgrind works (!).
>
> So most likely this is just a new system call that valgrind doesn't
> handle or something, in which case valgrind will have reported all the
> details needed to fix it when it stopped.

That was what I (and Julian Seward) hoped back in 2015, but valgrind did 
not. From the debugging I did over the last few months, I figured the 
problem I face is indeed as perplexing as the case back in 2015 and I 
took the easy course now: I decided that trying to find out if there is 
ANYBODY who is using valgrind and running big program under it using 
Debian GNU/Linux official kernel is easier (which I doubt based on my 
experience). Also, Julian Seward back in 2015 mentioned valgrind could 
grok thunderbird under Fedora and thus I thought it would be easier to 
figure out if someone is running 64-bit thunderbird under CentOS or 
Fedora 64-bit and compare the config to figure out what is causing the 
problem under Debian's kernel.

BTW, the following is what I found back in 2015.


+--------------------------+--------------------------------+
| Kernel version           | valgrind + C-C TB works or not |
+--------------------------+--------------------------------+
| Debian 3.2.0...          | works [1]                      |
| self-compiled 3.9.0...   | works                          |
| self-compiled 3.12.40    | works                          |
| self-compiled 3.13.11    | works                          |
| self-compiled 3.14.38    | ??? [2]                        |
| self-compiled 3.15.9     | ??? [3]                        |
| Debian backport 3.16 ... | Segmentation fault! [4]        |
| Vanilla 3.19.5           | works [5]                      |
+--------------------------+--------------------------------+

[1] base debian version for wheezy
[2] pristine kernel hit the problem mentioned in the following patch
    and panicked. open source is wonderful when it works, but when it
    does not
    http://lkml.iu.edu/hypermail/linux/kernel/1407.3/04296.html
[3] vanilla kernel could not bring up X, probably because of the same
    reason above. X did not start in a few minutes, and so I gave up.
    I did not see the kernel panic, though.
[4] Why? I have no idea.
[5] worked back in 2015 and now I have to revert to it...

This time around, I tried to figure out if I could do something similar
using the latest kernel 4.9.x (vanilla version), hoping it might let
valgrind run thunderbird without the segmentation error. But with this
very recent kernel, the VirtualBox utilities, such as the
graphics driver that supports dynamic resizing, did not support the
kernel as a guest at all, and I had to give up.
(Yes, I am running Debian GNU/Linux inside VirtualBox.)

Sorry, I was so tired of debugging and seeing that the current issue 
looked so much like the mysterious problem back in 2015, that I did not 
bother to pursue the issue in valgrind per se, but rather wanted to 
focus on kernel issue now.

I am running the |make mozmill| test of thunderbird which now takes 
about 48 hours and once it is over, I will switch the kernel and gather 
the gdb stack trace (which is useless) when valgrind crashes, and
also show the last part of strace (system 

Re: [Valgrind-users] Strange warning: Ignoring non-Dwarf2/3/4 block in .debug_info

2016-01-27 Thread ISHIKAWA,chiaki
This is an observation from a Debian user.

I use Debian 64bit on my home PC.

Between the end of December/beginning of January and now, something
in the Debian repository which I fetched, probably during the update
to gcc 5.3 branch, caused a significant change from the viewpoint of 
valgrind/memcheck.
Most notably, the c11++ runtime (or is it spelled c++11 ?) seems to have
been introduced, and this caused massive reports of mismatched new vs free
from valgrind/memcheck.
The c++11 runtime seems to use the |new| operation to create some data in a
very primitive internal string handling function, and these string data are
"free"ed by many other functions that my application (mozilla
thunderbird) uses. So delete vs free issues are reported. I suspect c++11
ought to use "malloc()" for the internal string operation, but then
maybe other parts of the c++11 library would complain about a malloc vs
delete mismatch :-(

I think those who want to move on to the newer GCC (g++) and its runtime may
want to be prepared for a surprise. I wish the developers of the c++11
runtime used valgrind/memcheck during their development cycle. Maybe they
use AddressSanitizer and don't pay much attention to the free vs delete
issue.

Just my two cents worth.

CI



On 2016/01/19 7:13, ISHIKAWA,chiaki wrote:
> On 2016/01/18 23:32, Julian Seward wrote:
>> Chiaki,
>>
>>> First of all, thank you for sharing this great package.
>> First of all, thank you for supporting Thunderbird.  I use it all the time.
>>
>>> --11405-- WARNING: Serious error when reading debug info
>>> --11405-- When reading debug info from
>>> /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3200.3:
>>> --11405-- Ignoring non-Dwarf2/3/4 block in .debug_info
>> Hmm.  That is pretty strange.  Can you send by email (not to the
>> list) a copy of this libgdk_pixbuf-2.0.so.0.3200.3, for investigation?
>>
>> J
> Yes, I will.
>
> Chiaki
>
>
>




Re: [Valgrind-users] Strange warning: Ignoring non-Dwarf2/3/4 block in .debug_info

2016-01-18 Thread ISHIKAWA,chiaki
On 2016/01/18 23:32, Julian Seward wrote:
> Chiaki,
>
>> First of all, thank you for sharing this great package.
> First of all, thank you for supporting Thunderbird.  I use it all the time.
>
>> --11405-- WARNING: Serious error when reading debug info
>> --11405-- When reading debug info from
>> /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3200.3:
>> --11405-- Ignoring non-Dwarf2/3/4 block in .debug_info
> Hmm.  That is pretty strange.  Can you send by email (not to the
> list) a copy of this libgdk_pixbuf-2.0.so.0.3200.3, for investigation?
>
> J

Yes, I will.

Chiaki





[Valgrind-users] Strange warning: Ignoring non-Dwarf2/3/4 block in .debug_info

2016-01-16 Thread ISHIKAWA,chiaki
Hi,

First of all, thank you for sharing this great package.

I encountered a strange bug (?) while using valgrind to check a binary
that I built while testing Mozilla Thunderbird.

This is with Valgrind 3.12.0.SVN, which I compiled last November. The
startup banner shows this:

==11405== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==11405== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info

Now the problem.
As soon as the binary is run under valgrind, valgrind prints many
warnings like the following.


--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3200.3:
--11405-- Ignoring non-Dwarf2/3/4 block in .debug_info
--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3200.3:
--11405-- Last block truncated in .debug_info; ignoring
--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.3200.3:
--11405-- parse_CU_Header: is neither DWARF2 nor DWARF3 nor DWARF4
--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4600.2:
--11405-- Ignoring non-Dwarf2/3/4 block in .debug_info
--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4600.2:
--11405-- Last block truncated in .debug_info; ignoring
--11405-- WARNING: Serious error when reading debug info
--11405-- When reading debug info from /usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4600.2:

(There are more repetitions for other libraries.)

This is under Debian GNU/Linux 64-bit version.

uname -a output:
Linux ip030 3.19.5 #1 SMP Mon Apr 20 08:50:21 JST 2015 x86_64 GNU/Linux

For example, for the last library in the warning lines above, the |file|
command printed something like this:
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0.4600.2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f7cf7ddb040d36cb8709e6403bc19be4570754d6, stripped

Since I could link against it with GNU LD (gold) without any obvious
issue when building mozilla thunderbird locally on this 64-bit linux
PC, I suppose the library is not completely broken.

Searching Google for the string "Ignoring non-Dwarf2/3/4 block in
.debug_info" immediately turns up a few hits.

The first one is about valgrind on an ARM CPU:
http://sourceforge.net/p/valgrind/mailman/valgrind-users/thread/4e8534c5.6000...@bitwagon.com/
It mentions something about unaligned access.
But mine is an x86_64 CPU, and a hugely popular one. Besides, I think x86
can read at any octet boundary, albeit with a minor performance loss if
the alignment is not ideal.
So I doubt that anything like the ARM issue applies here. Correct?

The second hit is from Feb 2015:
https://bugs.kde.org/show_bug.cgi?id=338781
It suggests that "--read-var-info=yes" may be the problem (is it on by
default?) and that "--read-var-info=no" may solve the issue.
But it turns out to be an OS X issue, and the compiler there may be from LLVM.

The next three entries are from the same Debian bug entry.

Anyway, I tried to add --read-var-info=no, but it did not help.
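
For readers unfamiliar with what the parse_CU_Header warning refers to: each compilation unit in .debug_info starts with a small header, and in the 32-bit DWARF format the 2-byte version follows the 4-byte unit_length. The following is only a sketch of such a version check, assuming a little-endian ELF file; it is not Valgrind's reader, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from Valgrind. */

/* Decode little-endian fields byte by byte, so no aligned access is needed. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns the version field of a 32-bit-format compilation-unit header:
 * unit_length (4 bytes) followed by version (2 bytes).
 * Returns -1 if the buffer is too short or if unit_length is 0xffffffff,
 * which marks the 64-bit DWARF format that this sketch does not handle. */
int cu_header_version(const unsigned char *buf, size_t len)
{
    if (len < 6)
        return -1;
    if (read_le32(buf) == 0xffffffffu)
        return -1;
    return (int)read_le16(buf + 4);
}

/* Mirrors the wording of the warning: accept only DWARF 2, 3 or 4. */
int is_dwarf_2_3_4(int version)
{
    return version >= 2 && version <= 4;
}
```

If the bytes at the start of a block are not a header of this shape, the decoded "version" is an arbitrary number and such a check fails, which is the kind of rejection the three warnings describe. Why the bytes in these particular libraries look wrong to valgrind is a separate question that this sketch does not answer.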

The following is an excerpt from the log.
Please understand that valgrind is invoked from within a test harness,
so I have to show the log line that records how valgrind was invoked:

  0:01.09 PROCESS_OUTPUT: Thread-1 (pid:12190) Full command: 
['/usr/local/bin/valgrind', '--read-var-info=no', 
'--read-inline-info=no', 
'/NREF-COMM-CENTRAL/objdir-tb3/dist/bin/xpcshell', '-g', 
'/NREF-COMM-CENTRAL/objdir-tb3/dist/bin', '-a', 
'/NREF-COMM-CENTRAL/objdir-tb3/dist/bin', '-r', 
'/NREF-COMM-CENTRAL/objdir-tb3/dist/bin/components/httpd.manifest', 
'-m', '-s', '-e', 'const _HEAD_JS_PATH = 
"/NREF-COMM-CENTRAL/comm-central/mozilla/testing/xpcshell/head.js";', 
'-e', 'const _MOZINFO_JS_PATH = 
"/tmp/xpc-profile-fcjmuR/mozinfo.json";', '-e', 'const 
_TESTING_MODULES_DIR = 
"/NREF-COMM-CENTRAL/objdir-tb3/_tests/modules/";', '-f', 
'/NREF-COMM-CENTRAL/comm-central/mozilla/testing/xpcshell/head.js', 
'-p', '/tmp/xpc-plugins-wGKRGS', '-e', 'const _SERVER_ADDR = 
"localhost"', '-e', 'const _HEAD_FILES = [];', '-e', 'const _TAIL_FILES 
= [];', '-e', 'const _JSDEBUGGER_PORT = 0;', '-e', 'const _TEST_FILE = 
["/NREF-COMM-CENTRAL/objdir-tb3/_tests/xpcshell/caps/tests/unit/test_origin.js"];',
'-e', 'const _TEST_NAME = "caps/tests/unit/test_origin.js"', '-e', 
'_execute_test(); quit(0);']
(pid:12190) "==12190== Memcheck, a memory error detector"
  0:01.09 PROCESS_OUTPUT: Thread-1 (pid:12190) "==12190== Copyright (C) 
2002-2015, and GNU GPL'd, by Julian Seward et al."
  0:01.09 PROCESS_OUTPUT: Thread-1 (pid:12190) "==12190== 

[Valgrind-users] Can valgrind print the used uninitialized value and its size?

2015-04-23 Thread ISHIKAWA,chiaki
Hi,

I was running mozilla TB and found a message during a test run


==20163== Conditional jump or move depends on uninitialised value(s)
==20163==    at 0x90974BC: nsJSObjWrapper::NP_SetProperty(NPObject*, void*, _NPVariant const*) (nsJSNPRuntime.cpp:137)
  ...

So an uninitialised value is being used.

In this particular instance, I realized that it would be really useful
to know the value of this uninitialised memory (it turned out to be an
automatic variable on the stack), so that I can work out what happened
in the subsequent processing.
Was the uninitialised value zero or not, for example?

Can valgrind print the value of the uninitialised memory location that
was used, say, along the lines suggested below?

Using the example above:

  ==20163== Conditional jump or move depends on uninitialised value(s)
  ==20163== The value used was: 0xdeadbeaf (4 bytes)
  ==20163==    at 0x90974BC: nsJSObjWrapper::NP_SetProperty(NPObject*, void*, _NPVariant const*) (nsJSNPRuntime.cpp:137)
  ...

or maybe, allowing for a block of bytes:

  ==20163== Conditional jump or move depends on uninitialised value(s)
  ==20163== The value used started with 0xde 0xad 0xbe 0xaf 0xaa 0xbb 0xcc ...
  ==20163==    at 0x90974BC: nsJSObjWrapper::NP_SetProperty(NPObject*, void*, _NPVariant const*) (nsJSNPRuntime.cpp:137)

(That is, print the first N octets or the whole datum, whichever is
smaller; N could be 16, etc.)

I believe this simple addition would make the report much more useful:
knowing the value makes it easier to understand why a program that has
been in use for so long has had a hidden issue for so many years, and
so makes it easier to create a patch.
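
To make the second proposed format concrete, here is a minimal sketch of a formatter that prints at most N octets and marks the rest with "...". The function name and signature are hypothetical; this is not existing Valgrind code, only an illustration of the requested output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for the proposed report line.
 * Writes "0xde 0xad ..." for at most max_octets bytes of data and
 * appends " ..." when the datum is longer than what was printed.
 * Returns the number of octets actually written to out. */
size_t format_value_prefix(const unsigned char *data, size_t size,
                           size_t max_octets, char *out, size_t outlen)
{
    size_t n = size < max_octets ? size : max_octets;
    size_t used = 0;
    size_t i;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* Each octet needs at most 5 characters (" 0xNN") plus the NUL. */
        if (outlen - used < 6)
            break;
        used += (size_t)snprintf(out + used, outlen - used, "%s0x%02x",
                                 i ? " " : "", (unsigned)data[i]);
    }
    /* Mark truncation only if every requested octet fitted. */
    if (i == n && size > n && outlen - used >= 5)
        memcpy(out + used, " ...", 5);
    return i;
}
```

With the seven example bytes above and max_octets set to 3, the buffer ends up holding "0xde 0xad 0xbe ...". A 4-byte value with max_octets set to 16 is printed in full with no trailing dots.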

Thank you for sharing this great tool with the programming community.

Best Regards,
Chiaki Ishikawa



[Valgrind-users] Unsupported clone() ?

2013-04-09 Thread ISHIKAWA,chiaki
Sorry for the lengthy post, but
I got the following output from valgrind/memcheck while
I was testing mozilla thunderbird using its test framework, make mozmill.

By running thunderbird under memcheck while it goes through the
tests, I can detect where variables are used without being initialized,
as well as outright memory access violations.

While doing so, I got the log shown further below.

My questions and requests:

1. Can this unfamiliar clone be supported?

2. The dump that valgrind/memcheck prints afterwards, i.e., the
status of the threads in the program with brief stack traces related
to task synchronization, looks so useful that I wonder whether it
merits a switch to print the same dump whenever a memory error is
detected during memcheck. Can you add this feature (enabled by a
command line option)?

Here is the relevant log.

I trimmed the strange, very long repetition that follows PR_WaitCondVar
in the stack of each thread.
I think there is a bug in the print routine or something. We could skip
printing the same information again and again.

You can see that thread 1 is the main Thunderbird thread, and even
without the full features of helgrind, we may glean an overview of
thread activity from this stack dump.
Hence the request to output such information when memcheck detects a
memory error.

TB seems to be trying to generate a so-called crash dump (triggered by
another cause), and it is while trying hard to do so that it runs into
this valgrind/memcheck limitation.
Actually, judging from other trace data that precedes the excerpt below,
the Lock() routine near the top of the stack,
  ==803==   by 0xAD79110: mozilla::Mutex::Lock() (BlockingResourceBase.cpp:227)
caused a null pointer dereference inside the thread library, and that
triggered Google's breakpad exception handler.
So this did not happen during the normal operation of the program.


==803==
==803== Unsupported clone() flags: 0x800600
==803==
==803== The only supported clone() uses are:
==803== - via a threads library (LinuxThreads or NPTL)
==803== - via the implementation of fork or vfork
==803== - for the Quadrics Elan3 user-space driver
==803==
==803== Valgrind detected that your program requires
==803== the following unimplemented functionality:
==803==   Valgrind does not support general clone().
==803== This may be because the functionality is hard to implement,
==803== or because no reasonable program would behave this way,
==803== or because nobody has yet needed it.  In any case, let us know at
==803== www.valgrind.org and/or try to work around the problem, if you can.
==803==
==803== Valgrind has to exit now.  Sorry.  Bye!
==803==

sched status:  === I have not asked for the following dump, but valgrind provided it!
   running_tid=1

Thread 1: status = VgTs_Runnable
==803==   at 0x8BA9823: google_breakpad::ExceptionHandler::GenerateDump(google_breakpad::ExceptionHandler::CrashContext*) (linux_syscall_support.h:1685)
==803==   by 0x8BA9B7F: google_breakpad::ExceptionHandler::HandleSignal(int, siginfo*, void*) (exception_handler.cc:412)
==803==   by 0x8BA9C5F: google_breakpad::ExceptionHandler::SignalHandler(int, siginfo*, void*) (exception_handler.cc:326)
==803==   by 0x405B74F: ??? (in /lib/i386-linux-gnu/i686/cmov/libpthread-2.13.so)
==803==   by 0xAD79110: mozilla::Mutex::Lock() (BlockingResourceBase.cpp:227)
==803==   by 0x8BA08D8: CrashReporter::TakeMinidumpForChild(unsigned int, nsIFile**, unsigned int*) (Mutex.h:153)
==803==   by 0x8B5FA48: XRE_TakeMinidumpForChild (nsEmbedFunctions.cpp:234)
==803==   by 0xA8F60A6: mozilla::dom::PContentParent::TakeMinidump(nsIFile**, unsigned int*) const (PPluginModuleParent.cpp:1588)
==803==   by 0xA8AA365: mozilla::plugins::PluginModuleParent::ProcessFirstMinidump() (PluginModuleParent.cpp:649)
==803==   by 0xA8AA5E7: mozilla::plugins::PluginModuleParent::ActorDestroy(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason) (PluginModuleParent.cpp:707)
==803==   by 0xA8F829D: mozilla::plugins::PPluginModuleParent::DestroySubtree(mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::ActorDestroyReason) (PPluginModuleParent.cpp:1654)
==803==   by 0xA8F8309: mozilla::plugins::PPluginModuleParent::OnChannelError() (PPluginModuleParent.cpp:1486)
==803==   by 0xA8BA65F: mozilla::ipc::AsyncChannel::NotifyMaybeChannelError() (AsyncChannel.cpp:570)
==803==   by 0xA8BAD1E: mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError() (AsyncChannel.cpp:535)
==803==   by 0xA8685CE: RunnableMethod<mozilla::dom::TabChild, void (mozilla::dom::TabChild::*)(), Tuple0>::Run() (tuple.h:383)
==803==   by 0xAE3A575: MessageLoop::RunTask(Task*) (message_loop.cc:334)
==803==   by 0xAE3C274: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) (message_loop.cc:342)
==803==   by 0xAE3C4EE: MessageLoop::DoWork() 

Re: [Valgrind-users] Unsupported clone() ?

2013-04-09 Thread ISHIKAWA,chiaki
(2013/04/10 3:41), Julian Seward wrote:
> On 04/09/2013 07:54 PM, ISHIKAWA,chiaki wrote:
>> Sorry for a lengthy post, but
>> I got the following output from valgrind/memcheck while
>> I was testing mozilla thunderbird using its test framework, make mozmill.
>
> Did you use --smc-check=all-non-file?  If not, can you retry with that please?
>
> J

I used --smc-check=all-non-file.

The following is how TB was invoked under valgrind in my modified make
mozmill setup.
(Sorry for the line breaks. The whole command is actually on one line;
the trailing backslashes mark the continuations.)

valgrind --trace-children=yes --smc-check=all-non-file --gen-suppressions=all \
  --track-origins=yes --malloc-fill=0xA5 \
  --free-fill=0xC3 --leak-check=full --num-callers=50 \
  --suppressions=$HOME/TB-NEW/TB-3HG/new-src/mozilla/build/valgrind/cross-architecture.sup \
  --suppressions=$HOME/TB-NEW/TB-3HG/new-src/mozilla/build/valgrind/i386-redhat-linux-gnu.sup \
  --suppressions=$HOME/Dropbox/myown.sup --show-possibly-lost=no \
  /COMM-CENTRAL/objdir-tb3/mozilla/dist/bin/thunderbird-bin -profile \
  /COMM-CENTRAL/objdir-tb3/mozilla/_tests/mozmill/mozmillprofile \
  -jsbridge 24242 -foreground

TIA







Re: [Valgrind-users] Unsupported clone() ?

2013-04-09 Thread ISHIKAWA,chiaki
My comment inline:

(2013/04/10 4:01), John Reiser wrote:
>> I got the following output from valgrind/memcheck while
>> I was testing mozilla thunderbird using its test framework, make mozmill.
>
>> 1. Can this unfamiliar clone be supported?
>
>> ==803== Unsupported clone() flags: 0x800600
>
> Perhaps.  Here is an analysis plus something that you can attempt:
>
> - /usr/include/bits/sched.h:
> # define CLONE_VM       0x00000100 /* Set if VM shared between processes.  */
> # define CLONE_FS       0x00000200 /* Set if fs info shared between processes.  */
> # define CLONE_FILES    0x00000400 /* Set if open files shared between processes.  */
> # define CLONE_VFORK    0x00004000 /* Set if the parent wants the child to
>                                       wake it up on mm_release.  */
> # define CLONE_UNTRACED 0x00800000 /* Set if the tracing process can't
>                                       force CLONE_PTRACE on this clone.  */
>
> - coregrind/m_syswrap/syswrap-amd64-linux.c

I take it that the 32-bit version behaves the same.
(I am using 32-bit Debian GNU/Linux.)

>    /* Only look at the flags we really care about */
>    switch (cloneflags & (VKI_CLONE_VM | VKI_CLONE_FS
>                          | VKI_CLONE_FILES | VKI_CLONE_VFORK)) {
>    case VKI_CLONE_VM | VKI_CLONE_FS | VKI_CLONE_FILES:
>       /* thread creation */
> <snip>
>
>    case VKI_CLONE_VFORK | VKI_CLONE_VM: /* vfork */
>       /* FALLTHROUGH - assume vfork == fork */
>       cloneflags &= ~(VKI_CLONE_VFORK | VKI_CLONE_VM);
>
>    case 0: /* plain fork */
> <snip>
>
>    default:
>       /* should we just ENOSYS? */
>       VG_(message)(Vg_UserMsg,
>                    "Unsupported clone() flags: 0x%lx\n", ARG1);
> -
>
> So by omitting VKI_CLONE_VM from flags then mozilla is *not* making a new thread,
> but instead is making a new process via plain fork.  However, a plain fork
> with CLONE_FS and CLONE_FILES set (that is, a separate process that *shares*
> filesystem info and open files with its parent) is not something that coregrind
> understands.  I'm not sure I understand it, either.  What do the mozilla comments
> say?  What is mozilla trying to accomplish by clone() with those peculiar flags?

I have no idea, but I think mozilla thunderbird was trying to invoke its crash
reporter from a signal handler (for SIGSEGV or something) after detecting a
memory error on its own.
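
The decomposition of 0x800600 in the quoted analysis can be checked mechanically. Below is a minimal sketch; the X_ prefixed names and the helper function are local to this example (the values are the ones from the Linux clone ABI), and only the five flags discussed in this thread are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values from the Linux clone ABI. Local names (prefix X_) are used
 * so that the sketch compiles without _GNU_SOURCE or <sched.h>. */
#define X_CLONE_VM       0x00000100UL
#define X_CLONE_FS       0x00000200UL
#define X_CLONE_FILES    0x00000400UL
#define X_CLONE_VFORK    0x00004000UL
#define X_CLONE_UNTRACED 0x00800000UL

struct flag_name {
    unsigned long bit;
    const char   *name;
};

static const struct flag_name flag_names[] = {
    { X_CLONE_VM,       "CLONE_VM" },
    { X_CLONE_FS,       "CLONE_FS" },
    { X_CLONE_FILES,    "CLONE_FILES" },
    { X_CLONE_VFORK,    "CLONE_VFORK" },
    { X_CLONE_UNTRACED, "CLONE_UNTRACED" },
};

/* Hypothetical helper: writes the names of the recognised flags set in
 * `flags` into out, separated by '|'. Returns the bits that were not
 * recognised (the low byte, for instance, carries the exit signal). */
unsigned long decode_clone_flags(unsigned long flags, char *out, size_t outlen)
{
    unsigned long rest = flags;
    size_t used = 0;
    size_t i;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof flag_names / sizeof flag_names[0]; i++) {
        size_t len;

        if (!(flags & flag_names[i].bit))
            continue;
        len = strlen(flag_names[i].name);
        if (used + len + 2 > outlen)    /* separator + name + NUL */
            break;
        if (used > 0)
            out[used++] = '|';
        memcpy(out + used, flag_names[i].name, len + 1);
        used += len;
        rest &= ~flag_names[i].bit;
    }
    return rest;
}
```

For 0x800600 this yields CLONE_FS|CLONE_FILES|CLONE_UNTRACED with no bits left over, which matches the reading that CLONE_VM is absent and the call is therefore not a thread creation.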


> After that, then the first thing to try is to tail-merge mozilla's special case
> into coregrind's plain fork:
>
>    case VKI_CLONE_FS | VKI_CLONE_FILES:  /* DEBUG: mozilla special case */
>       /* FALLTHROUGH into plain fork */
>
>    case 0: /* plain fork */
>
> Re-build valgrind (including make install) and try that.


Thank you for the suggestion. I will try this.
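
As a way to see what the suggested change does before rebuilding valgrind, the dispatch can be modelled in a few lines. This is a simplified stand-in for the quoted switch, not coregrind's code: the enum, the function, the SK_ constants and the with_mozilla_case parameter are all invented for the illustration.

```c
/* Flag values from the Linux clone ABI, under local names. */
#define SK_CLONE_VM    0x00000100UL
#define SK_CLONE_FS    0x00000200UL
#define SK_CLONE_FILES 0x00000400UL
#define SK_CLONE_VFORK 0x00004000UL

enum clone_kind {
    CLONE_KIND_THREAD,
    CLONE_KIND_FORK,
    CLONE_KIND_UNSUPPORTED
};

/* Hypothetical model of the dispatch: only the four masked flags decide
 * the outcome. with_mozilla_case != 0 models the suggested extra case
 * label that falls through into the plain-fork path. */
enum clone_kind classify_clone(unsigned long flags, int with_mozilla_case)
{
    switch (flags & (SK_CLONE_VM | SK_CLONE_FS |
                     SK_CLONE_FILES | SK_CLONE_VFORK)) {
    case SK_CLONE_VM | SK_CLONE_FS | SK_CLONE_FILES:
        return CLONE_KIND_THREAD;               /* thread creation */
    case SK_CLONE_VFORK | SK_CLONE_VM:          /* vfork, treated as fork */
    case 0:                                     /* plain fork */
        return CLONE_KIND_FORK;
    case SK_CLONE_FS | SK_CLONE_FILES:          /* the 0x800600 case */
        return with_mozilla_case ? CLONE_KIND_FORK
                                 : CLONE_KIND_UNSUPPORTED;
    default:
        return CLONE_KIND_UNSUPPORTED;
    }
}
```

In this model 0x800600 is classified as unsupported without the extra case and as a fork with it. Whether treating it as a plain fork is actually safe, given that the child is supposed to share filesystem info and open files with its parent, is exactly the open question in the quoted analysis.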

*BUT* this memory error does not always happen during make mozmill.
It seems that it can be caused by memory corruption due to races and is
thus highly timing dependent.

This is the first time I have encountered the error since I began
tinkering with thunderbird and valgrind last November.

So I am not holding my breath, but when a run hits the bug again
and valgrind proceeds further with the above change, I will report it.

BTW, what do you think about adding a dump of thread info for
all the threads (switched on/off by a command line option)
when valgrind/memcheck detects a serious memory error?

Once the strange repetition of the same numerical addresses is fixed
(it seems to be a bug; alternatively, the repeats could simply be
omitted after the first few), the dump will be very useful.

I learned about the status of the threads in thunderbird
for the first time by seeing the dump myself :-)
(I am not the author of TB; I am just trying to figure out where some
crashes I experienced in real life could have come from, so that
people in the know can home in on the buggy places and fix them.)


Thank you, valgrind community, for sharing this wonderful tool!









Re: [Valgrind-users] helgrind bug in pthread_cond_destroy (testcase)

2013-03-15 Thread ISHIKAWA,chiaki
Hi,

You can try the patch against 3.8.1 and
it should work as expected.

  Looks like this is:
  https://bugs.kde.org/show_bug.cgi?id=307082
  which contains an analysis and has an attached patch.

With the patch in the bug database, the output may contain one extra line
that is not to some users' taste, but that should be easy to fix.

TIA

(2013/03/15 2:48), David Faure wrote:
> The attached testcase (which is simply pthread_cond_init +
> pthread_cond_destroy), leads to an error in helgrind:
> pthread_cond_destroy: destruction of unknown cond var
>
> I've seen this forever with helgrind, but it's time to clean this up :)
>
> However my debugging got stuck. I found out that 1) the call is given a valid
> condition variable pointer, and it actually succeeds, outside and inside
> helgrind. 2) the error message comes from this line of code:
>
>   DO_CREQ_v_W(_VG_USERREQ__HG_PTHREAD_COND_DESTROY_PRE,
>               pthread_cond_t*,cond);
>
> (hg_intercepts.c:940).
> How do I debug this further? This looks like a hook to me, the actual call is
> the next line,  CALL_FN_W_W(ret, fn, cond), isn't it?
>
> Output from helgrind (with debug output added by me)
>
> ==4741== Helgrind, a thread error detector
> ==4741== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
> ==4741== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
> ==4741== Command: ./bin/testcase_pthread_cond
> ==4741==
> pthread_cond_init(0xffefff390) said 0
> cond = 0xffefff390
> ==4741== ---Thread-Announcement------------------------------------------
> ==4741==
> ==4741== Thread #1 is the program's root thread
> ==4741==
> ==4741== ----------------------------------------------------------------
> ==4741==
> ==4741== Thread #1: pthread_cond_destroy: destruction of unknown cond var
> ==4741==    at 0x4C2EB28: pthread_cond_destroy_WRK (hg_intercepts.c:940)
> ==4741==    by 0x4C2FA44: pthread_cond_destroy@* (hg_intercepts.c:958)
> ==4741==    by 0x400AFC: main (testcase_pthread_cond.cpp:21)
> ==4741==
> pthread_cond_destroy 0xffefff390 in helgrind, AFTER DO_ and before CALL_
> pthread_cond_destroy(0xffefff390) said 0
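
The attachment itself is not reproduced in this archive. As a rough idea of what a testcase of this shape looks like, here is a sketch under that assumption; it is not David's actual file, and the function name is invented.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical reconstruction of an init-then-destroy testcase.
 * Returns 0 when both calls succeed, otherwise the first error code. */
int cond_init_then_destroy(void)
{
    pthread_cond_t cond;
    int ret;

    ret = pthread_cond_init(&cond, NULL);
    printf("pthread_cond_init(%p) said %d\n", (void *)&cond, ret);
    if (ret != 0)
        return ret;

    /* No thread ever waits on cond, so destroying it here is valid. */
    ret = pthread_cond_destroy(&cond);
    printf("pthread_cond_destroy(%p) said %d\n", (void *)&cond, ret);
    return ret;
}
```

Calling this once from main, building with gcc -g -pthread and running the result under valgrind --tool=helgrind gives a program of the kind described in the report. Whether the "unknown cond var" message appears depends on the valgrind version and on whether the patch from bug 307082 is applied.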






[Valgrind-users] (A patch to helgrind) Re: Can we know more about condition variable being destroyed?

2012-12-23 Thread ISHIKAWA,chiaki
A new patch was posted to
https://bugs.kde.org/show_bug.cgi?id=307082

This cleans up the previous patch that I posted. It fixes two problems:
- pthread_cond_init() was not handled properly, AND
- (newly discovered) the handling of pthread_cond_destroy()
   erroneously destroyed helgrind's internal data even when
   pthread_cond_destroy() returns EBUSY, in which case the pthread
   library retains the data it needs for the condition variable
   since there are still tasks waiting on the condition variable.
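
The second point can be pictured with a toy model of per-condition-variable bookkeeping. This is only a sketch of the idea, with made-up names; it is neither helgrind's data structure nor the patch itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model (hypothetical) of what a checker records per cond var. */
struct cv_record {
    bool known;      /* has the checker seen this cond var initialised? */
    int  n_waiters;  /* threads currently believed to be waiting */
};

void cv_on_init(struct cv_record *r)
{
    r->known = true;
    r->n_waiters = 0;
}

void cv_on_wait_enter(struct cv_record *r)
{
    r->n_waiters++;
}

void cv_on_wait_leave(struct cv_record *r)
{
    if (r->n_waiters > 0)
        r->n_waiters--;
}

/* Called with the return value of the real pthread_cond_destroy().
 * The record is discarded only if the real call succeeded; on EBUSY
 * (or any other failure) the cond var still exists, so the record is
 * kept. Returns true if the record was discarded. */
bool cv_on_destroy_post(struct cv_record *r, int real_ret)
{
    if (real_ret != 0)
        return false;
    r->known = false;
    r->n_waiters = 0;
    return true;
}
```

In this model, discarding the record unconditionally would make a later, legitimate destroy of the same condition variable look like "destruction of unknown cond var".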

With this, I get saner output for some sample programs (modified
by inserting pthread_cond_destroy() in a place or two) when
they are run under helgrind. The sample programs are from
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/multi-thread.html
specifically,
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/thread-pool-server.c

Another sample I used is from
https://www.securecoding.cert.org/confluence/display/seccode/POS54-C.+Notify+all+POSIX+threads+waiting+on+a+condition+variable+instead+of+a+single+thread
but this one caused a hang when I tinkered with pthread_cond_destroy(), and I
suspect that I created an invalid program (rather than helgrind being buggy).

Anyway, I am now running mozilla thunderbird under helgrind to track down
a possible thread-related problem again.

Again, a review of the patch by developers of valgrind will be appreciated.

(Gee, I hate my misspellings and incorrect grammar in my post to
https://bugs.kde.org/show_bug.cgi?id=307082
I was not even drinking egg nog. A peril of posting something in the middle of 
the night when one ought to be sleeping.)

TIA


(2012/12/23 22:06), Philippe Waroquiers wrote:
> On Sun, 2012-12-23 at 03:36 +0900, ISHIKAWA,chiaki wrote:
>
>> Since the testing is automated (and shrouded in many layers of
>> test scripts), it is not easy to figure out how we can
>> attach gdb to vgdbserver, and also, it is not quite clear
>> how we can prepare the input for vgdbserver/gdb interaction.
>> Maybe if I were to start from the scratch, I could do it.
>> But mozmill test setup has been there already.
> I have no knowledge of mozmill but I suppose it is possible to run one
> single specific test of the test suite (up to now, I have never
> seen an automatic reg test setup which did not allow to run one
> single test).
>
> So, you should be able to do the same as what you would do for a
> manual test: just ensure the test tool passes --vgdb-error=0 (or
> rather than 0, the error nr you want to dig into).
> The app will then stop at the time of the error.
>
> It looks to me a lot easier/faster than having helgrind reporting
> the list of threads waiting (and if you would need other information
> such as stack trace, local var values, ... you can look at all this
> with GDB).
>
>
> Philippe









Re: [Valgrind-users] Can we know more about condition variable being destroyed?

2012-12-22 Thread ISHIKAWA,chiaki
Thank you for your response.

(2012/12/23 2:58), Philippe Waroquiers wrote:

>> helgrind can't really know which task is being removed from the waiting list and
>> so decrementing nWaiters is all it does (I think).
>
> I think it does a lot more (otherwise helgrind could not follow at all what would
> happen with cond variables).
> See e.g. pthread_cond_wait_WRK

I will study it more.

>> Also, does anyone have a clever idea about how to debug this situation?
> As mentioned previously, if you use vgdb, it should be trivial to find which
> thread is doing what.
> E.g. do in a GDB attached to the Valgrind embedded gdbserver:
>    thread apply all bt
>
> The stack traces will allow to determine which thread is waiting on a cond var.
> You can then examine which cond var is being waited upon.


Sorry, I did not make myself clear.

I was wondering whether it is possible, in an AUTOMATED TEST SETUP, to
find out which tasks are waiting on a given cond variable that is about
to be destroyed and for which pthread_cond_destroy() would return EBUSY
(per POSIX).

If I am debugging a single program that misbehaves in response to my
keyboard/mouse input, your suggestion of using vgdbserver/gdb works very well.
(Yes, I have figured out how to run gdb from the start: --vgdb-error=0 did the
trick.)

By AUTOMATED TEST SETUP, I mean the mozilla thunderbird
test harness invoked by make mozmill.
It is a rather complicated setup.
It runs a test sequence (mimicking user input from a description of user actions),
and in so doing, it shows that thunderbird invokes pthread_cond_destroy() for a
few cond vars which helgrind thinks still have threads waiting on them.

Since the testing is automated (and shrouded in many layers of
test scripts), it is not easy to figure out how we can
attach gdb to vgdbserver, and also, it is not quite clear
how we can prepare the input for vgdbserver/gdb interaction.
Maybe if I were to start from the scratch, I could do it.
But mozmill test setup has been there already.

Thus a non-interactive, helgrind-initiated output is preferable in this case.

 Philippe

Thank you again.

Chiaki Ishikawa






Re: [Valgrind-users] Can we know more about condition variable being destroyed?

2012-12-19 Thread ISHIKAWA,chiaki
(2012/12/19 3:57), Philippe Waroquiers wrote:
> On Tue, 2012-12-18 at 21:00 +0900, ISHIKAWA,chiaki wrote:
>> (2012/12/18 8:07), Philippe Waroquiers wrote:
>>> Destruction of unknown cond var is probably/maybe bug
>>> https://bugs.kde.org/show_bug.cgi?id=307082
>>
>> I have produced a patch to take care of the issue.
>> But before that, I have a question.
>>
>> Q1: Why does valgrind not complain if I compile & link
>> Marc's code (in the bug entry which was given as a reminder that
>> unknown cond var may be a bug or false positive.) in the following manner,
>>
>>   cc -o /tmp/a.out marc.c
> No idea. Maybe a problem of redirection caused by static linking ?
>
>
>> Q2: I have produced a work-in-progress patch to take care this issue.
>> I wonder if the developers in the know can take a look and improve it.
>>
>> The patch is posted to the bug entry
>> https://bugs.kde.org/show_bug.cgi?id=307082
> I took a quick look at the patch, approach looks ok to me.
> No time to look more in depth at this now however :(.


Thank you again.


With my patch, I tested the mozilla thunderbird mail client under helgrind, and
found that most of the warning messages (destruction of unknown cond var) were bogus.
Only a few warnings remain, and they come from external libraries, so in this sense
mozilla thunderbird is OK.
[And the patch does not seem to introduce serious bugs so far.]

However, I am still struggling to figure out whether I can learn
which tasks are possibly waiting on a cond var being destroyed.
The message is something like this:
==4103== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==4103==    at 0x4027B9E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==4103==    by 0x4029A01: pthread_cond_destroy@* (hg_intercepts.c:958)
==4103==    by 0x47193BA: PR_DestroyCondVar (ptsynch.c:372)
==4103==    by 0x5947C40: nsHTTPListener::~nsHTTPListener() (CondVar.h:56)
==4103==    by 0x5947D82: nsHTTPListener::Release() (nsNSSCallbacks.cpp:536)
==4103==    by 0x603EFCA: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:442)

I wanted to print out the IDs of the tasks that are waiting on this cond variable.

My tentative conclusion now is that it is impossible to know which tasks are
waiting, even by going outside helgrind.

Here is my reasoning. I wonder if I am right or wrong.
I am discussing the situation on linux.

Let nWaiters be the number of tasks waiting.

1. The specification of pthread_cond_signal() does not say which task is
unblocked.
So all helgrind can do is decrement nWaiters by one.
(pthread_cond_broadcast() releases all the tasks instead.)

helgrind can't really know which task is being removed from the waiting list, and
so decrementing nWaiters is all it does (I think).

2. My desire was just to print out the IDs of the tasks still waiting.
OK, let me go outside helgrind.
Is it possible to do so by modifying libpthread?

I thought I could tweak libpthread and print the task list, if it
maintains a list of waiting tasks.

Under Debian GNU/Linux, which I use, the pthread library seems to
come from libc. It is actually libc6, which is an alias of eglibc,
a streamlined libc that can be used on embedded systems.

So this is the source I looked at.
Looking inside the source code, I found that,
since the pthread semantics related to cond vars are such
that the library only needs to release ONE unspecified task
on pthread_cond_signal(), the library
does not seem to contain an explicit list of waiting
tasks.
The thread library relies on the futex kernel mechanism to take
care of blocking and releasing tasks. futex is a kernel mechanism developed
to support 1-1 user/kernel task mapping, and
the thread functions seem to use futex for synchronization by invoking
this kernel API directly.
Basically, the pthread functions don't use a library-level task list and such, but
rely exclusively on the futex mechanism inside the kernel to take care of task
synchronization.

So at user level, it is not possible to print the tasks waiting on a cond var when
the cond var is being destroyed (or, for that matter, to know the task
IDs to begin with).
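
Note that the reasoning above concerns what libpthread itself records. A program that routes all its waits through its own wrapper could keep the list itself. The following is a minimal sketch of that idea under stated assumptions: the tracked_ names and the fixed-size array are hypothetical, and the caller is assumed to hold the mutex associated with the condition variable whenever these functions are called.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define TRACKED_MAX_WAITERS 16

/* Hypothetical application-level wrapper; not part of libpthread or
 * helgrind. The waiter list is protected by the same mutex that the
 * caller passes to tracked_cond_wait(). */
struct tracked_cond {
    pthread_cond_t cond;
    pthread_t      waiters[TRACKED_MAX_WAITERS];
    size_t         n_waiters;
};

int tracked_cond_init(struct tracked_cond *tc)
{
    tc->n_waiters = 0;
    return pthread_cond_init(&tc->cond, NULL);
}

void tracked_add(struct tracked_cond *tc, pthread_t self)
{
    if (tc->n_waiters < TRACKED_MAX_WAITERS)
        tc->waiters[tc->n_waiters++] = self;
}

void tracked_remove(struct tracked_cond *tc, pthread_t self)
{
    size_t i;

    for (i = 0; i < tc->n_waiters; i++) {
        if (pthread_equal(tc->waiters[i], self)) {
            /* Order does not matter: move the last entry into the gap. */
            tc->waiters[i] = tc->waiters[--tc->n_waiters];
            return;
        }
    }
}

/* Registers the calling thread before blocking and removes it after
 * waking up. The mutex is held at both points, as pthread_cond_wait()
 * requires on entry and guarantees on return. */
int tracked_cond_wait(struct tracked_cond *tc, pthread_mutex_t *m)
{
    int ret;

    tracked_add(tc, pthread_self());
    ret = pthread_cond_wait(&tc->cond, m);
    tracked_remove(tc, pthread_self());
    return ret;
}

/* Refuses to destroy while waiters are registered, so the caller can
 * inspect tc->waiters and report who is still blocked. */
int tracked_cond_destroy(struct tracked_cond *tc)
{
    if (tc->n_waiters != 0)
        return EBUSY;
    return pthread_cond_destroy(&tc->cond);
}
```

This does not help with an unmodified program or library, which is the situation discussed here, and the fixed-size array silently stops recording beyond TRACKED_MAX_WAITERS; it only shows that the information can exist at user level if the program chooses to record it.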

So my tentative conclusion is that it is impossible to know
which task(s) are still blocked on a cond var when the cond var
is being destroyed.

Maybe at the kernel level we can know (not sure), but
invoking extra kernel calls just to inspect this internal data structure
(if possible at all) may introduce extra thread context switches
(such kernel calls may be cancellation points) and disturb
libpthread and helgrind operation...
So I am inclined to avoid it and have decided to forget about it.

So I am stuck.
I thought it was easy, but going down to the kernel level seems
too heavy-weight an operation (AND it is not portable, and I am not sure
it is possible to begin with).

Actually, pthread_cond_destroy() ought to return EBUSY when there is at
least one task waiting so a careful program
can do something about such a situation ( but in mozilla thunderbird case,
it looks the error is printed

Re: [Valgrind-users] Can we know more about condition variable being destroyed?

2012-12-18 Thread ISHIKAWA,chiaki
(2012/12/18 8:07), Philippe Waroquiers wrote:
 Destruction of unknown cond var is probably/maybe bug
 https://bugs.kde.org/show_bug.cgi?id=307082

I have produced a patch to take care of the issue.
But before that, I have a question.

Q1: Why does valgrind not complain if I compile and link
Marc's code (from the bug entry, which was given as a reminder that
unknown cond var may be a bug or a false positive) in the following manner?

 cc -o /tmp/a.out marc.c

(Note that there is no -lpthread parameter)

and then run

 valgrind --tool=helgrind /tmp/a.out

There is no warning or error at all (under linux, that is).

I wonder WHICH library is used for pthread_cond_init() and friends.

This is just out of curiosity. Not related to the next important
request/question.

Q2: I have produced a work-in-progress patch to take care of this issue.
I wonder if the developers in the know can take a look and improve it.

The patch is posted to the bug entry
https://bugs.kde.org/show_bug.cgi?id=307082
I also posted the output of the regression tests for helgrind.

Before the patch, valgrind 3.8.1 reported the issue as Marc reported.
(marc.c is the code reported by marc)

ishikawa@debian-vbox-ci:/tmp$ gcc -g -o /tmp/a.out  -lpthread ~/Dropbox/marc.c
ishikawa@debian-vbox-ci:/tmp$ /tmp/a.out
ishikawa@debian-vbox-ci:/tmp$ valgrind --tool=helgrind /tmp/a.out
==22785== Helgrind, a thread error detector
==22785== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==22785== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==22785== Command: /tmp/a.out
==22785==
==22785== ---Thread-Announcement--
==22785==
==22785== Thread #1 is the program's root thread
==22785==
==22785== 
==22785==
==22785== Thread #1: pthread_cond_destroy: destruction of unknown cond var
==22785==    at 0x4027B1E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==22785==    by 0x402987F: pthread_cond_destroy@* (hg_intercepts.c:958)
==22785==    by 0x80484F4: main (marc.c:6)
==22785==
==22785==
==22785== For counts of detected and suppressed errors, rerun with: -v
==22785== Use --history-level=approx or =none to gain increased speed, at
==22785== the cost of reduced accuracy of conflicting-access information
==22785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

But after the patch, it runs as follows. Note that the false positive
warning is no longer there.

$ valgrind --tool=helgrind /tmp/a.out
==25080== Helgrind, a thread error detector
==25080== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==25080== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25080== Command: /tmp/a.out
==25080==
==25080==
==25080== For counts of detected and suppressed errors, rerun with: -v
==25080== Use --history-level=approx or =none to gain increased speed, at
==25080== the cost of reduced accuracy of conflicting-access information
==25080== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
ishikawa@debian-vbox-ci:/tmp$

I am not sure, though, whether it works with statically initialized
data, as in
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

[Note: No, it obviously can't. This is because the mapping from a cond
var to its CVInfo structure cannot be set up at the time of
pthread_cond_init() when that function is never called. I confirmed
this observation with a slight modification of marc's code.]
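
For readers without the bug entry at hand, the two cases can be put side
by side as follows. This is a sketch, not Marc's actual code: it assumes
that the report is triggered by destroying a cond var that was never
waited on or signalled, and the function names are made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Print a pthread error code. These functions return the code
 * directly instead of setting errno. */
static int report(const char *what, int rc)
{
    assert(what != NULL);
    if (rc != 0)
        fprintf(stderr, "%s: %s\n", what, strerror(rc));
    return rc;
}

/* Case 1: dynamic initialization. A tool that intercepts
 * pthread_cond_init() gets a chance to record the cond var. */
int dynamic_init_then_destroy(void)
{
    pthread_cond_t cond;
    int rc = report("pthread_cond_init", pthread_cond_init(&cond, NULL));
    if (rc != 0)
        return rc;
    return report("pthread_cond_destroy", pthread_cond_destroy(&cond));
}

/* Case 2: static initializer. There is no pthread_cond_init() call,
 * so the first call a tool can observe is the destroy itself. */
int static_init_then_destroy(void)
{
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    return report("pthread_cond_destroy", pthread_cond_destroy(&cond));
}
```

Calling each function from main() and running the result under
valgrind --tool=helgrind should show whether a given helgrind build
treats the two cases differently.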

I really want to get to the bottom of the problem I was analyzing
(random crashes of the mozilla thunderbird mail client; I suspect
races because the crashes are random). Seeing a mail client crash while
it is exiting is very distressing: I have to wonder whether the e-mail
from an important client that just arrived was stored properly in the
file or not before the crash :-(

So if developers in the know can take a look at the patch and offer
suggestions for improvement, I am all ears, so that I can use a patched
3.8.1-b to dig into the problem with mozilla thunderbird (with the
reduced clutter of unknown cond var warnings). I will use the suggestion
to check for the task id or something using the vgdbserver feature. Or
maybe I can print the addresses of cond variables by modifying the code.

Q3:
  I have in a corner a patch for helgrind which print symbolic information
  for the lock addresses. Patch not finished yet.
 
  Would be worth filing a wish bug in bugzilla telling that helgrind
  could use --read-var-info=yes to show more info about
  cond var addresses, lock addresses, etc.
 
  Philippe

You mean helgrind can't use the information obtained by
--read-var-info=yes (!?). That is tough, indeed.

Maybe that is why the current helgrind doesn't even seem to attempt to
print the address of cond var even in the form of offset in heap, or
offset in the stack, etc.

TIA




