Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-11 Thread Yavor Doganov
On Wed, 09 Jan 2019 22:42:43 +0200,
Andreas Tille wrote:
> The values of the structure are set in line 350[3] and are OK there.

What looks suspicious to me is that an unsigned long long value is
assigned to struct members of type size_t.  In the previous upstream
release, which worked, the return value of ffparse_ulong (an unsigned
long) was used instead.

I doubt this is the culprit, but it may be something worth looking at.
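One way to make that narrowing explicit is to range-check the value before assigning it. On amd64, unsigned long long and size_t are both 64 bits wide, so nothing is lost there. On 32-bit architectures, however, size_t is only 32 bits. The sketch below assumes a hypothetical helper, parse_size(), built on strtoull(). It is not ffindex's own ffparse_ulong, whose signature isn't shown in this thread.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not ffindex's ffparse_ulong): parse a decimal
 * number into a size_t, rejecting input that would not fit instead of
 * letting the unsigned long long -> size_t conversion truncate it. */
static bool parse_size(const char *s, size_t *out)
{
    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-')
        return false;          /* strtoull() would silently negate "-1" */

    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (end == s || errno == ERANGE)
        return false;          /* no digits, or too large for unsigned long long */
    if (v > SIZE_MAX)
        return false;          /* would be truncated when stored in a size_t */

    *out = (size_t)v;          /* only reached when the value fits */
    return true;
}
```

A caller would store into the struct member only when parse_size() returns true.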

> I admit I fail to see why the code works under stretch with gcc 6.3
> but fails with gcc 8.2.

If the code works with an old compiler but fails with a modern one, in
99.99% of the cases it's a bug in the code.  These bugs are revealed
by new and more aggressive optimization techniques/algorithms that
assume the code has no undefined behavior.  IOW, the code was/is buggy
by definition but you got away with it somehow.  The remaining 0.01% is
due to compiler bugs, but I bet that's not the case here.
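Here is a well-known illustration of this effect, not taken from ffindex. An overflow check written as x + 1 < x relies on signed integer overflow, which is undefined. GCC at -O2 may therefore assume the overflow never happens and drop the check entirely. The sketch below shows a well-defined alternative; checked_increment() is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper illustrating the point above.
 *
 * Buggy pattern (do not use):
 *     if (x + 1 < x) return false;
 * Signed overflow is undefined, so the optimizer may assume it never
 * happens and remove this check.
 *
 * Well-defined version: compare against the limit before adding. */
static bool checked_increment(int x, int *out)
{
    if (x == INT_MAX)
        return false;   /* x + 1 would overflow */
    *out = x + 1;
    return true;
}
```

GCC (since version 5) and Clang also provide __builtin_add_overflow(), which performs the addition and reports overflow in a single call.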



Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-10 Thread Andreas Tille
Hi Sune,

On Thu, Jan 10, 2019 at 06:27:47PM +, Sune Vuorela wrote:
> ...
> I looked briefly at the code, but I didn't feel like actually trying to
> understand what's going on.

Thanks a lot for this detailed analysis.  I'll forward it to bug #907624
and the upstream issue[1].  I admit I also will not try to understand
the code, since I do not even have an idea what the program is supposed
to do.

I think we have now provided sufficient input for upstream to track down
the issue.

Thanks again

   Andreas.

[1] https://github.com/soedinglab/ffindex_soedinglab/issues/7

-- 
http://fam-tille.de



Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-10 Thread Sune Vuorela
On 2019-01-09, Andrey Rahmatullin  wrote:
> As usual: reading the code, debugging, printfs. Address sanitizer and/or
> valgrind may or may not help too.

I just tried throwing some tools at it.

Apparently you need a three-step process to get to it.

Address sanitizer, first issue: the command that creates the test data
needed to reproduce the error.

$ ./ffindex_build -s ./test.data ./test.ffindex test/data test/data2

=
==824==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 304 byte(s) in 1 object(s) allocated from:
#0 0x7f3393888ed0 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe8ed0)
#1 0x7f33937994f1 in ffindex_index_parse 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:325
#2 0x56072c890783 in main 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex_build.c:243
#3 0x7f33935f9b16 in __libc_start_main ../csu/libc-start.c:310

SUMMARY: AddressSanitizer: 304 byte(s) leaked in 1 allocation(s).


Oh well. Rebuild without address sanitizer and run the first two steps.
Then rebuild with address sanitizer for the last step.
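An alternative to rebuilding twice is to turn off LeakSanitizer's leak check at run time with ASAN_OPTIONS=detect_leaks=0. A single ASan build can then run all the steps. The same default can also be compiled into the program by defining ASan's __asan_default_options() hook. This sketch assumes GCC or Clang with -fsanitize=address; in a build without ASan the function is simply never called.

```c
/* Default run-time options for AddressSanitizer, read by the ASan
 * runtime at start-up when the program is built with -fsanitize=address.
 * Options given in the ASAN_OPTIONS environment variable take precedence.
 * detect_leaks=0 keeps the leak report from making the ffindex_build step
 * fail, so every step of the test can run from one ASan build. */
const char *__asan_default_options(void)
{
    return "detect_leaks=0";
}
```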

$ ./ffindex_modify -u ./test.ffindex b
AddressSanitizer:DEADLYSIGNAL
=
==1453==ERROR: AddressSanitizer: SEGV on unknown address 0x000ca3ff8001 (pc 
0x7f459600a9f7 bp 0x7ffd6674b8d0 sp 0x7ffd6674b8a0 T0)
==1453==The signal is caused by a READ memory access.
#0 0x7f459600a9f6 in action 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:554
#1 0x7f45960076ed in trecursemisc 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:26
#2 0x7f459600775d in trecursemisc 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:31
#3 0x7f4596007827 in twalkmisc 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:44
#4 0x7f459600aac3 in ffindex_tree_write 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:563
#5 0x7f4596009f60 in ffindex_write 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:443
#6 0x55c8564c3fa8 in main 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex_modify.c:182
#7 0x7f4595e69b16 in __libc_start_main ../csu/libc-start.c:310
#8 0x55c8564c3259 in _start 
(/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/build/src/ffindex_modify+0x2259)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV 
/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:554 
in action
==1453==ABORTING

I'm not sure that gives any new info.

Let's try valgrind.

$ valgrind ./ffindex_modify -u ./test.ffindex b
==32176== Memcheck, a memory error detector
==32176== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32176== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==32176== Command: ./ffindex_modify -u ./test.ffindex b
==32176== 
==32176== Invalid read of size 8
==32176==    at 0x4846525: trecursemisc (twalkmisc.h:25)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  Address 0x4a536e1 is 17 bytes inside a block of size 24 alloc'd
==32176==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==32176==    by 0x4986160: tsearch (tsearch.c:338)
==32176==    by 0x4847C02: ffindex_index_as_tree (ffindex.c:533)
==32176==    by 0x1094D7: main (ffindex_modify.c:122)
==32176== 
==32176== Invalid read of size 8
==32176==    at 0x4847C6D: action (ffindex.c:554)
==32176==    by 0x4846543: trecursemisc (twalkmisc.h:26)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  Address 0x4a53d is not stack'd, malloc'd or (recently) free'd
==32176== 
==32176== 
==32176== Process terminating with default action of signal 11 (SIGSEGV)
==32176==  Access not within mapped region at address 0x4A53D
==32176==    at 0x4847C6D: action (ffindex.c:554)
==32176==    by 0x4846543: trecursemisc (twalkmisc.h:26)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  If you believe this happened as a result of a stack
==32176==  overflow in your program's main thread (unlikely but
==32176==  possible), you can try to increase the size

Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andrey Rahmatullin
On Wed, Jan 09, 2019 at 10:49:48PM +0100, Andreas Tille wrote:
> > > to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> > > that the elements of a structure are not accessible:
> > > 
> > >(gdb) print entry->offset
> > >Cannot access memory at address 0x7
> > It's because entry is 0x7.
> 
> When I was running the code with some more debugging info activated[1]
> I had pretty valid looking addresses 0x555666
And still SEGV?

> > > The values of the structure are set in line 350[3] and are OK there.
> > The problem is not about the structure fields but about the structure
> > pointer itself though.
> > ...
> > You need to find out why one of the tree nodes has an invalid address.
> 
> Can you propose any means to find this out?
As usual: reading the code, debugging, printfs. Address sanitizer and/or
valgrind may or may not help too.

> I have no idea about specific compiler differences.
I don't think pondering compiler differences can be helpful here; it's
most likely bad code that happens to work fine with some compilers but is
still bad code.


-- 
WBR, wRAR




Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andreas Tille
Hi,

On Thu, Jan 10, 2019 at 02:14:14AM +0500, Andrey Rahmatullin wrote:
> On Wed, Jan 09, 2019 at 09:42:43PM +0100, Andreas Tille wrote:
> > to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> > that the elements of a structure are not accessible:
> > 
> >(gdb) print entry->offset
> >Cannot access memory at address 0x7
> It's because entry is 0x7.

When I was running the code with some more debugging info activated[1]
I had pretty valid-looking addresses like 0x555666 (or something along
those lines; I'm remembering this from memory and can re-activate the
patch if needed).  I have no idea why the address is different without
that extra debug code.
 
> > The values of the structure are set in line 350[3] and are OK there.
> The problem is not about the structure fields but about the structure
> pointer itself though.
> ...
> You need to find out why one of the tree nodes has an invalid address.

Can you propose any means to find this out?  I have no idea about
specific compiler differences.  BTW, I also tried setting -O0, but this
did not avoid the SIGSEGV.

Thanks for your hint anyway

  Andreas.

[1] https://salsa.debian.org/med-team/ffindex/blob/master/debian/patches/debug_segfault

-- 
http://fam-tille.de



Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Ole Streicher
Hi Andreas,

one thing I usually do in such cases is to rebuild the package adding
the "-fsanitize=address -O0" flags (-O0 just to better understand what
happens in the source). This switches the address sanitizer on. It can
detect whether a local variable is accidentally overwritten (by an
off-by-one error or similar). Often it finds many more bugs, which you
can turn into bonus points upstream...
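Here is a hypothetical example of the kind of bug meant here, not taken from ffindex. A loop bound written as i <= N instead of i < N stores one element past the end of a local array. In a normal build that store may silently clobber a neighbouring variable, while a -fsanitize=address build reports it as a stack-buffer-overflow. The sketch uses the correct bound and describes the buggy one in a comment; NFIELDS and sum_fields() are made-up names.

```c
#include <stddef.h>

#define NFIELDS 4   /* hypothetical array length */

/* Fill a local array with 1..NFIELDS and return the sum.
 * Buggy variant: `for (i = 0; i <= NFIELDS; i++) fields[i] = ...;`
 * writes fields[NFIELDS], one element past the end.  In a normal build
 * that store may overwrite a neighbouring local such as `sum`; with
 * -fsanitize=address it is reported as a stack-buffer-overflow. */
static int sum_fields(void)
{
    int fields[NFIELDS];
    int sum = 0;

    for (size_t i = 0; i < NFIELDS; i++)   /* correct bound: < not <= */
        fields[i] = (int)i + 1;
    for (size_t i = 0; i < NFIELDS; i++)
        sum += fields[i];
    return sum;
}
```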

Otherwise I see no alternative to going through the debugger and seeing
where the strange address was set. However, 0x7 suggests that a small
integer was assigned to the pointer somewhere, so I would try the
sanitizing stuff first.

Cheers

Ole

Andreas Tille  writes:
> Hi,
>
> as reported in bug #907624 ffindex autopkgtest fails with SIGSEGV in sid
> and buster.  I've tested in stretch (gcc 6.3) and the code works fine.
> I've reported upstream[1] the results of my gdb session where I was able
> to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> that the elements of a structure are not accessible:
>
>(gdb) print entry->offset
>Cannot access memory at address 0x7
>
> (full gdb log under [1] or in the bug log).
>
> In fact, more detailed debugging showed that any attempt to access
> one of the structure elements, for instance just injecting
> something like 
>
>if ( !entry->offset ) {
>
> in line 554 will trigger the SIGSEGV.  The values of the structure are
> set in line 350[3] and are OK there.  The function that contains the
> failing line is action() [4], which is called via a pointer to this
> function in line 563[5] (I admit I have no real idea why this pointer to
> a function should be needed.  It's the only function that is used in
> this place and IMHO only adds an extra layer of complexity.)
>
> The structure is declared in the header file[6].
>
> I admit I fail to see why the code works under stretch with gcc 6.3
> but fails with gcc 8.2.
>
> Any idea?
>
> Kind regards
>
>Andreas.
>
>
> [1] https://github.com/soedinglab/ffindex_soedinglab/issues/7
> [2] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L554
> [3] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L350
> [4] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L541
> [5] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L563
> [6] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.h#L30



Re: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andrey Rahmatullin
On Wed, Jan 09, 2019 at 09:42:43PM +0100, Andreas Tille wrote:
> to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> that the elements of a structure are not accessible:
> 
>(gdb) print entry->offset
>Cannot access memory at address 0x7
It's because entry is 0x7.

> In fact, more detailed debugging showed that any attempt to access
> one of the structure elements, for instance just injecting
> something like 
> 
>if ( !entry->offset ) {
Of course this won't work: entry is 0x7.

> The values of the structure are set in line 350[3] and are OK there.
The problem is not about the structure fields but about the structure
pointer itself though.

> The function that contains the failing line is action() [4], which is
> called via a pointer to this function in line 563[5] (I admit I have no
> real idea why this pointer to a function should be needed.  It's the
> only function that is used in this place and IMHO only adds an extra
> layer of complexity.)
No, line 563 calls twalkmisc(), which walks the tree and calls action()
for each node.

You need to find out why one of the tree nodes has an invalid address.

-- 
WBR, wRAR

