Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-11 Thread Yavor Doganov
On Wed, 09 Jan 2019 22:42:43 +0200,
Andreas Tille wrote:
> The values of the structure are set in line 350[3] and are OK there.

What looks suspicious to me is that an unsigned long long value is
assigned to struct members of type size_t.  In the previous upstream
release that worked, the return value of ffparse_ulong was used which
was unsigned long.

I doubt this is the culprit, but it may be worth looking at.
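
To make the concern concrete: where size_t is narrower than unsigned long long (typical 32-bit targets), a plain assignment silently drops the high bits. Below is a minimal sketch of a checked assignment. The helper name assign_size is invented for this illustration and is not part of ffindex; on amd64 both types are 64 bits wide, so the check can never fail there.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from ffindex): store a parsed 64-bit value in a
 * size_t field and report failure instead of truncating silently. */
static bool assign_size(unsigned long long parsed, size_t *out)
{
    /* Can only be true where size_t is narrower than unsigned long long. */
    if (parsed > SIZE_MAX)
        return false;

    *out = (size_t)parsed;
    return true;
}
```

A caller would use it as `if (!assign_size(value, &entry->length)) { /* reject the index line */ }`, where `value` and `entry` stand in for whatever the parser has at hand.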

> I admit I fail to see why the code works under stretch with gcc 6.3
> but fails with gcc 8.2.

If the code works with an old compiler but fails with a modern one, in
99.99% of the cases it's a bug in the code.  These bugs are revealed
by new and more aggressive optimization techniques/algorithms that
assume undefined behavior never occurs.  IOW, the code was/is buggy by
definition
but you got away with it somehow.  The remaining 0.01% is due to
compiler bugs but I bet that's not the case here.
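
A generic illustration of that point (this is not the ffindex bug): a check such as `a + b < a` for signed overflow relies on undefined behavior, and an optimizer is allowed to delete it. The sketch below shows a variant that stays within defined behavior by comparing before adding. The function name is made up for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would overflow int.
 * The comparison happens before any addition, so no signed overflow
 * (undefined behavior) is ever evaluated. */
static bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;

    /* b <= 0 here, so INT_MIN - b cannot itself overflow. */
    return a < INT_MIN - b;
}
```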



Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-10 Thread songbird
In linux.debian.devel.mentors, you wrote:

hello, private reply only,

...
> When I was running the code with some more debugging info activated[1]
> I had pretty valid-looking addresses like 0x555666 (or something along
> those lines, just from memory; I can activate the patch if needed).  I
> have no idea why the address is 0x7 without that extra debug code.

  You should be able to set a watchpoint on just
that address to see what is changing it.


>> > The values of the structure are set in line 350[3] and are OK there.
>> The problem is not about the structure fields but about the structure
>> pointer itself though.
>> ...
>> You need to find out why one of the tree nodes has an invalid address.
>
> Can you propose any means to find this out?  I have no idea about
> specific compiler differences.  BTW, I also tried to set -O0 but this
> did not avoid the SIGSEGV.


  songbird



Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-10 Thread Sune Vuorela
On 2019-01-09, Andrey Rahmatullin  wrote:
> As usual: reading the code, debugging, printfs. Address sanitizer and/or
> valgrind may or may not help too.

I just tried throwing some tools at it.

Apparently you need three steps to get to it.

AddressSanitizer first. First issue: the command that creates the test
data already triggers an error.

$ ./ffindex_build -s ./test.data ./test.ffindex test/data test/data2

=
==824==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 304 byte(s) in 1 object(s) allocated from:
#0 0x7f3393888ed0 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xe8ed0)
#1 0x7f33937994f1 in ffindex_index_parse /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:325
#2 0x56072c890783 in main /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex_build.c:243
#3 0x7f33935f9b16 in __libc_start_main ../csu/libc-start.c:310

SUMMARY: AddressSanitizer: 304 byte(s) leaked in 1 allocation(s).


Oh well. Rebuild without the address sanitizer and run the first two
steps. Then rebuild with the address sanitizer for the last step.

$ ./ffindex_modify -u ./test.ffindex b
AddressSanitizer:DEADLYSIGNAL
=
==1453==ERROR: AddressSanitizer: SEGV on unknown address 0x000ca3ff8001 (pc 0x7f459600a9f7 bp 0x7ffd6674b8d0 sp 0x7ffd6674b8a0 T0)
==1453==The signal is caused by a READ memory access.
#0 0x7f459600a9f6 in action /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:554
#1 0x7f45960076ed in trecursemisc /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:26
#2 0x7f459600775d in trecursemisc /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:31
#3 0x7f4596007827 in twalkmisc /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/twalkmisc.h:44
#4 0x7f459600aac3 in ffindex_tree_write /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:563
#5 0x7f4596009f60 in ffindex_write /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:443
#6 0x55c8564c3fa8 in main /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex_modify.c:182
#7 0x7f4595e69b16 in __libc_start_main ../csu/libc-start.c:310
#8 0x55c8564c3259 in _start (/home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/build/src/ffindex_modify+0x2259)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/sune/src/ffindex-0.9.9.7.soedinglab+git20171201.74550c8/src/ffindex.c:554 in action
==1453==ABORTING

I'm not sure that gives any new info.

Let's try valgrind.

$ valgrind ./ffindex_modify -u ./test.ffindex b
==32176== Memcheck, a memory error detector
==32176== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32176== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==32176== Command: ./ffindex_modify -u ./test.ffindex b
==32176== 
==32176== Invalid read of size 8
==32176==    at 0x4846525: trecursemisc (twalkmisc.h:25)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  Address 0x4a536e1 is 17 bytes inside a block of size 24 alloc'd
==32176==    at 0x483577F: malloc (vg_replace_malloc.c:299)
==32176==    by 0x4986160: tsearch (tsearch.c:338)
==32176==    by 0x4847C02: ffindex_index_as_tree (ffindex.c:533)
==32176==    by 0x1094D7: main (ffindex_modify.c:122)
==32176== 
==32176== Invalid read of size 8
==32176==    at 0x4847C6D: action (ffindex.c:554)
==32176==    by 0x4846543: trecursemisc (twalkmisc.h:26)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  Address 0x4a53d is not stack'd, malloc'd or (recently) free'd
==32176== 
==32176== 
==32176== Process terminating with default action of signal 11 (SIGSEGV)
==32176==  Access not within mapped region at address 0x4A53D
==32176==    at 0x4847C6D: action (ffindex.c:554)
==32176==    by 0x4846543: trecursemisc (twalkmisc.h:26)
==32176==    by 0x484658E: trecursemisc (twalkmisc.h:31)
==32176==    by 0x4846633: twalkmisc (twalkmisc.h:44)
==32176==    by 0x4847CE0: ffindex_tree_write (ffindex.c:563)
==32176==    by 0x48477C2: ffindex_write (ffindex.c:443)
==32176==    by 0x10985E: main (ffindex_modify.c:182)
==32176==  If you believe this happened as a result of a stack
==32176==  overflow in your program's main thread (unlikely but
==32176==  possible), you can try to increase the 

Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-10 Thread Ole Streicher
Hi Andreas,

one thing I usually do in such cases is to rebuild the package adding
"-fsanitize=address -O0" flags (-O0 just to make it easier to follow
what happens in the source). This switches the address sanitizer on.
It can detect whether a local variable is accidentally overwritten (by
an off-by-one error or similar). Often it finds many more bugs, which
one can report upstream for bonus points...

Otherwise I see no other option than to go through the debugger and see
where the strange address was set. 0x7, however, sounds as if a small
integer was assigned to the pointer somewhere, so I would try the
sanitizing stuff first.
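
If the sanitizer output is inconclusive, a cheap complement is a sanity check on the pointer before it is dereferenced. The sketch below flags pointer values that look like small integers. The 4096-byte threshold and both function names are assumptions made for this illustration, not ffindex code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed threshold: the first page is normally unmapped on Linux, so a
 * non-null pointer value below it cannot refer to a real object. */
#define SUSPICIOUS_LIMIT ((uintptr_t)4096)

/* Hypothetical helper: true for values such as 0x7 that look like a small
 * integer stored in a pointer. */
static bool pointer_looks_bogus(const void *p)
{
    uintptr_t value = (uintptr_t)p;
    return value != 0 && value < SUSPICIOUS_LIMIT;
}

/* Hypothetical helper: log and reject NULL or bogus pointers before any
 * member access is attempted. */
static bool check_entry_pointer(const void *entry, const char *where)
{
    if (entry == NULL || pointer_looks_bogus(entry)) {
        fprintf(stderr, "%s: suspicious entry pointer %p\n", where, entry);
        return false;
    }
    return true;
}
```

Called at the top of a tree-walk callback, this turns the crash into a log line naming the first bad node, which narrows down where to set a breakpoint.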

Cheers

Ole

Andreas Tille  writes:
> Hi,
>
> as reported in bug #907624 ffindex autopkgtest fails with SIGSEGV in sid
> and buster.  I've tested in stretch (gcc 6.3) and the code works fine.
> I've reported upstream[1] the results of my gdb session where I was able
> to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> that the elements of a structure are not accessible:
>
>    (gdb) print entry->offset
>    Cannot access memory at address 0x7
>
> (full gdb log under [1] or in the bug log).
>
> In fact, more detailed debugging showed that any attempt to access
> one of the structure elements, for instance merely injecting
> something like
>
>    if ( !entry->offset ) {
>
> in line 554 will trigger the SIGSEGV.  The values of the structure are
> set in line 350[3] and are OK there.  The function that contains the
> failing line is action() [4], which is called via a function pointer
> in line 563[5] (I admit I have no real idea why this pointer to a
> function should be needed.  It's the only function that is used in this
> place and IMHO only adds an extra layer of complexity.)
>
> The structure is declared in the header file[6].
>
> I admit I fail to see why the code works under stretch with gcc 6.3
> but fails with gcc 8.2.
>
> Any idea?
>
> Kind regards
>
>Andreas.
>
>
> [1] https://github.com/soedinglab/ffindex_soedinglab/issues/7
> [2] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L554
> [3] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L350
> [4] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L541
> [5] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L563
> [6] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.h#L30



Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andreas Tille
Hi,

On Thu, Jan 10, 2019 at 02:14:14AM +0500, Andrey Rahmatullin wrote:
> On Wed, Jan 09, 2019 at 09:42:43PM +0100, Andreas Tille wrote:
> > to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> > that the elements of a structure are not accessible:
> > 
> >    (gdb) print entry->offset
> >    Cannot access memory at address 0x7
> It's because entry is 0x7.

When I was running the code with some more debugging info activated[1]
I had pretty valid-looking addresses like 0x555666 (or something along
those lines, just from memory; I can activate the patch if needed).  I
have no idea why the address is 0x7 without that extra debug code.
 
> > The values of the structure are set in line 350[3] and are OK there.
> The problem is not about the structure fields but about the structure
> pointer itself though.
> ...
> You need to find out why one of the tree nodes has an invalid address.

Can you propose any means to find this out?  I have no idea about
specific compiler differences.  BTW, I also tried to set -O0 but this
did not avoid the SIGSEGV.

Thanks for your hint anyway

  Andreas.

[1] https://salsa.debian.org/med-team/ffindex/blob/master/debian/patches/debug_segfault

-- 
http://fam-tille.de



Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andreas Tille
Hi,

as reported in bug #907624 ffindex autopkgtest fails with SIGSEGV in sid
and buster.  I've tested in stretch (gcc 6.3) and the code works fine.
I've reported upstream[1] the results of my gdb session where I was able
to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
that the elements of a structure are not accessible:

   (gdb) print entry->offset
   Cannot access memory at address 0x7

(full gdb log under [1] or in the bug log).

In fact, more detailed debugging showed that any attempt to access
one of the structure elements, for instance merely injecting
something like

   if ( !entry->offset ) {

in line 554 will trigger the SIGSEGV.  The values of the structure are
set in line 350[3] and are OK there.  The function that contains the
failing line is action() [4], which is called via a function pointer
in line 563[5] (I admit I have no real idea why this pointer to a
function should be needed.  It's the only function that is used in this
place and IMHO only adds an extra layer of complexity.)

The structure is declared in the header file[6].

I admit I fail to see why the code works under stretch with gcc 6.3
but fails with gcc 8.2.

Any idea?

Kind regards

   Andreas.


[1] https://github.com/soedinglab/ffindex_soedinglab/issues/7
[2] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L554
[3] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L350
[4] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L541
[5] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.c#L563
[6] https://salsa.debian.org/med-team/ffindex/blob/master/src/ffindex.h#L30

-- 
http://fam-tille.de



Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andrey Rahmatullin
On Wed, Jan 09, 2019 at 10:49:48PM +0100, Andreas Tille wrote:
> > > to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> > > that the elements of a structure are not accessible:
> > > 
> > >    (gdb) print entry->offset
> > >    Cannot access memory at address 0x7
> > It's because entry is 0x7.
> 
> When I was running the code with some more debugging info activated[1]
> I had pretty valid-looking addresses like 0x555666 
And still SEGV?

> > > The values of the structure are set in line 350[3] and are OK there.
> > The problem is not about the structure fields but about the structure
> > pointer itself though.
> > ...
> > You need to find out why one of the tree nodes has an invalid address.
> 
> Can you propose any means to find this out?
As usual: reading the code, debugging, printfs. Address sanitizer and/or
valgrind may or may not help too.

> I have no idea about specific compiler differences.
I don't think pondering compiler differences can be helpful here; it's
most likely bad code that is working fine with some compilers but is still
bad code.


-- 
WBR, wRAR




Bug#907624: Help for SIGSEGV in test suite needed when built with gcc 8.2 what works nicely with gcc 6.3

2019-01-09 Thread Andrey Rahmatullin
On Wed, Jan 09, 2019 at 09:42:43PM +0100, Andreas Tille wrote:
> to find the exact code line[2] where the SIGSEGV is thrown.  It turns out
> that the elements of a structure are not accessible:
> 
>    (gdb) print entry->offset
>    Cannot access memory at address 0x7
It's because entry is 0x7.

> In fact, more detailed debugging showed that any attempt to access
> one of the structure elements, for instance merely injecting
> something like
> 
>    if ( !entry->offset ) {
Of course this won't work, entry is 0x7.

> The values of the structure are set in line 350[3] and are OK there.
The problem is not about the structure fields but about the structure
pointer itself though.
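
Why the reported address points at the pointer rather than at the field can be shown without dereferencing anything: the address the debugger tries to read is the pointer value plus the member's offset. In the sketch below, demo_entry_t is a hypothetical stand-in whose first member is offset; the real layout is declared in ffindex.h and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the entry structure; the real one is declared
 * in ffindex.h and may be laid out differently. */
typedef struct {
    size_t offset;
    size_t length;
    char   name[32];
} demo_entry_t;

/* Address that "entry->offset" would read, computed arithmetically so that
 * nothing is dereferenced. */
static uintptr_t addr_of_offset(uintptr_t entry_value)
{
    return entry_value + offsetof(demo_entry_t, offset);
}

/* Same for "entry->length". */
static uintptr_t addr_of_length(uintptr_t entry_value)
{
    return entry_value + offsetof(demo_entry_t, length);
}
```

With the assumed layout, `addr_of_offset(0x7)` is 0x7 itself, which matches a "Cannot access memory at address 0x7" message when the pointer holds 7; the fields are never reached.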

> The function that contains the failing line is action() [4], which is
> called via a function pointer in line 563[5] (I admit I have no real
> idea why this pointer to a function should be needed.  It's the only
> function that is used in this place and IMHO only adds an extra layer of
> complexity.)
No? Line 563 calls twalkmisc(), which walks the tree and calls action()
for each node.

You need to find out why one of the tree nodes has an invalid address.
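
For readers unfamiliar with the pattern, here is a minimal sketch of a tree walk with a callback, using the standard tsearch()/twalk() interface from <search.h>. It is not the ffindex code: twalkmisc() is ffindex's own walker, and the names compare_names, count_action and count_entries are invented for this example.

```c
#include <search.h>
#include <stddef.h>
#include <string.h>

/* Ordering function for the tree: keys are C strings. */
static int compare_names(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* twalk() passes no user data to the callback, so the counter lives at
 * file scope in this sketch. */
static size_t visited;

/* Callback invoked by twalk(). nodep points to the node, whose first
 * field is the key pointer that was handed to tsearch(). */
static void count_action(const void *nodep, VISIT which, int depth)
{
    (void)depth;
    /* Each node is reported exactly once as either postorder or leaf. */
    if (which == postorder || which == leaf) {
        const char *key = *(const char *const *)nodep;
        if (key != NULL)
            visited++;
    }
}

/* Build a tree from the names, walk it, and return the number of distinct
 * keys seen by the callback. */
static size_t count_entries(const char *const *names, size_t n)
{
    void *root = NULL;
    size_t inserted = 0;

    for (; inserted < n; inserted++)
        if (tsearch(names[inserted], &root, compare_names) == NULL)
            break; /* out of memory */

    visited = 0;
    if (root != NULL)
        twalk(root, count_action);

    /* Remove the nodes again; the keys themselves are not owned here. */
    for (size_t i = 0; i < inserted; i++)
        tdelete(names[i], &root, compare_names);

    return visited;
}
```

The function pointer is what lets one generic walker serve any per-node operation. The callback only ever sees what the walker hands it, so a bad node pointer shows up inside the callback even though the callback did not create it.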

-- 
WBR, wRAR

