Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2018 at 02:42:40PM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek  writes:
> 
> JJ> Ok, so the problem is ignoring important warnings, in this case:
> JJ> imap/xapian_wrap.cpp: In function 'int
> JJ> stem_version_set(Xapian::WritableDatabase*, int)':
> JJ> imap/xapian_wrap.cpp:267:1: warning: no return statement in function
> JJ> returning non-void [-Wreturn-type]
> 
> OK, so good, that's bad upstream code.  I have given good odds on this
> being an upstream issue all along, but because the backtraces go through
> the unwinder it seemed obvious to look there first.
> 
> I do have to ask, though; is it considered good behavior to just
> segfault like this?  I understand that undefined behavior is just that,

Yes.  An optimizing compiler simply attempts to optimize, using the assumption
that a valid program doesn't invoke undefined behavior.  If you don't care
about performance and want these runtime bugs to be diagnosed, we have
the sanitizers, which report them as bugs at runtime.

> but it seems like there has to be a better way to fail that at least
> looks like it's not a GCC bug.
> 
> And, finally, if doing this in C++ code is so bad, why isn't this simply
> an error?

Because the warning can have false positives (cases where the compiler
warns because it can't prove that control never falls through to the last
line of the function), and because nothing is actually wrong if the function
is never called, or if every call takes a different path, throws an
exception or similar.

Compared to previous GCC versions, in C++ the warning is now emitted by
default even without -Wreturn-type or -Wall.
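
To make the false-positive case concrete, here is a minimal sketch.  The
names Mode, fail, flags_for and parse_digit are hypothetical and not taken
from cyrus-imapd.  Both functions can draw the warning if their last
statement is removed, even though no real call would ever fall off the end;
making the "impossible" path explicit keeps the code well-defined either way.

```cpp
#include <stdexcept>
#include <string>

enum class Mode { Read, Write };

// Hypothetical helper that always throws.  It is not declared [[noreturn]],
// so the compiler cannot tell that control never comes back from it.
void fail(const std::string &msg) { throw std::runtime_error(msg); }

// Every enumerator is handled, but without the final throw GCC typically
// still warns, because a Mode object may hold a value outside the list.
int flags_for(Mode m) {
    switch (m) {
    case Mode::Read:  return 1;
    case Mode::Write: return 2;
    }
    // Explicit handling of the "impossible" path: no warning, no UB.
    throw std::logic_error("unhandled Mode");
}

// Without the trailing return the compiler warns here too, since it has to
// assume that fail() might return.
int parse_digit(char c) {
    if (c >= '0' && c <= '9')
        return c - '0';
    fail("not a digit");
    return -1;  // never reached in practice; keeps every path returning
}
```

Declaring fail() as [[noreturn]] would be another way to silence the second
case; which one to prefer is a matter of taste.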

Jakub
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-26 Thread Jason L Tibbitts III
> "JJ" == Jakub Jelinek  writes:

JJ> Ok, so the problem is ignoring important warnings, in this case:
JJ> imap/xapian_wrap.cpp: In function 'int
JJ> stem_version_set(Xapian::WritableDatabase*, int)':
JJ> imap/xapian_wrap.cpp:267:1: warning: no return statement in function
JJ> returning non-void [-Wreturn-type]

OK, so good, that's bad upstream code.  I have given good odds on this
being an upstream issue all along, but because the backtraces go through
the unwinder it seemed obvious to look there first.

I do have to ask, though; is it considered good behavior to just
segfault like this?  I understand that undefined behavior is just that,
but it seems like there has to be a better way to fail that at least
looks like it's not a GCC bug.

And, finally, if doing this in C++ code is so bad, why isn't this simply
an error?

JJ> So, effective summary, in C++ >>NEVER<< ignore -Wreturn-type
JJ> warning.

Thanks for the explanation; I've passed it on to upstream as
https://github.com/cyrusimap/cyrus-imapd/issues/2267

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-26 Thread Jakub Jelinek
On Fri, Feb 23, 2018 at 05:40:23PM -0600, Jason L Tibbitts III wrote:
> It creates various search indices for the IMAP server (which is why it
> wraps Xapian).  The manpage is available from
> https://cyrusimap.org/imap/reference/manpages/systemcommands/squatter.html
> Technically the test suite simply creates the conditions for it to do
> its work (in a randomly named directory) and then calls squatter
> directly.  You should be able to see the arguments in the test suite's
> output.

Ok, so the problem is ignoring important warnings, in this case:
imap/xapian_wrap.cpp: In function 'int stem_version_set(Xapian::WritableDatabase*, int)':
imap/xapian_wrap.cpp:267:1: warning: no return statement in function returning non-void [-Wreturn-type]

Don't do that, especially not for C++.  While for C there is UB only if you
actually use the returned value that hasn't been returned, i.e. in the
caller, and if you never use the return value, nothing bad happens,
in C++ the UB happens already in the callee, so the compiler in:

static int stem_version_set(Xapian::WritableDatabase *database, int version)
{
    std::ostringstream convert;
    convert << version;
    database->set_metadata(XAPIAN_STEM_VERSION_KEY, convert.str());
}

can (and does) assume that if this function is called, then it will never
fall through from the last call into following code, because that is UB,
even when the caller is doing just:

    stem_version_set(dbw->database, dbw->stem_version);

Didn't have time to try that, but I think -fsanitize=undefined should have
diagnosed that too in addition to the warning.  And the effect in the
generated code is just that there is no edge from that set_metadata method
call to anything that follows it, so the call is followed by whatever other
code happened to be emitted next, in this case some C++ EH code that ends
with _Unwind_Resume and assumes it is invoked only when throwing or
rethrowing an exception.

So, effective summary, in C++ >>NEVER<< ignore -Wreturn-type warning.
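
For illustration, a minimal sketch of the corrected shape of that function.
FakeDatabase and STEM_VERSION_KEY are hypothetical stand-ins so that the
example builds without Xapian, and returning 0 for success is an assumption;
this is not necessarily the fix upstream chose.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for Xapian::WritableDatabase.
struct FakeDatabase {
    std::map<std::string, std::string> metadata;
    void set_metadata(const std::string &key, const std::string &value) {
        metadata[key] = value;
    }
};

// Assumed key name, for illustration only.
static const char STEM_VERSION_KEY[] = "stem_version";

// Same shape as the function quoted above, but every path returns a value,
// so the compiler has a normal edge from the last call back to the caller.
static int stem_version_set(FakeDatabase *database, int version)
{
    std::ostringstream convert;
    convert << version;
    database->set_metadata(STEM_VERSION_KEY, convert.str());
    return 0;  // the statement whose absence made flowing off the end UB
}
```

If the return value is never meaningful, changing the return type to void
would work just as well.  As far as I know, GCC also accepts
-Werror=return-type to turn the warning into a hard error, and
-fsanitize=return (enabled by -fsanitize=undefined for C++) to report the
missing return at runtime.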

Jakub


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
I'm sorry, for the first time in decades I actually hit the send
key sequence (Ctrl-C Ctrl-C) accidentally.

> "JJ" == Jakub Jelinek  writes:

JJ> Strangely, --enable-network to mock was really needed so that the
JJ> first testsuite passes.

That's absolutely bizarre.

JJ> I can now get the coredumps, but I'm afraid I really need a way to
JJ> reproduce it under gdb, from the core dump there isn't sufficient
JJ> information available.

I just now found a way.  See below.

JJ> So, what is this squatter process about, with what command line
JJ> options is it invoked, how does it interact with other processes, is
JJ> it single-threaded?

It creates various search indices for the IMAP server (which is why it
wraps Xapian).  The manpage is available from
https://cyrusimap.org/imap/reference/manpages/systemcommands/squatter.html
Technically the test suite simply creates the conditions for it to do
its work (in a randomly named directory) and then calls squatter
directly.  You should be able to see the arguments in the test suite's
output.

You can get down into the mock chroot and run tests manually:

mock -r fedora-rawhide-x86_64 --shell

and then enter:

su - mockbuild
cd /builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane
export CYRUS_USER=$USER
export LD_LIBRARY_PATH=/builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc29.x86_64/usr/lib64
rm -rf work; mkdir work
./testrunner.pl -vvv -f pretty SearchFuzzy.xapianv2

(You may have to adjust LD_LIBRARY_PATH there; using a glob fails to
work for me for reasons I don't understand.)

That will run just one test.  Down in the log you'll see something like:

=> Instance[1506] Running: "/builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter" "-C" "/builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane/work/2322151/conf/imapd.conf"

A bit further you see the output from that process:

2322151/squatter[50141]: SQL backend defaulting to engine 'pgsql'
2322151/squatter[50141]: indexing mailboxes
2322151/squatter[50141]: indexing mailbox user.cassandane...

and at that point it segfaults.

If at this point you "rm -rf work/224058I1/search" (to clear out the
partially created search database) and then run:

/builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter -C work/2322151/conf/imapd.conf

you should see the same segfault.  For me it does still segfault when
run under gdb, though you will have to delete that "search" directory
each time or the process won't fail.

JJ> What I can see is that the exc passed to _Unwind_Resume points to
JJ> some memory inside of the xapian_dbw_open C++ function's stack frame
JJ> which indeed contains multiple try/catch blocks, even nested ones;

That's about as far as I have been able to comprehend.

Just to be sure, I have done builds with fedora-rpm-macros and glibc
rolled back to the versions from January 21 (the last successful koschei
build) and sadly things still fail in the same way.

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
> "JJ" == Jakub Jelinek  writes:

JJ> Strangely, --enable-network to mock was really needed so that the
JJ> first testsuite passes.

That's absolutely bizarre.

JJ> I can now get the coredumps, but I'm afraid I really need a way to
JJ> reproduce it under gdb, from the core dump there isn't sufficient
JJ> information available.

And unfortunately I haven't been able to create one.

JJ> So, what is this squatter process about, with what command line
JJ> options is it invoked, how does it interact with other processes, is
JJ> it single-threaded?

JJ> What I can see is that the exc passed to _Unwind_Resume points to
JJ> some memory inside of the xapian_dbw_open C++ function's stack frame
JJ> which indeed contains multiple try/catch blocks, even nested ones;
JJ> but I find it strange that _Unwind_Exception objects would be on the
JJ> stack, they should be in malloced objects (at the end of
JJ> __cxa_exception e.g. for C++ exceptions), or come from the emergency
JJ> pool if malloc would fail (unlikely in this case).  The
JJ> _Unwind_Exception contains completely bogus values not just in one
JJ> field, but in all of them.

JJ> Jakub

-- 
 Jason L Tibbitts III - ti...@math.uh.edu - 713/743-3486 - 660PGH
 System Manager:  University of Houston Department of Mathematics 


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
Some more information:

I looked through the Koschei logs for this package and it seemed it
actually started failing before gcc8 went in.  Specifically, the first
failed build was on January 22, and the changes for that run were:

mariadb-devel           3:10.2.12-2.fc28        -> 3:10.2.12-3.fc28
binutils                2.29.1-12.fc28          -> 2.29.1-13.fc28
glibc-common            2.26.9000-46.fc28       -> 2.26.9000-48.fc28
perl-DateTime-TimeZone  2.15-1.fc28             -> 2.16-1.fc28
libxcrypt               4.0.0-0.203.2018012...  -> 4.0.0-0.204.2018012...
glibc                   2.26.9000-46.fc28       -> 2.26.9000-48.fc28
netpbm-progs            10.81.00-1.fc28         -> 10.81.00-2.fc28
glibc-devel             2.26.9000-46.fc28       -> 2.26.9000-48.fc28
libxcrypt-devel         4.0.0-0.203.2018012...  -> 4.0.0-0.204.2018012...
netpbm                  10.81.00-1.fc28         -> 10.81.00-2.fc28
ca-certificates         2017.2.20-5.fc28        -> 2017.2.20-6.fc28
glibc-headers           2.26.9000-46.fc28       -> 2.26.9000-48.fc28
libnsl                  2.26.9000-46.fc28       -> 2.26.9000-48.fc28
glibc-all-langpacks     2.26.9000-46.fc28       -> 2.26.9000-48.fc28
perl-IO-Socket-SSL      2.052-1.fc28            -> 2.054-1.fc28
redhat-rpm-config       79-1.fc28               -> 84-1.fc28

The glibc and redhat-rpm-config updates are most interesting as the
latter changed the default compiler flags.  Unfortunately the log is no
longer available, nor is any log from any koschei run previous to gcc8
landing.

So I have more to investigate, including trying to roll back the
redhat-rpm-config changes.

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jakub Jelinek
On Fri, Feb 23, 2018 at 02:17:34PM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek  writes:
> 
> JJ> Well, I'm seeing
> [...]
> 
> So that's the first of the two test suites, which I've seen fail before
> but I haven't invested any time into debugging it.  It's completely
> unrelated to any of the failures I'm talking about.
> 
> You can just remove or comment the line "make check || exit 1" in the
> spec to skip that test suite and get down to the one that crashes.
> 
> JJ> Does it need networking outside of the mock (i.e. shall I retry with
> JJ> --enable-networking)?
> 
> There should be no need; it certainly doesn't need networking.  That's
> just a cunit-based test suite and outside of a couple of cases where I
> was randomly trying things to get better debugging for the unwinder
> segfault at hand, I've not had any real problems with it.

Strangely, --enable-network to mock was really needed so that the first
testsuite passes.

I can now get the coredumps, but I'm afraid I really need a way to reproduce
it under gdb, from the core dump there isn't sufficient information
available.  So, what is this squatter process about, with what command line
options is it invoked, how does it interact with other processes, is it
single-threaded?

What I can see is that the exc passed to _Unwind_Resume points to some
memory inside of the xapian_dbw_open C++ function's stack frame which
indeed contains multiple try/catch blocks, even nested ones; but I find
it strange that _Unwind_Exception objects would be on the stack, they should
be in malloced objects (at the end of __cxa_exception e.g. for C++
exceptions), or come from the emergency pool if malloc would fail (unlikely
in this case).  The _Unwind_Exception contains completely bogus values not
just in one field, but in all of them.

Jakub


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
> "JJ" == Jakub Jelinek  writes:

JJ> Well, I'm seeing
[...]

So that's the first of the two test suites, which I've seen fail before
but I haven't invested any time into debugging it.  It's completely
unrelated to any of the failures I'm talking about.

You can just remove or comment the line "make check || exit 1" in the
spec to skip that test suite and get down to the one that crashes.

JJ> Does it need networking outside of the mock (i.e. shall I retry with
JJ> --enable-networking)?

There should be no need; it certainly doesn't need networking.  That's
just a cunit-based test suite and outside of a couple of cases where I
was randomly trying things to get better debugging for the unwinder
segfault at hand, I've not had any real problems with it.

Thanks to tmz on IRC I have some independent verification of this (he
gets the same failures) and have a way to reproduce this inside mock
inside a docker container.

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jakub Jelinek
On Fri, Feb 23, 2018 at 11:54:28AM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek  writes:
> 
> JJ> Haven't managed to reproduce it, while there are some testsuite
> JJ> failures, they are due to timeouts and e.g. dmesg doesn't show any
> JJ> segvs nor traps.
> 
> Of course I can reproduce this easily and it also happens in the
> buildsystem.  The first failure after the gcc update is
> https://koji.fedoraproject.org/koji/taskinfo?taskID=25054443 Note that
> it fails on all architectures except for s390x, and that succeeds only
> because the test suite is disabled there.

Well, I'm seeing
Suite: backend
  Test: badhost ...FAILED
1. cunit/unit.c:133  - CU_FAIL_FATAL("Code under test timed out")
  Test: badservice ...
unit: code under test (/builddir/build/BUILD/cyrus-imapd-3.0.5/cunit/backend.testc:test_badhost) timed out
passed
  Test: sasl_plain ...FAILED
1. cunit/unit.c:133  - CU_FAIL_FATAL("Code under test timed out")
  Test: sasl_digestmd5 ...
unit: code under test (/builddir/build/BUILD/cyrus-imapd-3.0.5/cunit/backend.testc:test_sasl_plain) timed out
FAILED
and many others; presumably the issue you are seeing comes after this make
check, so my mock build doesn't even get that far.
Does it need networking outside of the mock (i.e. shall I retry with
--enable-networking)?

Jakub


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
> "JJ" == Jakub Jelinek  writes:

JJ> Haven't managed to reproduce it, while there are some testsuite
JJ> failures, they are due to timeouts and e.g. dmesg doesn't show any
JJ> segvs nor traps.

Of course I can reproduce this easily and it also happens in the
buildsystem.  The first failure after the gcc update is
https://koji.fedoraproject.org/koji/taskinfo?taskID=25054443 Note that
it fails on all architectures except for s390x, and that succeeds only
because the test suite is disabled there.

Of course something could have changed in rawhide between now and when I
did this last night, so I'll rerun everything.  But a scratch build I did
(with an abbreviated test suite run) still shows the failures, which can
be viewed here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=25261672 Not that
it's useful to you unless you can get the cores out of koji.  (The s390x
build fails for different reasons.)

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jakub Jelinek
On Fri, Feb 23, 2018 at 10:56:47AM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek  writes:
> 
> JJ> Can I get detailed info on how to reproduce this (most importantly,
> JJ> which src.rpm you are trying to build)?
> 
> Sorry, that was all in the original message.
> 
> The package is cyrus-imapd; all you have to do is check it out and do
> fedpkg mockbuild (on the rawhide branch).  The test suite should
> generate 22 core files, each of which should give a mostly identical
> backtrace.  Unfortunately I can't get the crash to happen when running
> outside of mock.  I also can't get it when directly executing the
> crashing program while inside the chroot (via mock --shell); it has to
> be run by the test suite and in mock.  I haven't yet been able to figure
> out why.

Haven't managed to reproduce it, while there are some testsuite failures,
they are due to timeouts and e.g. dmesg doesn't show any segvs nor traps.

This was with mock -r fedora-rawhide-x86_64 ./cyrus-imapd-3.0.5-3.fc28.src.rpm

Jakub


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jason L Tibbitts III
> "JJ" == Jakub Jelinek  writes:

JJ> Can I get detailed info on how to reproduce this (most importantly,
JJ> which src.rpm you are trying to build)?

Sorry, that was all in the original message.

The package is cyrus-imapd; all you have to do is check it out and do
fedpkg mockbuild (on the rawhide branch).  The test suite should
generate 22 core files, each of which should give a mostly identical
backtrace.  Unfortunately I can't get the crash to happen when running
outside of mock.  I also can't get it when directly executing the
crashing program while inside the chroot (via mock --shell); it has to
be run by the test suite and in mock.  I haven't yet been able to figure
out why.

To get the test suite to run before rpm strips out all of the debug
information, you can just delete the %check line from the spec before
building.

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Tom Hughes

On 23/02/18 13:16, Jakub Jelinek wrote:
> On Thu, Feb 22, 2018 at 01:34:00PM -0800, John Reiser wrote:
> > Looking at the code:
> > = gcc/libgcc/unwind.inc
> >  _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
> >                               struct _Unwind_Context *context,
> >                               unsigned long *frames_p)
> >  {
> >    _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
> >  <>
> >    stop_code = (*stop) (1, action, exc->exception_class, exc,
> >                         context, stop_argument);
> > =
> > we see that function pointer 'stop' is cast from an untyped word 'private_1'
> > with no checking at all, not even for NULL or < PAGE_SIZE, etc.
> > This is a giant red flag for unreliable code.
>
> Wrong.  Just look at what the callers do:
>   if (exc->private_1 == 0)
>     code = _Unwind_RaiseException_Phase2 (exc, &cur_context, &frames);
>   else
>     code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
> and
>   if (exc->private_1 == 0)
>     return _Unwind_RaiseException (exc);
>
>   uw_init_context (&this_context);
>   cur_context = this_context;
>
>   code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
> So, _Unwind_ForcedUnwind_Phase2 is not called if private_1 is NULL.

Judging by the valgrind report it is undefined, and likely has the
value 0x120 or something close to that, which is clearly bogus but
good enough to get past the null check ;-)

> Can I get detailed info on how to reproduce this (most importantly, which
> src.rpm you are trying to build)?

I believe it's cyrus-imapd.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-23 Thread Jakub Jelinek
On Thu, Feb 22, 2018 at 01:34:00PM -0800, John Reiser wrote:
> Looking at the code:
> = gcc/libgcc/unwind.inc
>  _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
>                               struct _Unwind_Context *context,
>                               unsigned long *frames_p)
>  {
>    _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
>  <>
>    stop_code = (*stop) (1, action, exc->exception_class, exc,
>                         context, stop_argument);
> =
> we see that function pointer 'stop' is cast from an untyped word 'private_1'
> with no checking at all, not even for NULL or < PAGE_SIZE, etc.
> This is a giant red flag for unreliable code.

Wrong.  Just look at what the callers do:
  if (exc->private_1 == 0)
    code = _Unwind_RaiseException_Phase2 (exc, &cur_context, &frames);
  else
    code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
and
  if (exc->private_1 == 0)
    return _Unwind_RaiseException (exc);

  uw_init_context (&this_context);
  cur_context = this_context;

  code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
So, _Unwind_ForcedUnwind_Phase2 is not called if private_1 is NULL.
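
The dispatch described above can be sketched in isolation.  This is a
simplified model, not libgcc code: FakeUnwindException, Path and
choose_resume_path are hypothetical names, and the real struct
_Unwind_Exception has more fields.  It only shows that the callers test
private_1 against zero, so a zero value never reaches the forced-unwind
path, while any non-zero word (for example the bogus 0x120 guessed elsewhere
in this thread) does.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the unwinder's exception header.
struct FakeUnwindException {
    std::uintptr_t private_1;  // 0 for a normal throw, else a stop function
    std::uintptr_t private_2;  // argument for the stop function
};

enum class Path { RaiseExceptionPhase2, ForcedUnwindPhase2 };

// Mirrors only the branch quoted above: the sole test is "is it zero?".
Path choose_resume_path(const FakeUnwindException &exc) {
    return exc.private_1 == 0 ? Path::RaiseExceptionPhase2
                              : Path::ForcedUnwindPhase2;
}
```

So the check is sufficient for well-formed exception objects, and cannot
help when the object itself is garbage.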

Can I get detailed info on how to reproduce this (most importantly, which
src.rpm you are trying to build)?

Jakub


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-22 Thread Jason L Tibbitts III
> "JR" == John Reiser  writes:

JR> Those one-line tracebacks, with no source file and no line number,
JR> are a clue that valgrind could find no corresponding debuginfo.
JR> Please install the debuginfo for libcyrus_imap.so.0.0.0 and
JR> libstdc++.so.6.0.25, then re-run valgrind.  This is very close to
JR> pin-pointing the bug.

All of the debuginfo was installed at that time, and running gdb on the
core did not complain of any missing debug symbols.
libcyrus_imap.so.0.0.0 is a build artifact of this very package.  The
tests are running at the %install section (instead of in %check) so none
of the binaries have been stripped yet.  There is no additional
debuginfo I can think to install.

Just to be sure I repeated the process after wiping my mock cache but
there is no change.  I also built everything with -Og instead of -O2 but
there is also no change (except, interestingly, one repeated argument in
the do_indexer frame is no longer present):

Old:
#8  0x55aff86b4774 in do_indexer (sa=0x7fffbc365370, sa=0x7fffbc365370) at imap/squatter.c:352

New:
#8  0x55d7ec2f7364 in do_indexer (sa=0x7ffc51f6d8e0) at imap/squatter.c:352

#0  0x03e70120 in ?? ()
#1  0x7f31ff91c81e in _Unwind_ForcedUnwind_Phase2 (exc=0x7ffc51f6d2e0, context=0x7ffc51f6d000, frames_p=0x7ffc51f6cf08) at ../../../libgcc/unwind.inc:170
#2  0x7f31ff91d105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3  0x7f3205d03ba8 in stem_version_set (version=<optimized out>, database=<optimized out>) at /usr/include/c++/8/bits/char_traits.h:320
#4  xapian_dbw_open (paths=0x55d7edbc7b70, dbwp=0x55d7edbc80e8) at imap/xapian_wrap.cpp:327
#5  0x7f3205d7202f in begin_mailbox_update (rx=0x55d7edbc8050, mailbox=0x7f3206b57018, flags=0) at imap/search_xapian.c:1535
#6  0x7f3205d542dc in search_update_mailbox (rx=0x55d7edbc8050, mailbox=0x7f3206b57018, flags=0) at imap/search_engines.c:211
#7  0x55d7ec2f71fc in index_one (name=0x55d7edbc7b30 "user.cassandane", blocking=1) at imap/squatter.c:292
#8  0x55d7ec2f7364 in do_indexer (sa=0x7ffc51f6d8e0) at imap/squatter.c:352
#9  0x55d7ec2f8429 in main (argc=3, argv=0x7ffc51f6da18) at imap/squatter.c:1004

I will work up a ticket in bugzilla.redhat.com against gcc now and will
include some instructions for replicating this.

The only other interesting thing I've found so far (which I'm sure was
already obvious) is that this does appear to happen within a catch block
(the one in xapian_dbw_open, source here:
https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.0/imap/xapian_wrap.cpp#L305).

 - J<


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-22 Thread John Reiser

Jason L Tibbitts III wrote:
> > "JR" == John Reiser  writes:
>
> JR> Please create a bugzilla report, or other well-known tracking
> JR> instance.
>
> But where?  I don't even know whose problem this is.  It's taken me days
> of what little free time I have just to figure out how to get the
> backtrace I was able to provide.

For any SIGSEGV, file bugzilla against the package that contains the
program counter at the time of the fault.  In this case, libgcc
(which bugzilla might say, "libgcc is part of gcc, so use gcc.")
Then post here the URL of the bugzilla report.

> I am certainly conditioned to assume that the compiler is working as
> designed and problems like this are due to upstream code issues which
> simply weren't exposed with previous versions of the compiler.  And
> while tracking this all down, I did find at least one real live bug in
> the upstream code.
>
> JR> Non-repeatability due to unspecified or mismatched versions is
> JR> frustrating.
>
> Well, sure it is.  I just don't have enough information to do anything
> other than spew random things.  I'm still trying to understand what's
> gone wrong where, and what information is relevant.

Include the versions of the packages for these pieces, please:
   cyrus-imapd and any package that it BuildRequires or Requires
   gcc, libgcc
   gcc-c++, libstdc++
   glibc
   binutils (for /usr/bin/ld)
   mock
   kernel
The idea is: I want to run _exactly_ "the same thing" as you did.

> valgrind --track-origins=yes --trace-children=yes ./testrunner.pl -v -f
> pretty SearchFuzzy.xapianv2

This is good progress!

> ==22846==  Uninitialised value was created by a stack allocation
> ==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
>
> ==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)

Those one-line tracebacks, with no source file and no line number,
are a clue that valgrind could find no corresponding debuginfo.
Please install the debuginfo for libcyrus_imap.so.0.0.0 and libstdc++.so.6.0.25,
then re-run valgrind.  This is very close to pin-pointing the bug.

> JR> The reported behavior is consistent with use of an uninitialized
> JR> value.
>
> In whose code?  It sounds like you're saying that the upstream code is
> broken, which of course would be something I could understand, but

In the unwind code, which is used by exception handling.

> JR> Looking at the code: = gcc/libgcc/unwind.inc
>
> ... here you're talking about gcc code.

libgcc.

> JR> This is a giant red flag for unreliable code.
>
> And I'm still confused about whose code is unreliable.

libgcc, because it does not check its input (the unwind tables) enough.
Perhaps gcc generated incorrect data into the tables,
but it is the fault of libgcc for not avoiding SIGSEGV.
SIGSEGV is *ALWAYS* the fault of the package that contains $pc
until that package proves otherwise.


Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-22 Thread Jason L Tibbitts III
> "JR" == John Reiser  writes:

JR> Please create a bugzilla report, or other well-known tracking
JR> instance.

But where?  I don't even know whose problem this is.  It's taken me days
of what little free time I have just to figure out how to get the
backtrace I was able to provide.

I am certainly conditioned to assume that the compiler is working as
designed and problems like this are due to upstream code issues which
simply weren't exposed with previous versions of the compiler.  And
while tracking this all down, I did find at least one real live bug in
the upstream code.

JR> Non-repeatability due to unspecified or mismatched versions is
JR> frustrating.

Well, sure it is.  I just don't have enough information to do anything
other than spew random things.  I'm still trying to understand what's
gone wrong where, and what information is relevant.

I can certainly list the versions of everything in the buildroot, and
everything in the buildroot of the last successful build.  But I don't
think that's going to be particularly useful.  I only know that it
failed to build during the mass rebuild that happened after gcc (and
plenty of other things) was updated.

JR> What does running under memcheck ("valgrind --track-origins=yes
JR> ...") say?

Well, I can run the test suite under valgrind.  It seems to be able to
trace children, too.  I did:

valgrind --track-origins=yes --trace-children=yes ./testrunner.pl -v -f
pretty SearchFuzzy.xapianv2

and got a pile of output.  Narrowing down to just the indexing process
(called squatter) which actually segfaults gives the output I include at
the end of this message.

JR> The reported behavior is consistent with use of an uninitialized
JR> value.

In whose code?  It sounds like you're saying that the upstream code is
broken, which of course would be something I could understand, but

JR> Looking at the code: = gcc/libgcc/unwind.inc

... here you're talking about gcc code.

JR> This is a giant red flag for unreliable code.

And I'm still confused about whose code is unreliable.

 - J<

Output under valgrind for the process which segfaults:

==22846== Memcheck, a memory error detector
==22846== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22846== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==22846== Command: /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter -C /builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane/work/2352581/conf/imapd.conf
==22846==
2352581/squatter[22846]: SQL backend defaulting to engine 'pgsql'
2352581/squatter[22846]: indexing mailboxes
2352581/squatter[22846]: indexing mailbox user.cassandane...
==22846== Conditional jump or move depends on uninitialised value(s)
==22846==    at 0xBCC2052: _Unwind_Resume (unwind.inc:240)
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846==    by 0x588D58A: search_update_mailbox (search_engines.c:211)
==22846==    by 0x10C104: index_one (squatter.c:292)
==22846==    by 0x10B773: do_indexer (squatter.c:352)
==22846==    by 0x10B773: main (squatter.c:1004)
==22846==  Uninitialised value was created by a stack allocation
==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
==22846==
==22846== Use of uninitialised value of size 8
==22846==    at 0xBCC181B: _Unwind_ForcedUnwind_Phase2 (unwind.inc:170)
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0x1FFEFFF6DF: ???
==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==22846==    by 0xE6F90EF: ???
==22846==    by 0x1FFEFFF77F: ???
==22846==    by 0xE6F5F6F: ???
==22846==    by 0x1FFEFFF74F: ???
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0xE6F550F: ???
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846==  Uninitialised value was created by a stack allocation
==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
==22846==
==22846== Jump to the invalid address stated on the next line
==22846==    at 0x120: ???
==22846==    by 0x1FFEFFF6DF: ???
==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==22846==    by 0xE6F90EF: ???
==22846==    by 0x1FFEFFF77F: ???
==22846==    by 0xE6F5F6F: ???
==22846==    by 0x1FFEFFF74F: ???
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0xE6F550F: ???
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846==    by 0x588D58A: search_update_mailbox

Re: Help needed with new segfaults in frame unwinding under gcc8

2018-02-22 Thread John Reiser

I could really use some help from the gcc experts.


Please create a bugzilla report, or other well-known tracking instance.
In particular, bugzilla asks about repeatability, version numbers, etc.
Non-repeatability due to unspecified or mismatched versions is frustrating.


A package I maintain, cyrus-imapd, contains two extensive test suites
which we run at package build time.  After the big flag day where we
updated gcc and glibc and such in rawhide, one of the test suites now
shows failures and produces 22 core dumps, but only when run in mock
(not even fedpkg local on a rawhide container).  Even in mock, if I
get into the chroot, duplicate the test environment and run the failing
program by hand (or under strace, or under gdb) then it doesn't
segfault.


What does running under memcheck ("valgrind --track-origins=yes ...") say?
The reported behavior is consistent with use of an uninitialized value.
[gdb changes the environment by adding two pipes when invoking a process.]


After getting cores and all of the debugging stuff into mock
(instructions below) I found that all cores have substantially identical
backtraces:

(gdb) bt
#0  0x0120 in ?? ()
#1  0x7f18a19d281e in _Unwind_ForcedUnwind_Phase2 (exc=0x7fffbc364c70, 
context=0x7fffbc364990, frames_p=0x7fffbc364898) at 
../../../libgcc/unwind.inc:170
#2  0x7f18a19d3105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3  0x7f18a7dbbb90 in stem_version_set (version=, 
database=) at /usr/include/c++/8/bits/char_traits.h:320
#4  xapian_dbw_open (paths=0x55aff951eb70, dbwp=0x55aff951f0f8) at 
imap/xapian_wrap.cpp:327


Looking at the code:
= gcc/libgcc/unwind.inc
 _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
  struct _Unwind_Context *context,
  unsigned long *frames_p)
 {
   _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
 <>
   stop_code = (*stop) (1, action, exc->exception_class, exc,
context, stop_argument);
=
we see that function pointer 'stop' is cast from an untyped word 'private_1'
with no checking at all, not even for NULL or < PAGE_SIZE, etc.
This is a giant red flag for unreliable code.
Such a check would have avoided the particular SIGSEGV in the traceback above.
Of course this might cause vague or incorrect results, but there could
be strong hints about what to fix, instead of just a bare SIGSEGV.
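
To make that concrete, here is a minimal sketch of the kind of check meant
above.  It is not libgcc code: StopFn, call_stop_checked, kAssumedPageSize
and example_stop are made-up names, the callback takes fewer arguments than
the real _Unwind_Stop_Fn, and the 4 KiB page size is an assumption.  The idea
is only to refuse to jump through a word that is NULL or below the first
page, such as the 0x120 seen in frame #0.

```cpp
#include <cstdint>

// Hypothetical callback type; the real _Unwind_Stop_Fn takes more arguments.
using StopFn = int (*)(int version, void* argument);

// Assumption: 4 KiB pages, so no valid code lives below this address.
constexpr std::uintptr_t kAssumedPageSize = 4096;

enum class CallResult { Called, RejectedNull, RejectedLowAddress };

// Example callback, used only to exercise the checked call.
int example_stop(int version, void* /*argument*/) {
    return version + 6;
}

// Validate the untyped word before casting it to a function pointer.
// On success the callback's return value is stored in *stop_code;
// on rejection *stop_code is left untouched.
CallResult call_stop_checked(std::uintptr_t raw_word, void* argument,
                             int* stop_code) {
    if (raw_word == 0) return CallResult::RejectedNull;
    if (raw_word < kAssumedPageSize) return CallResult::RejectedLowAddress;
    StopFn stop = reinterpret_cast<StopFn>(raw_word);
    *stop_code = stop(1, argument);
    return CallResult::Called;
}
```

With a check like this, a raw word of 0x120 is reported as
RejectedLowAddress instead of being called.  It would not explain why the
word was garbage in the first place, but it turns a bare SIGSEGV into
something that can be logged.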


Help needed with new segfaults in frame unwinding under gcc8

2018-02-22 Thread Jason L Tibbitts III
I could really use some help from the gcc experts.

A package I maintain, cyrus-imapd, contains two extensive test suites
which we run at package build time.  After the big flag day where we
updated gcc and glibc and such in rawhide, one of the test suites now
shows failures and produces 22 core dumps, but only when run in mock
(not even fedpkg local on a rawhide container).  Even in mock, if I
get into the chroot, duplicate the test environment and run the failing
program by hand (or under strace, or under gdb) then it doesn't
segfault.

After getting cores and all of the debugging stuff into mock
(instructions below) I found that all cores have substantially identical
backtraces:

(gdb) bt
#0  0x0120 in ?? ()
#1  0x7f18a19d281e in _Unwind_ForcedUnwind_Phase2 (exc=0x7fffbc364c70, 
context=0x7fffbc364990, frames_p=0x7fffbc364898) at 
../../../libgcc/unwind.inc:170
#2  0x7f18a19d3105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3  0x7f18a7dbbb90 in stem_version_set (version=, 
database=) at /usr/include/c++/8/bits/char_traits.h:320
#4  xapian_dbw_open (paths=0x55aff951eb70, dbwp=0x55aff951f0f8) at 
imap/xapian_wrap.cpp:327
#5  0x7f18a7e2e2ef in begin_mailbox_update (rx=0x55aff951f060, 
mailbox=0x7f18a8c12018, flags=0) at imap/search_xapian.c:1535
#6  0x7f18a7e0f58b in search_update_mailbox (rx=0x55aff951f060, 
mailbox=0x7f18a8c12018, flags=0) at imap/search_engines.c:211
#7  0x55aff86b5105 in index_one (name=0x55aff951eb30 "user.cassandane", 
blocking=1) at imap/squatter.c:292
#8  0x55aff86b4774 in do_indexer (sa=0x7fffbc365370, sa=0x7fffbc365370) at 
imap/squatter.c:352
#9  main (argc=3, argv=0x7fffbc3654c8) at imap/squatter.c:1004

As far as I've been able to gather from talking to folks on IRC, it's
segfaulting in the stack unwinder while trying to handle a C++ exception.
(The only C++ in the program is a wrapper for the Xapian search engine.)

I have done testing with older versions of Xapian (known to build the
package successfully) without any change in behavior, but I'm not sure I
have a reasonable way to roll back the gcc update.

The source is available from
https://github.com/cyrusimap/cyrus-imapd/tree/cyrus-imapd-3.0; the
specific function involved (stem_version_set) is
https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.0/imap/xapian_wrap.cpp#L262
but it's only three lines.
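
For anyone who doesn't want to follow the link, a function of roughly that
shape is sketched below.  This is not the upstream source: FakeDatabase is a
stand-in for Xapian::WritableDatabase, and the metadata key, the function
name and the body are assumptions made for illustration.  The sketch ends
with an explicit return because, in C++, flowing off the end of a function
that returns non-void is undefined behavior, which is the situation
-Wreturn-type warns about.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for Xapian::WritableDatabase; not the real Xapian API.
struct FakeDatabase {
    std::map<std::string, std::string> metadata;

    void set_metadata(const std::string& key, const std::string& value) {
        metadata[key] = value;
    }
};

// Illustrative function with the same parameter shape as the one in the
// backtrace (database pointer plus int version); the body is assumed.
int stem_version_set_sketch(FakeDatabase* database, int version) {
    // "example.stem-version" is a made-up key for this sketch.
    database->set_metadata("example.stem-version", std::to_string(version));
    // Without this return, control would flow off the end of a non-void
    // function, which is undefined behavior in C++.
    return 0;
}
```

Calling stem_version_set_sketch(&db, 2) on a FakeDatabase stores "2" under
the made-up key and returns 0.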

I would appreciate any help from folks who can comprehend what's going
wrong here.  Upstream can't really offer much in the way of help;
given that the actual failure seems to happen in the libgcc code and
that this can't (so far) be reproduced outside of our buildsystem,
they're not sure what they can do.


Getting useful backtraces from a mock chroot:

If you turn off systemd coredump catching on the machine running mock
(sysctl -w 'kernel.core_pattern=core.%p') and delete the "%check" line
from the spec before building (so that the tests run as part of
%install), you can get the cores left in the mock chroot with the
complete debug information left unstripped.  Then you can manually
install gdb and all of the debuginfo packages:

mock -r fedora-rawhide-x86_64 --enablerepo fedora-debuginfo -i gdb \
  cyrus-sasl-lib-debuginfo glibc-debuginfo jansson-debuginfo \
  keyutils-libs-debuginfo krb5-libs-debuginfo libcom_err-debuginfo \
  libgcc-debuginfo libical-debuginfo libicu-debuginfo libnghttp2-debuginfo \
  libselinux-debuginfo libstdc++-debuginfo libuuid-debuginfo \
  libxcrypt-debuginfo libxml2-debuginfo nspr-debuginfo nss-debuginfo \
  nss-util-debuginfo openldap-debuginfo openssl-libs-debuginfo \
  pcre-debuginfo pcre2-debuginfo postgresql-libs-debuginfo \
  shapelib-debuginfo sqlite-libs-debuginfo xapian-core-libs-debuginfo \
  xz-libs-debuginfo zlib-debuginfo

Then run:

mock -r fedora-rawhide-x86_64 --shell
su - mockbuild
find . -name core.\*

and gdb as usual.

 - J<