Re: Help needed with new segfaults in frame unwinding under gcc8
On Mon, Feb 26, 2018 at 02:42:40PM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek writes:
>
> JJ> Ok, so the problem is ignoring important warnings, in this case:
> JJ> imap/xapian_wrap.cpp: In function 'int
> JJ> stem_version_set(Xapian::WritableDatabase*, int)':
> JJ> imap/xapian_wrap.cpp:267:1: warning: no return statement in function
> JJ> returning non-void [-Wreturn-type]
>
> OK, so good, that's bad upstream code.  I have given good odds on this
> being an upstream issue all along, but because the backtraces go through
> the unwinder it seemed obvious to look there first.
>
> I do have to ask, though; is it considered good behavior to just
> segfault like this?  I understand that undefined behavior is just that,

Yes.  An optimizing compiler simply attempts to optimize, using the
assumption that a valid program doesn't invoke undefined behavior.
If you don't care about performance and want these runtime bugs to be
diagnosed, we have the sanitizers, where it will be reported as a bug at
runtime.

> but it seems like there has to be a better way to fail that at least
> looks like it's not a GCC bug.
>
> And, finally, if doing this in C++ code is so bad, why isn't this simply
> an error?

Because the warning can have false positives (cases where the compiler
can't prove that control never falls through to the end of the function,
and so warns), and because nothing is actually wrong if the function is
never called, or if every call takes a different path, throws an
exception, or similar.

Compared to previous GCC versions, in C++ the warning is now emitted by
default even without -Wreturn-type or -Wall.

Jakub
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
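To make the "false positives" point concrete, here is a minimal sketch
(not taken from cyrus-imapd; the Mode enum and buffer_size function are
made up for illustration).  Every named enumerator returns, yet the
compiler cannot rule out a Mode value outside the named ones, so it may
still warn that control can reach the end of the function.  One common
way to keep the function well-defined is to end it with a throw, which
is also one of the "takes a different path or throws" cases mentioned
above:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical enum, used only for this illustration.
enum class Mode { Read, Write };

// Without the trailing throw, a compiler may warn that control reaches
// the end of a non-void function, even though every named enumerator
// returns.  That warning would be a false positive as long as callers
// only pass Read or Write.
int buffer_size(Mode m)
{
    switch (m) {
    case Mode::Read:
        return 4096;
    case Mode::Write:
        return 8192;
    }
    // Reached only for a value outside the named enumerators.  Throwing
    // here means the function never flows off its end.
    throw std::logic_error("unexpected Mode value");
}

// Small self-check showing the intended use.
void check_buffer_size()
{
    assert(buffer_size(Mode::Read) == 4096);
    assert(buffer_size(Mode::Write) == 8192);
}
```

If the trailing throw is removed, the function is still fine for the
two named values; it only becomes undefined behavior when some other
value is passed and control really does reach the closing brace.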
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JJ" == Jakub Jelinek writes:

JJ> Ok, so the problem is ignoring important warnings, in this case:
JJ> imap/xapian_wrap.cpp: In function 'int
JJ> stem_version_set(Xapian::WritableDatabase*, int)':
JJ> imap/xapian_wrap.cpp:267:1: warning: no return statement in function
JJ> returning non-void [-Wreturn-type]

OK, so good, that's bad upstream code.  I have given good odds on this
being an upstream issue all along, but because the backtraces go through
the unwinder it seemed obvious to look there first.

I do have to ask, though; is it considered good behavior to just
segfault like this?  I understand that undefined behavior is just that,
but it seems like there has to be a better way to fail that at least
looks like it's not a GCC bug.

And, finally, if doing this in C++ code is so bad, why isn't this simply
an error?

JJ> So, effective summary, in C++ >>NEVER<< ignore -Wreturn-type
JJ> warning.

Thanks for the explanation; I've passed it on to upstream as
https://github.com/cyrusimap/cyrus-imapd/issues/2267

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
On Fri, Feb 23, 2018 at 05:40:23PM -0600, Jason L Tibbitts III wrote:
> It creates various search indices for the IMAP server (which is why it
> wraps Xapian).  The manpage is available from
> https://cyrusimap.org/imap/reference/manpages/systemcommands/squatter.html
> Technically the test suite simply creates the conditions for it to do
> its work (in a randomly named directory) and then calls squatter
> directly.  You should be able to see the arguments in the test suite's
> output.

Ok, so the problem is ignoring important warnings, in this case:

imap/xapian_wrap.cpp: In function 'int stem_version_set(Xapian::WritableDatabase*, int)':
imap/xapian_wrap.cpp:267:1: warning: no return statement in function returning non-void [-Wreturn-type]

Don't do that, especially not for C++.  While for C there is UB only if
you actually use the returned value that hasn't been returned, i.e. in
the caller, and if you never use the return value, nothing bad happens,
in C++ the UB happens already in the callee, so the compiler in:

static int stem_version_set(Xapian::WritableDatabase *database, int version)
{
    std::ostringstream convert;
    convert << version;
    database->set_metadata(XAPIAN_STEM_VERSION_KEY, convert.str());
}

can (and does) assume that if this function is called, then it will
never fall through from the last call into following code, because that
is UB, even when the caller is doing just:

    stem_version_set(dbw->database, dbw->stem_version);

Didn't have time to try that, but I think -fsanitize=undefined should
have diagnosed that too in addition to the warning.

And the effect in the generated code is just that there is no edge from
that set_metadata method call to anything that follows it, so the call
is followed by whatever other code happened to be emitted next, in this
case some C++ EH code that ends with _Unwind_Resume and assumes it is
invoked only when throwing or rethrowing an exception.

So, effective summary, in C++ >>NEVER<< ignore -Wreturn-type warning.
Jakub
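For illustration, a minimal sketch of what a repaired
stem_version_set could look like.  This is not the upstream fix (the
thread does not show one); FakeDatabase and kStemVersionKey are
hypothetical stand-ins for Xapian::WritableDatabase and
XAPIAN_STEM_VERSION_KEY so that the example compiles without Xapian.
The only point is that every path through a non-void function ends in a
return statement:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for Xapian::WritableDatabase (assumption: only
// the set_metadata call matters for this example).
struct FakeDatabase {
    std::map<std::string, std::string> metadata;

    void set_metadata(const std::string &key, const std::string &value)
    {
        metadata[key] = value;
    }
};

// Hypothetical stand-in for XAPIAN_STEM_VERSION_KEY.
const char kStemVersionKey[] = "stem_version";

// Same body as the function quoted above, plus the missing return.
// Returning 0 for success is an assumption; changing the return type to
// void would be an equally valid repair if no caller uses the value.
int stem_version_set(FakeDatabase *database, int version)
{
    std::ostringstream convert;
    convert << version;
    database->set_metadata(kStemVersionKey, convert.str());
    return 0;
}

// Small self-check showing the intended use.
void check_stem_version_set()
{
    FakeDatabase db;
    assert(stem_version_set(&db, 2) == 0);
    assert(db.metadata[kStemVersionKey] == "2");
}
```

To keep such a mistake from slipping through again, GCC accepts
-Werror=return-type, which turns this particular warning into a hard
error while leaving other warnings alone.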
Re: Help needed with new segfaults in frame unwinding under gcc8
I'm sorry, for the first time in decades I actually hit the send key
sequence (Ctrl-C Ctrl-C) accidentally.

> "JJ" == Jakub Jelinek writes:

JJ> Strangely, --enable-network to mock was really needed so that the
JJ> first testsuite passes.

That's absolutely bizarre.

JJ> I can now get the coredumps, but I'm afraid I really need a way to
JJ> reproduce it under gdb, from the core dump there isn't sufficient
JJ> information available.

I just now found a way.  See below.

JJ> So, what is this squatter process about, with what command line
JJ> options is it invoked, how does it interact with other processes, is
JJ> it single-threaded?

It creates various search indices for the IMAP server (which is why it
wraps Xapian).  The manpage is available from
https://cyrusimap.org/imap/reference/manpages/systemcommands/squatter.html
Technically the test suite simply creates the conditions for it to do
its work (in a randomly named directory) and then calls squatter
directly.  You should be able to see the arguments in the test suite's
output.

You can get down into the mock chroot and run tests manually:

    mock -r fedora-rawhide-x86_64 --shell

and then enter:

    su - mockbuild
    cd /builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane
    export CYRUS_USER=$USER
    export LD_LIBRARY_PATH=/builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc29.x86_64/usr/lib64
    rm -rf work; mkdir work
    ./testrunner.pl -vvv -f pretty SearchFuzzy.xapianv2

(You may have to adjust LD_LIBRARY_PATH there; using a glob fails to
work for me for reasons I don't understand.)

That will run just one test.  Down in the log you'll see something like:

    => Instance[1506] Running: "/builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter" "-C" "/builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane/work/2322151/conf/imapd.conf"

A bit further you see the output from that process:

    2322151/squatter[50141]: SQL backend defaulting to engine 'pgsql'
    2322151/squatter[50141]: indexing mailboxes
    2322151/squatter[50141]: indexing mailbox user.cassandane...

and at that point it segfaults.

If at this point you "rm -rf work/224058I1/search" (to clear out the
partially created search database) and then run:

    /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter -C work/2322151/conf/imapd.conf

you should see the same segfault.  For me it does still segfault when
run under gdb, though you will have to delete that "search" directory
each time or the process won't fail.

JJ> What I can see is that the exc passed to _Unwind_Resume points to
JJ> some memory inside of the xapian_dbw_open C++ function's stack frame
JJ> which indeed contains multiple try/catch blocks, even nested ones;

That's about as far as I have been able to comprehend.

Just to be sure, I have done builds with fedora-rpm-macros and glibc
rolled back to the versions from January 21 (the last successful koschei
build) and sadly things still fail in the same way.

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JJ" == Jakub Jelinek writes:

JJ> Strangely, --enable-network to mock was really needed so that the
JJ> first testsuite passes.

That's absolutely bizarre.

JJ> I can now get the coredumps, but I'm afraid I really need a way to
JJ> reproduce it under gdb, from the core dump there isn't sufficient
JJ> information available.

And unfortunately I haven't been able to create one.

JJ> So, what is this squatter process about, with what command line
JJ> options is it invoked, how does it interact with other processes, is
JJ> it single-threaded?

JJ> What I can see is that the exc passed to _Unwind_Resume points to
JJ> some memory inside of the xapian_dbw_open C++ function's stack frame
JJ> which indeed contains multiple try/catch blocks, even nested ones;
JJ> but I find it strange that _Unwind_Exception objects would be on the
JJ> stack, they should be in malloced objects (at the end of
JJ> __cxa_exception e.g. for C++ exceptions), or come from the emergency
JJ> pool if malloc would fail (unlikely in this case).  The
JJ> _Unwind_Exception contains completely bogus values not just in one
JJ> field, but in all of them.

JJ> Jakub

--
Jason L Tibbitts III - ti...@math.uh.edu - 713/743-3486 - 660PGH
System Manager: University of Houston Department of Mathematics
Re: Help needed with new segfaults in frame unwinding under gcc8
Some more information:

I looked through the Koschei logs for this package and it seemed it
actually started failing before gcc8 went in.  Specifically, the first
failed build was on January 22, and the changes for that run were:

    mariadb-devel           3:10.2.12-2.fc28        -> 3:10.2.12-3.fc28
    binutils                2.29.1-12.fc28          -> 2.29.1-13.fc28
    glibc-common            2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    perl-DateTime-TimeZone  2.15-1.fc28             -> 2.16-1.fc28
    libxcrypt               4.0.0-0.203.2018012...  -> 4.0.0-0.204.2018012...
    glibc                   2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    netpbm-progs            10.81.00-1.fc28         -> 10.81.00-2.fc28
    glibc-devel             2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    libxcrypt-devel         4.0.0-0.203.2018012...  -> 4.0.0-0.204.2018012...
    netpbm                  10.81.00-1.fc28         -> 10.81.00-2.fc28
    ca-certificates         2017.2.20-5.fc28        -> 2017.2.20-6.fc28
    glibc-headers           2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    libnsl                  2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    glibc-all-langpacks     2.26.9000-46.fc28       -> 2.26.9000-48.fc28
    perl-IO-Socket-SSL      2.052-1.fc28            -> 2.054-1.fc28
    redhat-rpm-config       79-1.fc28               -> 84-1.fc28

The glibc and redhat-rpm-config updates are most interesting as the
latter changed the default compiler flags.  Unfortunately the log is no
longer available, nor is any log from any koschei run previous to gcc8
landing.

So I have more to investigate, including trying to roll back the
redhat-rpm-config changes.

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
On Fri, Feb 23, 2018 at 02:17:34PM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek writes:
>
> JJ> Well, I'm seeing
> [...]
>
> So that's the first of the two test suites, which I've seen fail before
> but I haven't invested any time into debugging it.  It's completely
> unrelated to any of the failures I'm talking about.
>
> You can just remove or comment the line "make check || exit 1" in the
> spec to skip that test suite and get down to the one that crashes.
>
> JJ> Does it need networking outside of the mock (i.e. shall I retry with
> JJ> --enable-networking)?
>
> There should be no need; it certainly doesn't need networking.  That's
> just a cunit-based test suite and outside of a couple of cases where I
> was randomly trying things to get better debugging for the unwinder
> segfault at hand, I've not had any real problems with it.

Strangely, --enable-network to mock was really needed so that the first
testsuite passes.

I can now get the coredumps, but I'm afraid I really need a way to
reproduce it under gdb, from the core dump there isn't sufficient
information available.  So, what is this squatter process about, with
what command line options is it invoked, how does it interact with other
processes, is it single-threaded?

What I can see is that the exc passed to _Unwind_Resume points to some
memory inside of the xapian_dbw_open C++ function's stack frame which
indeed contains multiple try/catch blocks, even nested ones; but I find
it strange that _Unwind_Exception objects would be on the stack, they
should be in malloced objects (at the end of __cxa_exception e.g. for
C++ exceptions), or come from the emergency pool if malloc would fail
(unlikely in this case).  The _Unwind_Exception contains completely
bogus values not just in one field, but in all of them.

Jakub
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JJ" == Jakub Jelinek writes:

JJ> Well, I'm seeing
[...]

So that's the first of the two test suites, which I've seen fail before
but I haven't invested any time into debugging it.  It's completely
unrelated to any of the failures I'm talking about.

You can just remove or comment the line "make check || exit 1" in the
spec to skip that test suite and get down to the one that crashes.

JJ> Does it need networking outside of the mock (i.e. shall I retry with
JJ> --enable-networking)?

There should be no need; it certainly doesn't need networking.  That's
just a cunit-based test suite and outside of a couple of cases where I
was randomly trying things to get better debugging for the unwinder
segfault at hand, I've not had any real problems with it.

Thanks to tmz on IRC I have some independent verification of this (he
gets the same failures) and have a way to reproduce this inside mock
inside a docker container.

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
On Fri, Feb 23, 2018 at 11:54:28AM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek writes:
>
> JJ> Haven't managed to reproduce it, while there are some testsuite
> JJ> failures, they are due to timeouts and e.g. dmesg doesn't show any
> JJ> segvs nor traps.
>
> Of course I can reproduce this easily and it also happens in the
> buildsystem.  The first failure after the gcc update is
> https://koji.fedoraproject.org/koji/taskinfo?taskID=25054443  Note that
> it fails on all architectures except for s390x, and that succeeds only
> because the test suite is disabled there.

Well, I'm seeing

Suite: backend
  Test: badhost ...FAILED
    1. cunit/unit.c:133  - CU_FAIL_FATAL("Code under test timed out")
  Test: badservice ...
unit: code under test (/builddir/build/BUILD/cyrus-imapd-3.0.5/cunit/backend.testc:test_badhost) timed out
passed
  Test: sasl_plain ...FAILED
    1. cunit/unit.c:133  - CU_FAIL_FATAL("Code under test timed out")
  Test: sasl_digestmd5 ...
unit: code under test (/builddir/build/BUILD/cyrus-imapd-3.0.5/cunit/backend.testc:test_sasl_plain) timed out
FAILED

and many others, supposedly the issue you have is after this make check
and thus my mock build doesn't even reach there.

Does it need networking outside of the mock (i.e. shall I retry with
--enable-networking)?

Jakub
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JJ" == Jakub Jelinek writes:

JJ> Haven't managed to reproduce it, while there are some testsuite
JJ> failures, they are due to timeouts and e.g. dmesg doesn't show any
JJ> segvs nor traps.

Of course I can reproduce this easily and it also happens in the
buildsystem.  The first failure after the gcc update is
https://koji.fedoraproject.org/koji/taskinfo?taskID=25054443  Note that
it fails on all architectures except for s390x, and that succeeds only
because the test suite is disabled there.

Of course something could have changed in rawhide between now and when I
did this last night, so I'll rerun everything.  But I did do a scratch
build (with an abbreviated test suite run) which still shows the
failures; they can be viewed here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=25261672
Not that it's useful to you unless you can get the cores out of koji.
(The s390x build fails for different reasons.)

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
On Fri, Feb 23, 2018 at 10:56:47AM -0600, Jason L Tibbitts III wrote:
> > "JJ" == Jakub Jelinek writes:
>
> JJ> Can I get detailed info on how to reproduce this (most importantly,
> JJ> which src.rpm you are trying to build)?
>
> Sorry, that was all in the original message.
>
> The package is cyrus-imapd; all you have to do is check it out and do
> fedpkg mockbuild (on the rawhide branch).  The test suite should
> generate 22 core files, each of which should give a mostly identical
> backtrace.  Unfortunately I can't get the crash to happen when running
> outside of mock.  I also can't get it when directly executing the
> crashing program while inside the chroot (via mock --shell); it has to
> be run by the test suite and in mock.  I haven't yet been able to figure
> out why.

Haven't managed to reproduce it, while there are some testsuite
failures, they are due to timeouts and e.g. dmesg doesn't show any segvs
nor traps.

This was with

    mock -r fedora-rawhide-x86_64 ./cyrus-imapd-3.0.5-3.fc28.src.rpm

Jakub
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JJ" == Jakub Jelinek writes:

JJ> Can I get detailed info on how to reproduce this (most importantly,
JJ> which src.rpm you are trying to build)?

Sorry, that was all in the original message.

The package is cyrus-imapd; all you have to do is check it out and do
fedpkg mockbuild (on the rawhide branch).  The test suite should
generate 22 core files, each of which should give a mostly identical
backtrace.  Unfortunately I can't get the crash to happen when running
outside of mock.  I also can't get it when directly executing the
crashing program while inside the chroot (via mock --shell); it has to
be run by the test suite and in mock.  I haven't yet been able to figure
out why.

To get the test suite run before rpm strips out all of the debug
information, you can just delete the %check line from the spec before
building.

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
On 23/02/18 13:16, Jakub Jelinek wrote:
> On Thu, Feb 22, 2018 at 01:34:00PM -0800, John Reiser wrote:
>> Looking at the code:
>> = gcc/libgcc/unwind.inc
>> _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
>>                              struct _Unwind_Context *context,
>>                              unsigned long *frames_p)
>> {
>>   _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
>> [...]
>>   stop_code = (*stop) (1, action, exc->exception_class, exc,
>>                        context, stop_argument);
>> =
>> we see that function pointer 'stop' is cast from an untyped word 'private_1'
>> with no checking at all, not even for NULL or < PAGE_SIZE, etc.
>> This is a giant red flag for unreliable code.
>
> Wrong.  Just look at what the callers do:
>
>   if (exc->private_1 == 0)
>     code = _Unwind_RaiseException_Phase2 (exc, &cur_context, &frames);
>   else
>     code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
>
> and
>
>   if (exc->private_1 == 0)
>     return _Unwind_RaiseException (exc);
>
>   uw_init_context (&this_context);
>   cur_context = this_context;
>
>   code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);
>
> So, _Unwind_ForcedUnwind_Phase2 is not called if private_1 is NULL.

Judging by the valgrind report it is undefined, and likely has the
value 0x120 or something close to that, which is clearly bogus but good
enough to get past the null check ;-)

> Can I get detailed info on how to reproduce this (most importantly, which
> src.rpm you are trying to build)?

I believe it's cyrus-imapd.

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/
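A toy model of the dispatch discussed above, as a sketch only.
ToyException, UnwindPath and choose_path are invented names and this is
not libgcc code; it mirrors just the "private_1 == 0" test quoted from
the callers.  It shows why the check is sufficient for a properly
initialised exception header, and why a garbage value such as 0x120
still selects the forced-unwind path, where the value is then used as a
function pointer:

```cpp
#include <cassert>
#include <cstdint>

// Which phase-2 routine the toy dispatcher would pick.
enum class UnwindPath { RaiseException, ForcedUnwind };

// Hypothetical, heavily simplified stand-in for struct _Unwind_Exception.
struct ToyException {
    // 0 for an ordinary throw; otherwise treated as a stop function.
    std::uintptr_t private_1;
};

// Mirrors only the null test made by the callers quoted above.
constexpr UnwindPath choose_path(const ToyException &exc)
{
    return exc.private_1 == 0 ? UnwindPath::RaiseException
                              : UnwindPath::ForcedUnwind;
}

// Small self-check showing both cases.
void check_choose_path()
{
    // A correctly initialised header for a normal C++ throw.
    assert(choose_path(ToyException{0}) == UnwindPath::RaiseException);
    // Garbage such as 0x120 is non-zero, so it passes the null test and
    // would be called as if it were a stop function.
    assert(choose_path(ToyException{0x120}) == UnwindPath::ForcedUnwind);
}
```

No amount of checking inside the unwinder can distinguish a real stop
function from arbitrary non-zero stack contents, which is consistent
with the later finding in this thread that the bogus pointer came from
control flowing into exception-handling code that was never meant to be
reached.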
Re: Help needed with new segfaults in frame unwinding under gcc8
On Thu, Feb 22, 2018 at 01:34:00PM -0800, John Reiser wrote:
> Looking at the code:
> = gcc/libgcc/unwind.inc
> _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
>                              struct _Unwind_Context *context,
>                              unsigned long *frames_p)
> {
>   _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
> [...]
>   stop_code = (*stop) (1, action, exc->exception_class, exc,
>                        context, stop_argument);
> =
> we see that function pointer 'stop' is cast from an untyped word 'private_1'
> with no checking at all, not even for NULL or < PAGE_SIZE, etc.
> This is a giant red flag for unreliable code.

Wrong.  Just look at what the callers do:

  if (exc->private_1 == 0)
    code = _Unwind_RaiseException_Phase2 (exc, &cur_context, &frames);
  else
    code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);

and

  if (exc->private_1 == 0)
    return _Unwind_RaiseException (exc);

  uw_init_context (&this_context);
  cur_context = this_context;

  code = _Unwind_ForcedUnwind_Phase2 (exc, &cur_context, &frames);

So, _Unwind_ForcedUnwind_Phase2 is not called if private_1 is NULL.

Can I get detailed info on how to reproduce this (most importantly, which
src.rpm you are trying to build)?

Jakub
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JR" == John Reiser writes:

JR> Those one-line tracebacks, with no source file and no line number,
JR> are a clue that valgrind could find no corresponding debuginfo.
JR> Please install the debuginfo for libcyrus_imap.so.0.0.0 and
JR> libstdc++.so.6.0.25, then re-run valgrind.  This is very close to
JR> pin-pointing the bug.

All of the debuginfo was installed at that time, and running gdb on the
core did not complain of any missing debug symbols.
libcyrus_imap.so.0.0.0 is a build artifact of this very package.  The
tests are running at the %install section (instead of in %check) so none
of the binaries have been stripped yet.  There is no additional
debuginfo I can think to install.

Just to be sure I repeated the process after wiping my mock cache but
there is no change.  I also built everything with -Og instead of -O2 but
there is also no change (except, interestingly, one repeated argument in
the do_indexer frame is no longer present):

Old:
#8  0x55aff86b4774 in do_indexer (sa=0x7fffbc365370, sa=0x7fffbc365370) at imap/squatter.c:352
New:
#8  0x55d7ec2f7364 in do_indexer (sa=0x7ffc51f6d8e0) at imap/squatter.c:352

#0  0x03e70120 in ?? ()
#1  0x7f31ff91c81e in _Unwind_ForcedUnwind_Phase2 (exc=0x7ffc51f6d2e0, context=0x7ffc51f6d000, frames_p=0x7ffc51f6cf08) at ../../../libgcc/unwind.inc:170
#2  0x7f31ff91d105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3  0x7f3205d03ba8 in stem_version_set (version=, database=) at /usr/include/c++/8/bits/char_traits.h:320
#4  xapian_dbw_open (paths=0x55d7edbc7b70, dbwp=0x55d7edbc80e8) at imap/xapian_wrap.cpp:327
#5  0x7f3205d7202f in begin_mailbox_update (rx=0x55d7edbc8050, mailbox=0x7f3206b57018, flags=0) at imap/search_xapian.c:1535
#6  0x7f3205d542dc in search_update_mailbox (rx=0x55d7edbc8050, mailbox=0x7f3206b57018, flags=0) at imap/search_engines.c:211
#7  0x55d7ec2f71fc in index_one (name=0x55d7edbc7b30 "user.cassandane", blocking=1) at imap/squatter.c:292
#8  0x55d7ec2f7364 in do_indexer (sa=0x7ffc51f6d8e0) at imap/squatter.c:352
#9  0x55d7ec2f8429 in main (argc=3, argv=0x7ffc51f6da18) at imap/squatter.c:1004

I will work up a ticket in bugzilla.redhat.com against gcc now and will
include some instructions for replicating this.

The only other interesting thing I've found so far (which I'm sure was
already obvious) is that this does appear to happen within a catch block
(the one in xapian_dbw_open, source here:
https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.0/imap/xapian_wrap.cpp#L305).

 - J<
Re: Help needed with new segfaults in frame unwinding under gcc8
Jason L Tibbitts III wrote:
> "JR" == John Reiser writes:
>
> JR> Please create a bugzilla report, or other well-known tracking
> JR> instance.
>
> But where?  I don't even know whose problem this is.  It's taken me days
> of what little free time I have just to figure out how to get the
> backtrace I was able to provide.

For any SIGSEGV, file bugzilla against the package that contains the
program counter at the time of the fault.  In this case, libgcc (for
which bugzilla might say, "libgcc is part of gcc, so use gcc.")
Then post here the URL of the bugzilla report.

> I am certainly conditioned to assume that the compiler is working as
> designed and problems like this are due to upstream code issues which
> simply weren't exposed with previous versions of the compiler.  And
> while tracking this all down, I did find at least one real live bug in
> the upstream code.
>
> JR> Non-repeatability due to unspecified or mismatched versions is
> JR> frustrating.
>
> Well, sure it is.  I just don't have enough information to do anything
> other than spew random things.  I'm still trying to understand what's
> gone wrong where, and what information is relevant.

Include the versions of the packages for these pieces, please:

    cyrus-imapd and any package that it BuildRequires or Requires
    gcc, libgcc
    gcc-c++, libstdc++
    glibc
    binutils (for /usr/bin/ld)
    mock
    kernel

The idea is: I want to run _exactly_ "the same thing" as you did.

> valgrind --track-origins=yes --trace-children=yes ./testrunner.pl -v -f pretty SearchFuzzy.xapianv2

This is good progress!

> ==22846==  Uninitialised value was created by a stack allocation
> ==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
> ==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)

Those one-line tracebacks, with no source file and no line number,
are a clue that valgrind could find no corresponding debuginfo.
Please install the debuginfo for libcyrus_imap.so.0.0.0 and
libstdc++.so.6.0.25, then re-run valgrind.  This is very close to
pin-pointing the bug.

> JR> The reported behavior is consistent with use of an uninitialized
> JR> value.
>
> In whose code?  It sounds like you're saying that the upstream code is
> broken, which of course would be something I could understand, but

In the unwind code, which is used by exception handling.

> JR> Looking at the code: = gcc/libgcc/unwind.inc
>
> ... here you're talking about gcc code.

libgcc.

> JR> This is a giant red flag for unreliable code.
>
> And I'm still confused about whose code is unreliable.

libgcc, because it does not check its input (the unwind tables) enough.
Perhaps gcc generated incorrect data into the tables, but it is the
fault of libgcc for not avoiding SIGSEGV.  SIGSEGV is *ALWAYS* the fault
of the package that contains $pc until that package proves otherwise.
Re: Help needed with new segfaults in frame unwinding under gcc8
> "JR" == John Reiserwrites: JR> Please create a bugzilla report, or other well-known tracking JR> instance. But where? I don't even know whose problem this is. It's taken me days of what little free time I have just to figure out how to get the backtrace I was able to provide. I am certainly conditioned to assume that the compiler is working as designed and problems like this are due to upstream code issues which simply weren't exposed with previous versions of the compiler. And while tracking this all down, I did find at least one real live bug in the upstream code. JR> Non-repeatability due to unspecified or mismatched versions is JR> frustrating. Well, sure it is. I just don't have enough information to do anything other than spew random things. I'm still trying to understand what's gone wrong where, and what information is relevant. I can certainly list the versions of everything in the buildroot, and everything in the buildroot of the last successful build. But I don't think that's going to be particularly useful. I only know that it failed to build during the mass rebuild that happened after gcc (and plenty of other things) was updated. JR> What does running under memcheck ("valgrind --track-origins=yes JR> ...") say? Well, I can run the test suite under valgrind. It seems to be able to trace children, too. I did: valgrind --track-origins=yes --trace-children=yes ./testrunner.pl -v -f pretty SearchFuzzy.xapianv2 and got a pile of output. Narrowing down to just the indexing process (called squatter) which actually segfaults gives the output I include at the end of this message. JR> The reported behavior is consistent with use of an uninitialized JR> value. In whose code? It sounds like you're saying that the upstream code is broken, which of course would be something I could understand, but JR> Looking at the code: = gcc/libgcc/unwind.inc ... here you're talking about gcc code. JR> This is a giant red flag for unreliable code. 
And I'm still confused about whose code is unreliable.

 - J<

Output under valgrind for the process which segfaults:

==22846== Memcheck, a memory error detector
==22846== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22846== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==22846== Command: /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/sbin/squatter -C /builddir/build/BUILD/cyrus-imapd-3.0.5/cassandane/work/2352581/conf/imapd.conf
==22846==
2352581/squatter[22846]: SQL backend defaulting to engine 'pgsql'
2352581/squatter[22846]: indexing mailboxes
2352581/squatter[22846]: indexing mailbox user.cassandane...
==22846== Conditional jump or move depends on uninitialised value(s)
==22846==    at 0xBCC2052: _Unwind_Resume (unwind.inc:240)
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846==    by 0x588D58A: search_update_mailbox (search_engines.c:211)
==22846==    by 0x10C104: index_one (squatter.c:292)
==22846==    by 0x10B773: do_indexer (squatter.c:352)
==22846==    by 0x10B773: main (squatter.c:1004)
==22846== Uninitialised value was created by a stack allocation
==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
==22846==
==22846== Use of uninitialised value of size 8
==22846==    at 0xBCC181B: _Unwind_ForcedUnwind_Phase2 (unwind.inc:170)
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0x1FFEFFF6DF: ???
==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==22846==    by 0xE6F90EF: ???
==22846==    by 0x1FFEFFF77F: ???
==22846==    by 0xE6F5F6F: ???
==22846==    by 0x1FFEFFF74F: ???
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0xE6F550F: ???
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846== Uninitialised value was created by a stack allocation
==22846==    at 0x5837ED0: ??? (in /builddir/build/BUILDROOT/cyrus-imapd-3.0.5-4.fc28.x86_64/usr/lib64/libcyrus_imap.so.0.0.0)
==22846==
==22846== Jump to the invalid address stated on the next line
==22846==    at 0x120: ???
==22846==    by 0x1FFEFFF6DF: ???
==22846==    by 0xBCA8C7F: ??? (in /usr/lib64/libstdc++.so.6.0.25)
==22846==    by 0xE6F90EF: ???
==22846==    by 0x1FFEFFF77F: ???
==22846==    by 0xE6F5F6F: ???
==22846==    by 0x1FFEFFF74F: ???
==22846==    by 0x1FFEFFF76F: ???
==22846==    by 0xE6F550F: ???
==22846==    by 0x5839B8F: stem_version_set (xapian_wrap.cpp:264)
==22846==    by 0x5839B8F: xapian_dbw_open.cold.223 (xapian_wrap.cpp:327)
==22846==    by 0x58AC2EE: begin_mailbox_update (search_xapian.c:1535)
==22846==    by 0x588D58A: search_update_mailbox
Re: Help needed with new segfaults in frame unwinding under gcc8
> I could really use some help from the gcc experts.

Please create a bugzilla report, or other well-known tracking instance. In particular, bugzilla asks about repeatability, version numbers, etc. Non-repeatability due to unspecified or mismatched versions is frustrating.

> A package I maintain, cyrus-imapd, contains two extensive test suites which we run at package build time. After the big flag day where we updated gcc and glibc and such in rawhide, one of the test suites now shows failures and produces 22 core dumps, but only when run in mock (not even fedpkg local on a rawhide container). Even in mock, if I get into the chroot, duplicate the test environment and run the failing program by hand (or under strace, or under gdb) then it doesn't segfault.

What does running under memcheck ("valgrind --track-origins=yes ...") say? The reported behavior is consistent with use of an uninitialized value. [gdb changes the environment by adding two pipes when invoking a process.]

> After getting cores and all of the debugging stuff into mock (instructions below) I found that all cores have substantially identical backtraces:
>
> (gdb) bt
> #0 0x0120 in ?? ()
> #1 0x7f18a19d281e in _Unwind_ForcedUnwind_Phase2 (exc=0x7fffbc364c70, context=0x7fffbc364990, frames_p=0x7fffbc364898) at ../../../libgcc/unwind.inc:170
> #2 0x7f18a19d3105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
> #3 0x7f18a7dbbb90 in stem_version_set (version=, database=) at /usr/include/c++/8/bits/char_traits.h:320
> #4 xapian_dbw_open (paths=0x55aff951eb70, dbwp=0x55aff951f0f8) at imap/xapian_wrap.cpp:327

Looking at the code:

= gcc/libgcc/unwind.inc
_Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc,
                             struct _Unwind_Context *context,
                             unsigned long *frames_p)
{
  _Unwind_Stop_Fn stop = (_Unwind_Stop_Fn) (_Unwind_Ptr) exc->private_1;
<>
      stop_code = (*stop) (1, action, exc->exception_class, exc,
                           context, stop_argument);
=

we see that function pointer 'stop' is cast from an untyped word 'private_1' with no checking at all, not even for NULL or < PAGE_SIZE, etc. This is a giant red flag for unreliable code. Such a check would have avoided the particular SIGSEGV in the traceback above. Of course this might cause vague or incorrect results, but there could be strong hints about what to fix, instead of just a bare SIGSEGV.
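The kind of check suggested above (reject NULL or anything below PAGE_SIZE before calling through the pointer) could look roughly like the sketch below. This is not libgcc code: `StopFn`, `kAssumedPageSize`, `plausible_code_address`, `call_stop_checked` and `sample_stop` are all hypothetical names made up for illustration, and the 4096-byte page size is an assumption.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a stop callback; the real
// _Unwind_Stop_Fn in libgcc takes more arguments.
using StopFn = int (*)(int version, void *exc);

// Assumption: 4 KiB pages, with the lowest page never mapped as code.
constexpr std::uintptr_t kAssumedPageSize = 4096;

// True if the raw word could plausibly be a code address
// (non-NULL and not inside the lowest page).
bool plausible_code_address(std::uintptr_t word) {
    return word >= kAssumedPageSize;
}

// Calls through the stored word only if it passes the sanity check;
// returns -1 instead of jumping to an obviously bogus address.
int call_stop_checked(std::uintptr_t private_1, void *exc) {
    if (!plausible_code_address(private_1)) {
        return -1;  // a caller could log or abort with a clear message here
    }
    StopFn stop = reinterpret_cast<StopFn>(private_1);
    return stop(1, exc);
}

// Hypothetical callback used only to exercise the valid path.
int sample_stop(int version, void * /*exc*/) {
    return version + 41;
}
```

With a stored word of 0x120, the address in frame #0 of the backtrace, `call_stop_checked` would return -1 rather than jump. As noted above, a check like this only turns the crash into a clearer failure; it does not explain where the bad value came from.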
Help needed with new segfaults in frame unwinding under gcc8
I could really use some help from the gcc experts.

A package I maintain, cyrus-imapd, contains two extensive test suites which we run at package build time. After the big flag day where we updated gcc and glibc and such in rawhide, one of the test suites now shows failures and produces 22 core dumps, but only when run in mock (not even fedpkg local on a rawhide container). Even in mock, if I get into the chroot, duplicate the test environment and run the failing program by hand (or under strace, or under gdb) then it doesn't segfault.

After getting cores and all of the debugging stuff into mock (instructions below) I found that all cores have substantially identical backtraces:

(gdb) bt
#0 0x0120 in ?? ()
#1 0x7f18a19d281e in _Unwind_ForcedUnwind_Phase2 (exc=0x7fffbc364c70, context=0x7fffbc364990, frames_p=0x7fffbc364898) at ../../../libgcc/unwind.inc:170
#2 0x7f18a19d3105 in _Unwind_Resume () at ../../../libgcc/unwind.inc:243
#3 0x7f18a7dbbb90 in stem_version_set (version=, database=) at /usr/include/c++/8/bits/char_traits.h:320
#4 xapian_dbw_open (paths=0x55aff951eb70, dbwp=0x55aff951f0f8) at imap/xapian_wrap.cpp:327
#5 0x7f18a7e2e2ef in begin_mailbox_update (rx=0x55aff951f060, mailbox=0x7f18a8c12018, flags=0) at imap/search_xapian.c:1535
#6 0x7f18a7e0f58b in search_update_mailbox (rx=0x55aff951f060, mailbox=0x7f18a8c12018, flags=0) at imap/search_engines.c:211
#7 0x55aff86b5105 in index_one (name=0x55aff951eb30 "user.cassandane", blocking=1) at imap/squatter.c:292
#8 0x55aff86b4774 in do_indexer (sa=0x7fffbc365370, sa=0x7fffbc365370) at imap/squatter.c:352
#9 main (argc=3, argv=0x7fffbc3654c8) at imap/squatter.c:1004

As far as I've been able to gather from talking to folks on IRC, it's segfaulting in the stack unwinder while trying to handle a C++ exception. (The only C++ in the program is a wrapper for the Xapian search engine.)
I have done testing with older versions of Xapian (known to build the package successfully) without any change in behavior, but I'm not sure I have a reasonable way to roll back the gcc update.

The source is available from https://github.com/cyrusimap/cyrus-imapd/tree/cyrus-imapd-3.0; the specific function involved (stem_version_set) is https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.0/imap/xapian_wrap.cpp#L262 but it's only three lines.

I would appreciate any help from folks who can comprehend what's going wrong here. Upstream can't really offer much in the way of help; given that the actual failure seems to happen in the libgcc code and that this can't (so far) be reproduced outside of our buildsystem, they're not sure what they can do.

Getting useful backtraces from a mock chroot:

If you turn off systemd coredump catching on the machine running mock (sysctl -w 'kernel.core_pattern=core.%p') and delete the "%check" line from the spec before building (so that the tests run as part of %install), you can get the cores left in the mock chroot with the complete debug information left unstripped.

Then you can manually install gdb and all of the debuginfo packages:

  mock -r fedora-rawhide-x86_64 --enablerepo fedora-debuginfo -i gdb \
    cyrus-sasl-lib-debuginfo glibc-debuginfo jansson-debuginfo \
    keyutils-libs-debuginfo krb5-libs-debuginfo libcom_err-debuginfo \
    libgcc-debuginfo libical-debuginfo libicu-debuginfo \
    libnghttp2-debuginfo libselinux-debuginfo libstdc++-debuginfo \
    libuuid-debuginfo libxcrypt-debuginfo libxml2-debuginfo \
    nspr-debuginfo nss-debuginfo nss-util-debuginfo openldap-debuginfo \
    openssl-libs-debuginfo pcre-debuginfo pcre2-debuginfo \
    postgresql-libs-debuginfo shapelib-debuginfo sqlite-libs-debuginfo \
    xapian-core-libs-debuginfo xz-libs-debuginfo zlib-debuginfo

Then run:

  mock -r fedora-rawhide-x86_64 --shell
  su - mockbuild
  find . -name core.\*

and gdb as usual.
 - J<
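For readers following the thread: the message above notes that the only C++ in the program is a wrapper around Xapian, and the diagnosis elsewhere in the thread was a missing return statement in stem_version_set. A C-callable wrapper of that general shape is usually written so that no exception escapes into C code and every path returns a value. The sketch below shows that pattern only; it does not use the Xapian API or the actual cyrus-imapd code, and `set_metadata` and `wrapper_set_version` are hypothetical names.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C++ library call that may throw.
static void set_metadata(std::string &store, const std::string &value, bool fail) {
    if (fail) {
        throw std::runtime_error("simulated database error");
    }
    store = value;
}

// C-callable wrapper: exceptions are caught here so none propagates
// into C callers, and every control path ends in a return statement.
extern "C" int wrapper_set_version(int version, int simulate_failure) {
    std::string store;  // stands in for the database handle
    try {
        set_metadata(store, std::to_string(version), simulate_failure != 0);
    } catch (const std::exception &) {
        return -1;  // report failure as an error code
    } catch (...) {
        return -1;  // unknown exception type: still no escape into C
    }
    return 0;  // success path; omitting this is what -Wreturn-type reports
}
```

Without the final `return 0;`, GCC would warn under -Wreturn-type, and flowing off the end of the function would be undefined behavior in C++.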