Bug#827473: ostree: FTBFS on mipsel: "ostree pull" sometimes gets SIGBUS, SIGSEGV

2016-06-29 Thread Simon McVittie
On Wed, 29 Jun 2016 at 12:30:40 +, Radovan Birdic wrote:
> I tried to build ostree package on three different machines for mipsel
> architecture: Cavium, Loongson and Broadcom.

Thanks, much appreciated!

What specific CPU/sub-architecture/whatever are these machines? I'm
assuming that saying you have a Broadcom mipsel CPU is like saying you
have an AMD x86 CPU, rather than like saying it's a specific model of
Athlon or whatever?

I've seen "ostree pull" failing on the Debian mipsel porterbox
etler.debian.org, which has /proc/cpuinfo like this:

system type             : loongson-ls3a-rs780e-1w
machine                 : Unknown
processor               : 0
cpu model               : ICT Loongson-3 V0.5  FPU V0.1
BogoMIPS                : 718.84
wait instruction        : no
microsecond timers      : yes
tlb_entries             : 64
extra interrupt vector  : no
hardware watchpoint     : yes, count: 0, address/irw mask: []
isa                     : mips1 mips2 mips3 mips4 mips5 mips32r1 mips64r1
ASEs implemented        :
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

(and three more identical cores)

The official buildd where this has been failing is eberlin.debian.org,
which is described as "LS3A-RS780-1w (Quad Core Loongson 3A)" on
. Ordinary developers
can't log in to buildd machines, but the description on
 also says LS3A-RS780-1w,
so eberlin's /proc/cpuinfo would probably look the same.

> On the Broadcom machine, the following tests fail:
> test-basic.sh - on every run
> > ERROR: tests/test-basic.sh - too few tests run (expected 57, got 55)
> > ERROR: tests/test-basic.sh - exited with status 1
> test-pull-c - occasionally
> > ERROR: tests/test-pull-c - too few tests run (expected 2, got 1)
> > ERROR: tests/test-pull-c - exited with status 138 (terminated by signal 10?)

Those error lines alone don't give enough information to diagnose the
failure. Please could you attach the detailed logs from a failing run?
That means either test-suite.log, or the individual test logs from the
failing tests (tests/test-basic.sh.log and tests/test-pull-c.log in this
case).

If you run the tests with VERBOSE=1 in the environment, they'll write
test-suite.log to stdout/stderr, somewhat later than the ERROR lines
you quoted. That's what appears if the tests fail during a package build,
or in the official Debian buildd logs like

(search for ".. contents::" to find logs similar to the ones I'd
need to see).

(This is standard Automake behaviour, not OSTree-specific.)
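
As a rough sketch of how to pull that part out of a saved build log:
the helper name embedded_test_log is made up for illustration, and it
assumes the log contains the ".. contents::" marker exactly once, at
the point where test-suite.log starts.

```shell
# Hypothetical helper, not part of ostree or Automake.
# Print everything from the first ".. contents::" line to the end of
# the file, which is where the embedded test-suite.log begins.
embedded_test_log() {
    sed -n '/\.\. contents::/,$p' "$1"
}
```

For example, "embedded_test_log build.log > test-suite.log" (where
build.log is whatever file the build output was saved to) gives
something suitable for attaching to the bug.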

If you can get anything useful from a core dump - for instance
a useful backtrace - that would also be really helpful.

Thanks,
S



Bug#827473: ostree: FTBFS on mipsel: "ostree pull" sometimes gets SIGBUS, SIGSEGV

2016-06-28 Thread Simon McVittie
Control: reopen 827473
Control: found 827473 2016.6-2

On Sun, 26 Jun 2016 at 19:54:11 +, Debian Bug Tracking System wrote:
>    * New upstream release
...
>      - this version is more careful about thread-safety, which appears
>        to fix the test failures that caused FTBFS on mipsel
>        (Closes: #827473)

I thought this was true, and it seems to have worked on eberlin and etler
with 2016.6-1; but with 2016.6-2 (which makes no source changes, just
switches one test to LC_ALL=C) it's back to failing with sporadic
bus errors in "ostree pull" on eberlin. 5 out of 5 test runs failed,
but the test that fails in each run is different.

On Tue, 21 Jun 2016 at 11:55:29 +0100, Simon McVittie wrote:
> I'm going to try with valgrind, and if I can't get anything useful out of
> that, either ignore the test failure on mipsel, or ask for the 2016.5-3
> binaries to be removed so they don't block ostree migrating to testing
> on other architectures.

I was unable to get any useful information from valgrind either.

mips porters, is there something special that maintainers are expected
to do to be able to debug or diagnose mips-specific issues? As noted in
an earlier mail to this bug, inspecting the core dump was not helpful.
It's entirely possible that this is an ostree bug, but I'm unlikely to
be able to do anything about it without some indication of where it is.

How confident are we that these machines have reliable hardware,
and that the mipsel toolchain is reliable?

I would really like to resolve this somehow so that ostree and flatpak
can migrate to testing, either by getting enough information to be able
to diagnose and fix what is wrong, by having the mipsel binaries removed,
or by ignoring test failures on mipsel and assuming that if anyone
actually *uses* ostree there, they will also step forward to diagnose and
fix it.

S



Bug#827473: ostree: FTBFS on mipsel: "ostree pull" sometimes gets SIGBUS, SIGSEGV

2016-06-21 Thread Simon McVittie
On Thu, 16 Jun 2016 at 12:06:12 -0400, Simon McVittie wrote:
> I'm trying to reproduce this failure on the porterbox etler.debian.org
> by rebuilding the package or by using the installed-tests from 2016.5-3.

I can reproduce it eventually, but the gdb report is unhelpful. Is there
anything special that I'd need to do to get a useful backtrace on mipsel?

(gdb) thread apply all bt full

Thread 3 (LWP 4232):
#0  0x775b697c in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (LWP 4215):
#0  0xec60 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (LWP 4231):
#0  0x777d151c in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

To reproduce:

* build ostree (I used 2016.5-4)
* while TEST_SKIP_CLEANUP=1 make check-TESTS \
      TESTS=tests/test-pull-archive-z.sh; do :; done
  (and let it run for a while until it fails)
* look for a line like "Skipping cleanup of /var/tmp/tap-test.7eaT36" in
  test-suite.log
* ./libtool --mode=execute gdb ostree /var/tmp/tap-test.7eaT36/core
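
The steps above could be wrapped up roughly like this. It's only a
sketch: run_until_failure and kept_test_dirs are names invented here,
and the second helper assumes the "Skipping cleanup of" line has the
same form as the one quoted above.

```shell
# Hypothetical helpers, not part of ostree.

# Re-run the given command until it fails, then report which run failed.
# The command's own output is discarded; "make check-TESTS" writes
# test-suite.log itself.
run_until_failure() {
    runs=0
    while "$@" >/dev/null 2>&1; do
        runs=$((runs + 1))
    done
    echo "failed on run $((runs + 1))"
}

# Print the directories named in "Skipping cleanup of <dir>" lines
# of a log file, one per line.
kept_test_dirs() {
    sed -n 's/.*Skipping cleanup of \([^ ]*\).*/\1/p' "$1"
}
```

Usage would be something like "run_until_failure env TEST_SKIP_CLEANUP=1
make check-TESTS TESTS=tests/test-pull-archive-z.sh", followed by
"kept_test_dirs test-suite.log" to find the directory containing the
core file to pass to gdb.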

I'm going to try with valgrind, and if I can't get anything useful out of
that, either ignore the test failure on mipsel, or ask for the 2016.5-3
binaries to be removed so they don't block ostree migrating to testing
on other architectures.

S



Bug#827473: ostree: FTBFS on mipsel: "ostree pull" sometimes gets SIGBUS, SIGSEGV

2016-06-16 Thread Simon McVittie
Source: ostree
Version: 2016.5-4
Severity: important
Tags: help

[X-Debbugs-Cc set to debian-m...@lists.debian.org, please include them
in follow-ups.]

ostree failed to build on the mipsel buildd eberlin.debian.org:

I'm reporting this as important rather than serious because it seems
to be hard to reproduce, so I suspect a rebuild would probably be fine,
but it probably points to an underlying problem in either ostree or the
mipsel toolchain/machine.

I'm trying to reproduce this failure on the porterbox etler.debian.org
by rebuilding the package or by using the installed-tests from 2016.5-3.

One of the changes I made in 2016.5-4 was to repeat the tests 4 times
if they fail the first time, so we can see how reproducible things are.
In the failing build on eberlin, the results were:

tests/test-pull-archive-z.sh (1/5, 5/5): ostree command-line tool killed by
signal 10, which I think is SIGBUS, during "ostree pull"

tests/test-pull-large-metadata.sh (1/5): ostree command-line tool killed
by SIGBUS with no output, again during "ostree pull"

tests/test-oldstyle-partial.sh (1/5): ostree command-line tool killed
by SIGSEGV, again during "ostree pull"

tests/test-pull-metalink.sh (2/5): another SIGBUS during "ostree pull"

tests/test-pull-resume.sh (3/5, 4/5): SIGBUS during "ostree pull"

tests/test-admin-upgrade-not-backwards.sh (4/5): another SIGBUS during
"ostree pull"

tests/test-pull-depth.sh (5/5): another SIGBUS during "ostree pull"
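
For anyone decoding these by hand: a shell reports a process killed by
signal N as exit status 128+N, and Linux signal numbers differ between
architectures, so signal 10 is SIGBUS on MIPS but SIGUSR1 on x86 (where
SIGBUS is 7). A minimal sketch, with the helper name signal_name_mips
made up for illustration and only the two signals seen here filled in:

```shell
# Hypothetical helper: map a shell exit status to a MIPS Linux signal
# name. Numbers are from signal(7); only SIGBUS and SIGSEGV are listed.
signal_name_mips() {
    status=$1
    if [ "$status" -le 128 ]; then
        echo "not a signal exit"
        return 1
    fi
    case $((status - 128)) in
        10) echo SIGBUS ;;
        11) echo SIGSEGV ;;
        *)  echo "signal $((status - 128))" ;;
    esac
}
```

Running "kill -l" on the machine itself is the more reliable way to
check, since it uses that architecture's own numbering.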

If ostree's tests are reliable on architectures other than mipsel and
we can't reproduce the failure in an environment where stack traces are
available, I would prefer to exclude it from being built on mipsel rather
than ignore test failures, but I'm willing to be persuaded otherwise.

Any porter advice or help welcome. ostree has a standard Autotools
"make check", and GNOME-style installed-tests (mostly the same code)
wrapped in autopkgtest; if you haven't encountered those before,
the tl;dr version is that you install gnome-desktop-testing and
ostree-tests, then run "gnome-desktop-testing-runner ostree" (or read
the underlying shell command-lines out of the .desktop-style files in
/usr/share/installed-tests/ostree).

Thanks,
S