On 9/07/2015 7:20 AM, Stuart Henderson wrote:
On 2015/07/08 20:00, Max Fillinger wrote:
On Wed, Jul 08, 2015 at 03:53:46PM +0200, Mark Kettenis wrote:
I'm looking for testers for this diff.  It should be safe to run on
amd64, i386 and sparc64, but it has been reported to lock up i386
machines.  I can't reproduce this on any of my own systems, so I'm
looking for help.  I'm looking for people who are able to build a
kernel with this diff and the MP_LOCKDEBUG option enabled
(uncommented) in their GENERIC.MP kernel, run it on an MP machine and
put some load on it to see if it locks up and/or panics.
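
For anyone unsure of the steps, the build described above can be
sketched roughly as follows. This is only an outline under
assumptions: the source tree is taken to live in /usr/src, the diff
file name and the patch invocation are hypothetical (the diff itself
is not reproduced in this message), and the exact make targets may
differ between releases.

```sh
# Assumption: kernel sources are checked out under /usr/src.
cd /usr/src/sys
# Hypothetical file name; the -p level depends on how the diff was made.
patch -p0 < /path/to/reaper-unlock.diff

cd /usr/src/sys/arch/$(machine)/conf
# Edit GENERIC.MP by hand and uncomment the lock-debugging option,
# i.e. change "#option MP_LOCKDEBUG" to "option MP_LOCKDEBUG".
config GENERIC.MP

cd ../compile/GENERIC.MP
make clean && make
# Keep a known-good kernel around before installing the test kernel.
cp /bsd /bsd.good
make install
```

Keeping a copy of the working kernel (here /bsd.good, an arbitrary
name) means you can boot it from the boot> prompt if the test kernel
hangs early.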

Being able to move forward with this would make OpenBSD run
significantly better on MP systems.

Thanks,

Mark
I just finished compiling the kernel for amd64; I might test i386 later.
What kind of load would be required to give useful feedback? Would
building the userland or some of the bigger ports be a useful test?
Building base with the reaper unlock diff on i386 doesn't seem to
trigger problems, or at least I haven't run into them in a few attempts.

I do see problems when building ports on a dpb cluster, quite quickly
in some cases - I just did a run and one node locked after 261s, another
after 756s (dpb master stayed up FWIW).

If you're trying to reproduce, make sure you set ddb.console=1 and
check that you can break into ddb under normal conditions. If you
manage to trigger a hang, see if you can break into ddb and get
the usual things (backtrace, ps, sh reg, etc).
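
As a rough sketch of the setup above (the key sequence and the
CPU-switching command are from memory of ddb(4) on amd64/i386, so
treat them as assumptions and check the man page for your platform):

```sh
# Make the setting persistent; sysctl.conf is applied early at boot.
echo 'ddb.console=1' >> /etc/sysctl.conf
# After rebooting, confirm the value.
sysctl ddb.console

# Check that you can enter ddb before any hang occurs:
# Ctrl-Alt-Esc on a PC console, or send a BREAK on a serial console.
# At the ddb> prompt, the usual things are:
#   trace               (backtrace of the current CPU)
#   ps                  (process list)
#   show registers      ("sh reg" for short)
#   machine ddbcpu 1    (switch to another CPU on an MP kernel,
#                        then run trace again)
#   continue            (resume, if this was only a test)
```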

I've been unable to get into ddb after a hang, including on this
most recent run with MP_LOCKDEBUG.

Nothing particularly special was being built during the last hang;
from dpb term-report, the last entry before "i386-2-" appeared
(indicating that the host is no longer contactable) showed these:

archivers/libzip
audio/libogg
archivers/lzo2

Looking at the build logs (which are streamed over ssh and logged
on the dpb master), lzo2 and libzip were compiling (cc from base)
and libogg was doing pkg_create/gzip when contact was lost.
So I don't think it's going to be triggered by any particular
ports; there is nothing out of the ordinary about these, and
no funny autoconf checks were occurring at the time.

The other main build-related active process would be sshd,
and since pkg_create was running it would also most likely
have been writing to nfs at the time.


My money is on -> nfs.

Ian McWilliam
