Re: spurious out of swap kills

2019-09-12 Thread Konstantin Belousov
On Thu, Sep 12, 2019 at 05:42:00PM -0700, Don Lewis wrote:
> On 12 Sep, Mark Johnston wrote:
> > On Thu, Sep 12, 2019 at 04:00:17PM -0700, Don Lewis wrote:
> >> My poudriere machine is running 13.0-CURRENT and gets updated to the
> >> latest version of -CURRENT periodically.  At least in the last week or
> >> so, I've been seeing occasional port build failures when building my
> >> default set of ports, and I finally had some time to do some
> >> investigation.
> >> 
> >> It's a 16-thread Ryzen machine, with 64 GB of RAM and 40 GB of swap.
> >> Poudriere is configured with
> >>   USE_TMPFS="wrkdir data localbase"
> >> and I have
> >>   .if ${.CURDIR:M*/www/chromium}
> >>   MAKE_JOBS_NUMBER=16
> >>   .else
> >>   MAKE_JOBS_NUMBER=7
> >>   .endif
> >> in /usr/local/etc/poudriere.d/make.conf, since this gives me the best
> >> overall build time for my set of ports.  This hits memory pretty hard,
> >> especially when chromium, firefox, libreoffice, and both versions of
> >> openoffice are all building at the same time.  During this time, the
> >> amount of space consumed by tmpfs for /wrkdir gets large when building
> >> these large ports.  There is not enough RAM to hold it all, so some of
> >> the older data spills over to swap.  Swap usage peaks at about 10 GB,
> >> leaving about 30 GB of free swap.  Nevertheless, I see these errors,
> >> with rustc being the usual victim:
> >> 
> >> Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
> >> Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space
> >> 
> >> Top shows the size of rustc being about 2 GB, so I doubt that it
> >> suddenly needs an additional 30 GB of swap.
> >> 
> >> I'm wondering if there might be a transient kmem shortage that is
> >> causing a malloc(..., M_NOWAIT) failure in the swap allocation path
> >> that is the cause of the problem.
> > 
> > Perhaps this is a consequence of r351114?  To confirm this, you might
> > try increasing the value of vm.pfault_oom_wait to a larger value, like
> > 20 or 30, and see if the OOM kills still occur.
> 
> I wonder if increasing vm.pfault_oom_attempts might also be a good idea.
If you are sure that you cannot exhaust your swap space, set
vm.pfault_oom_attempts to -1 to disable this mechanism.

Basically, the page fault handler waits up to vm.pfault_oom_wait *
vm.pfault_oom_attempts seconds for a page allocation before killing the
process.  The default is 30 seconds, and if you cannot get a page for
30 seconds, there is something very wrong with the machine.
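
To make the arithmetic concrete, here is a minimal C sketch of the wait
budget described above.  It is not the kernel code: the function name
pfault_oom_budget() is hypothetical, and the defaults of 3 attempts and
10 seconds are an assumption chosen to match the 30 seconds mentioned
in this message.

```c
#include <assert.h>

/* Assumed defaults; they multiply out to the 30 seconds quoted above. */
#define PFAULT_OOM_ATTEMPTS_DEFAULT	3
#define PFAULT_OOM_WAIT_DEFAULT		10

/*
 * Hypothetical helper: upper bound, in seconds, that a page fault would
 * wait for a page before the process is killed.  A negative attempts
 * value means the mechanism is disabled, reported here as -1.
 */
long
pfault_oom_budget(int attempts, int wait_secs)
{
	assert(wait_secs >= 0);
	if (attempts < 0)
		return (-1);
	return ((long)attempts * wait_secs);
}
```

With the assumed defaults the budget is 30 seconds; raising
vm.pfault_oom_wait to 30, as suggested earlier in the thread, would
give 90 seconds with the same number of attempts.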
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: spurious out of swap kills

2019-09-12 Thread Don Lewis
On 12 Sep, Mark Johnston wrote:
> On Thu, Sep 12, 2019 at 04:00:17PM -0700, Don Lewis wrote:
>> My poudriere machine is running 13.0-CURRENT and gets updated to the
>> latest version of -CURRENT periodically.  At least in the last week or
>> so, I've been seeing occasional port build failures when building my
>> default set of ports, and I finally had some time to do some
>> investigation.
>> 
>> It's a 16-thread Ryzen machine, with 64 GB of RAM and 40 GB of swap.
>> Poudriere is configured with
>>   USE_TMPFS="wrkdir data localbase"
>> and I have
>>   .if ${.CURDIR:M*/www/chromium}
>>   MAKE_JOBS_NUMBER=16
>>   .else
>>   MAKE_JOBS_NUMBER=7
>>   .endif
>> in /usr/local/etc/poudriere.d/make.conf, since this gives me the best
>> overall build time for my set of ports.  This hits memory pretty hard,
>> especially when chromium, firefox, libreoffice, and both versions of
>> openoffice are all building at the same time.  During this time, the
>> amount of space consumed by tmpfs for /wrkdir gets large when building
>> these large ports.  There is not enough RAM to hold it all, so some of
>> the older data spills over to swap.  Swap usage peaks at about 10 GB,
>> leaving about 30 GB of free swap.  Nevertheless, I see these errors,
>> with rustc being the usual victim:
>> 
>> Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
>> Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space
>> 
>> Top shows the size of rustc being about 2 GB, so I doubt that it
>> suddenly needs an additional 30 GB of swap.
>> 
>> I'm wondering if there might be a transient kmem shortage that is
>> causing a malloc(..., M_NOWAIT) failure in the swap allocation path
>> that is the cause of the problem.
> 
> Perhaps this is a consequence of r351114?  To confirm this, you might
> try increasing the value of vm.pfault_oom_wait to a larger value, like
> 20 or 30, and see if the OOM kills still occur.

I wonder if increasing vm.pfault_oom_attempts might also be a good idea.



Re: spurious out of swap kills

2019-09-12 Thread Mark Johnston
On Thu, Sep 12, 2019 at 04:00:17PM -0700, Don Lewis wrote:
> My poudriere machine is running 13.0-CURRENT and gets updated to the
> latest version of -CURRENT periodically.  At least in the last week or
> so, I've been seeing occasional port build failures when building my
> default set of ports, and I finally had some time to do some
> investigation.
> 
> It's a 16-thread Ryzen machine, with 64 GB of RAM and 40 GB of swap.
> Poudriere is configured with
>   USE_TMPFS="wrkdir data localbase"
> and I have
>   .if ${.CURDIR:M*/www/chromium}
>   MAKE_JOBS_NUMBER=16
>   .else
>   MAKE_JOBS_NUMBER=7
>   .endif
> in /usr/local/etc/poudriere.d/make.conf, since this gives me the best
> overall build time for my set of ports.  This hits memory pretty hard,
> especially when chromium, firefox, libreoffice, and both versions of
> openoffice are all building at the same time.  During this time, the
> amount of space consumed by tmpfs for /wrkdir gets large when building
> these large ports.  There is not enough RAM to hold it all, so some of
> the older data spills over to swap.  Swap usage peaks at about 10 GB,
> leaving about 30 GB of free swap.  Nevertheless, I see these errors,
> with rustc being the usual victim:
> 
> Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
> Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space
> 
> Top shows the size of rustc being about 2 GB, so I doubt that it
> suddenly needs an additional 30 GB of swap.
> 
> I'm wondering if there might be a transient kmem shortage that is
> causing a malloc(..., M_NOWAIT) failure in the swap allocation path
> that is the cause of the problem.

Perhaps this is a consequence of r351114?  To confirm this, you might
try increasing the value of vm.pfault_oom_wait to a larger value, like
20 or 30, and see if the OOM kills still occur.


spurious out of swap kills

2019-09-12 Thread Don Lewis
My poudriere machine is running 13.0-CURRENT and gets updated to the
latest version of -CURRENT periodically.  At least in the last week or
so, I've been seeing occasional port build failures when building my
default set of ports, and I finally had some time to do some
investigation.

It's a 16-thread Ryzen machine, with 64 GB of RAM and 40 GB of swap.
Poudriere is configured with
  USE_TMPFS="wrkdir data localbase"
and I have
  .if ${.CURDIR:M*/www/chromium}
  MAKE_JOBS_NUMBER=16
  .else
  MAKE_JOBS_NUMBER=7
  .endif
in /usr/local/etc/poudriere.d/make.conf, since this gives me the best
overall build time for my set of ports.  This hits memory pretty hard,
especially when chromium, firefox, libreoffice, and both versions of
openoffice are all building at the same time.  During this time, the
amount of space consumed by tmpfs for /wrkdir gets large when building
these large ports.  There is not enough RAM to hold it all, so some of
the older data spills over to swap.  Swap usage peaks at about 10 GB,
leaving about 30 GB of free swap.  Nevertheless, I see these errors,
with rustc being the usual victim:

Sep 11 23:21:43 zipper kernel: pid 16581 (rustc), jid 43, uid 65534, was killed: out of swap space
Sep 12 02:48:23 zipper kernel: pid 1209 (rustc), jid 62, uid 65534, was killed: out of swap space

Top shows the size of rustc being about 2 GB, so I doubt that it
suddenly needs an additional 30 GB of swap.

I'm wondering if there might be a transient kmem shortage that is
causing a malloc(..., M_NOWAIT) failure in the swap allocation path
that is the cause of the problem.



Deadlock involving truss -f, pdfork() and wait4()

2019-09-12 Thread Ryan Stone
I've hit an issue with a simple use of pdfork().  I have a process
that calls pdfork() and the parent immediately does a wait4() on the
child pid.  This works fine under normal conditions, but if the parent
is run under truss -f, the three processes deadlock.  If I switch out
pdfork() for fork(), the deadlock does not occur.

This C file demonstrates the issue:

https://people.freebsd.org/~rstone/pdfork.c

If I run "truss -f ./pdfork", which uses fork(), it completes within a
second.  If I run "truss -f ./pdfork -p", which uses pdfork(), the
processes deadlock.  If I run "./pdfork -p" without truss, it
completes normally.
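
For readers without access to the linked file, the pattern described
above can be sketched roughly as follows.  This is a hypothetical
reconstruction, not the contents of pdfork.c: the function name
spawn_and_wait() and the immediate _exit(0) in the child are
assumptions, and on systems other than FreeBSD the sketch falls back
to fork() because pdfork() is not available there.

```c
#define _DEFAULT_SOURCE		/* exposes wait4() on glibc */
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#ifdef __FreeBSD__
#include <sys/procdesc.h>
#endif
#include <stddef.h>
#include <unistd.h>

/*
 * Create a child that exits immediately, then wait4() on its pid.
 * Returns the child's exit status, or -1 on failure.
 */
int
spawn_and_wait(int use_pdfork)
{
	pid_t pid;
	int status;
#ifdef __FreeBSD__
	int pd = -1;	/* process descriptor filled in by pdfork() */

	pid = use_pdfork ? pdfork(&pd, 0) : fork();
#else
	(void)use_pdfork;	/* no pdfork() here; always use fork() */
	pid = fork();
#endif
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(0);	/* child: nothing to do */

	/* Parent: block until this specific child has exited. */
	if (wait4(pid, &status, 0, NULL) != pid)
		return (-1);
#ifdef __FreeBSD__
	if (pd != -1)
		close(pd);
#endif
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Calling spawn_and_wait(1) from a small main() and running the result
under "truss -f" would correspond to the "-p" case described above;
spawn_and_wait(0) corresponds to the fork() case.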

procstat reports the following kernel stacks:

27572 102043 truss   -   mi_switch+0xe2
    sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
    kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e
    fast_syscall_common+0x101
27573 102469 pdfork  -   mi_switch+0xe2
    sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
    kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e
    fast_syscall_common+0x101
27574 102053 pdfork  -   mi_switch+0xe2
    thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e
    fork_exit+0x83 fork_trampoline+0xe

As near as I can tell, truss is blocked waiting for ptrace events, the
parent process is blocked in wait4, and the child process is perhaps
waiting for its parent to exit the kernel so it can send the ptrace
event?

I really don't see anything obvious in the pdfork() code path that
would cause this to happen when fork() doesn't have the problem.  It
may be that pdfork() just changes the timing enough to expose a latent
bug.

I'm seeing this on a recentish current (r351363).


Re: "cpuset -n prefer:?" --what values for "?" are supposed to be allowed? (only 1 is, despite two numa domains)

2019-09-12 Thread Mark Johnston
On Wed, Sep 11, 2019 at 11:14:42AM -0700, Mark Millard wrote:
> 
> 
> On 2019-Sep-11, at 10:11, Mark Millard  wrote:
> 
> 
> 
> > On 2019-Sep-11, at 08:15, Mark Johnston  wrote:
> > 
> >> On Wed, Sep 11, 2019 at 07:57:26AM -0700, Mark Millard wrote:
> >>> 
> >>> 
> >>> On 2019-Sep-11, at 07:31, Mark Johnston  wrote:
> >>> 
> >>>> On Tue, Sep 10, 2019 at 10:58:05PM -0700, Mark Millard wrote:
> >>>>> In a context with:
> >>>>> 
> >>>>> # cpuset -g
> >>>>> pid -1 mask: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
> >>>>> pid -1 domain policy: first-touch mask: 0, 1
> >>>>> 
> >>>>> I get:
> >>>>> 
> >>>>> # cpuset -l0 -n prefer:0 COMMAND
> >>>>> cpuset: setdomain: Invalid argument
> >>>>> 
> >>>>> # cpuset -l0 -n prefer:2 COMMAND
> >>>>> cpuset: setdomain: Invalid argument
> >>>>> 
> >>>>> But one prefer:? value does allow the COMMAND
> >>>>> to run:
> >>>>> 
> >>>>> # cpuset -l0 -n prefer:1 COMMAND
> >>>>> 
> >>>>> This seems odd to me. Am I missing something?
> >>>>> 
> >>>>> For reference: I'm using a ThreadRipper 1950X
> >>>>> with a head -r351227 based context for this
> >>>>> activity. The above happens to have been run
> >>>>> in a Windows 10 Pro HyperV session, instead
> >>>>> of in a native-boot of the same media. (A
> >>>>> native-boot would have had 32 CPUs.)
> >>>> 
> >>>> Can you please show the output of "sysctl vm.phys_segs" from this
> >>>> setup?
> >>> 
> >>> Sure:
> >> 
> >> I was wondering if you had only one domain populated, but it seems not
> >> to be the case.  Could you try updating to r351672 or later and see if
> >> the behaviour persists?
> > 
> > It may be a bit before I do that.
> > 
> > FYI: I had set MAXMEMDOM to match the number of
> > actual domains for the context:
> > 
> > /usr/src/sys/amd64/conf/GENERIC-DBG:options MAXMEMDOM=2
> > /usr/src/sys/amd64/conf/GENERIC-NODBG:options   MAXMEMDOM=2
> > 
> > (These kernel configuration files include GENERIC.)

Ok, that helps.  I believe you are hitting a bug that will be fixed by
r351672 and a couple of preceding commits to the same area.

> Not that the below is the problem that I reported, but
> cpuset_modify_domain has an oddity. In the below, note
> the "root->" use followed by the "root &&" test: the
> root-> use would have failed first. Should the && be
> "dset &&" instead? Should "root &&" just be removed for
> being redundant?

Good catch.  I believe cpusets are not allowed to have a NULL domain
set, so dset should never be NULL either.


Re: r352239: install failure: make[10]: exec(btxld) failed (No such file or directory)

2019-09-12 Thread O. Hartmann
On Thu, 12 Sep 2019 06:27:00 +0200
"O. Hartmann"  wrote:

> Hello,
>
> we install several pkg-based systems and poudriere from a dedicated source
> tree; instead of /usr/src, in our case it is /pool/sources/CURRENT/src and
> 12-STABLE/src. Compilation of the sources is done within a jail!
>
> For a couple of days now, both trees, CURRENT (r352239 now) and 12-STABLE
> (r352239), have failed at the exact same point when compiling and packaging:
>
> [...]
> install -U -M
> /pool/sources/CURRENT/obj/pool/sources/CURRENT/src/amd64.amd64/worldstage//METALOG
> -D /pool/sources/CURRENT/obj/pool/sources/CURRENT/src/amd64.amd64/worldstage
> -T package=utilities -d -m 0755 -o root  -g wheel
> /pool/sources/CURRENT/obj/pool/sources/CURRENT/src/amd64.amd64/worldstage/boot
> objcopy -S -O binary boot2.out boot2.bin
> btxld -v -E 0x2000 -f bin -b
> /pool/sources/CURRENT/obj/pool/sources/CURRENT/src/amd64.amd64/stand/i386/btx/btx/btx
> -l boot2.ldr  -o boot2.ld -P 1 boot2.bin
> make[10]: exec(btxld) failed (No such file or directory)
> *** Error code 1
> [...]
>
> To reduce the set of installed binaries, we use customized src.conf files,
> and each build process is pointed at its appropriate src.conf by setting the
> variable SRCCONF accordingly; poudriere also uses the same src.conf by linking
> the jailname-src.conf file into poudriere's config folder; the content of
> src.conf is as follows:
>
> [...]
> WITH_OFED=  YES
> #WITH_CTF=  YES
> #
> #WITH_BEARSSL=  YES
> #
> WITH_SVN=   YES
> #
> WITH_SORT_THREADS=  YES
> #
> MALLOC_PRODUCTION=  YES
> #
> #WITHOUT_ASSERT_DEBUG=  YES
> #WITHOUT_DEBUG_FILES=YES
> #WITHOUT_TESTS=  YES
> WITHOUT_PROFILE=YES
> #
> WITHOUT_REPRODUCIBLE_BUILD= YES
> #
> #  mitigation for CVE-2017-5715 in the kernel build
> WITH_RETPOLINE= YES
>
> [...]
>
> Building poudriere jails from such sources has also been failing for a couple
> of days on all platforms, with a weird message that a folder and/or file
> atf-check is missing (this happens when using the command sequence "poudriere
> jail -j jailname -u -b" and the install method of the appropriate jail is
> src=/path/to/source/src).
>
> Thanks for helping,
>
> oh

After today's update of CURRENT's source tree and "poudriere jail -j jailname
-u -b", which should result in a successful build, the error is:

[...]
cc -target x86_64-unknown-freebsd13.0
--sysroot=/pool/sources/CURRENT/obj/pool/poudriere/jails/headamd64/usr/src/amd64.amd64/tmp
-B/pool/sources/CURRENT/obj/pool/poudriere/jails/headamd64/usr/src/amd64.amd64/tmp/usr/bin
 -O2 -pipe -O3  -DNDEBUG   -I.
-I/pool/poudriere/jails/headamd64/usr/src/contrib/elftoolchain/libelf
-I/pool/poudriere/jails/headamd64/usr/src/contrib/elftoolchain/common
-mretpoline -g -MD  -MF.depend.elf_update.o -MTelf_update.o -std=gnu99
-Wno-format-zero-length -fstack-protector-strong -Wsystem-headers -Werror -Wall
-Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings
-Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline
-Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign
-Wmissing-variable-declarations -Wthread-safety -Wno-empty-body
-Wno-string-plus-int -Wno-unused-const-variable  -Qunused-arguments -c
/pool/poudriere/jails/headamd64/usr/src/contrib/elftoolchain/libelf/elf_update.c
-o elf_update.o
/pool/poudriere/jails/headamd64/usr/src/contrib/elftoolchain/libelf/elf_update.c:841:67:
error: unused parameter 'ex' [-Werror,-Wunused-parameter]
_libelf_write_ehdr(Elf *e, unsigned char *nf, struct _Elf_Extent *ex)
[...]
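
Note that the command line above passes -Wno-unused-parameter first and
-Wunused-parameter later; the later flag takes effect, and together
with -Werror an unused parameter becomes a hard error.  The following
C sketch shows the general pattern and one common way to silence it.
It is only an illustration, not the fix applied in the tree: struct
extent and write_header() are hypothetical stand-ins for
struct _Elf_Extent and _libelf_write_ehdr().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct _Elf_Extent. */
struct extent {
	size_t start;
	size_t size;
};

/*
 * Writes one marker byte into buf and returns the number of bytes
 * written.  The parameter ex is intentionally unused; without the
 * (void) cast below, -Wunused-parameter plus -Werror would reject
 * this function.
 */
size_t
write_header(unsigned char *buf, size_t len, struct extent *ex)
{
	(void)ex;	/* mark the parameter as deliberately unused */

	if (len == 0)
		return (0);
	buf[0] = 0x7f;
	return (1);
}
```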

As I mentioned earlier, the build is performed within a jail dedicated to
running poudriere, and the task completed successfully a couple of days ago
(I did not note the revision number).

On non-jailed hosts, this task works as expected, both on 12-STABLE (recent
version 12.1-PRE) and CURRENT. I also removed the object directory and started
a fresh build, so remnants from an earlier build can be ruled out.
/usr/local/etc/poudriere.d/jailname-poudriere.conf has a valid setting like
this:

export   MAKEOBJDIRPREFIX=/pool/sources/CURRENT/obj/

Another strategy also fails. Building all the binaries under ./obj from the
base host with an object tree of a valid and successful build, then jexec'ing
into the poudriere jail, which has all the infrastructure on ZFS already
mounted, and typing there

poudriere jail -j jailname -u

which should result in a correct and successful installation, fails
immediately with