On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote:
>
> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote:
>
> > On 2018-01-08 13:39, John Baldwin wrote:
> >
> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
> >>
> >>> Hi!
> >>>
> >>> I've recently up'd my processor count on our poudriere box and have
> >>> started noticing the error
> >>> "witness_lock_list_get: witness exhausted" on the console. The kernel
> >>> *DOES NOT* crash but I
> >>> thought the report may be useful to someone.
> >>>
> >>> $ uname -a
> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov
> >>> 19 18:41:20 EST 2017
> >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> >>>
> >>> The machine is pretty busy running four poudriere build instances.
> >>>
> >>> last pid: 76584; load averages: 115.07, 115.96, 98.30
> >>>
> >>> up 6+07:32:59 14:44:03
> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
> >>> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle
> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
> >>> 25G Compressed, 32G Uncompressed, 1.24:1 Ratio
> >>>
> >>> Let me know what additional information I might supply.
> >>>
> >>
> >> This just means that WITNESS stopped working because it ran out of
> >> pre-allocated objects. In particular the objects used to track how
> >> many locks are held by how many threads:
> >>
> >> /*
> >> * XXX: This is somewhat bogus, as we assume here that at most 2048
> >> threads
> >> * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should
> >> * probably be safe for the most part, but it's still a SWAG.
> >> */
> >> #define LOCK_NCHILDREN 5
> >> #define LOCK_CHILDCOUNT 2048
> >>
> >> Probably the '2048' (max number of concurrent threads) needs to scale with
> >> MAXCPU. 2048 threads is probably a bit low on big x86 boxes.
> >>
> >
> >
> > Thank you for you explanation. We are expanding our ESXi cluster and even
> > though with standard edition I can only assign 64 vCPU's to a guest and as
> > much
> > RAM as I want, I do like to help with edge cases if I can make them occur
> > pushing
> > boundaries as I can towards additianional improvements in FreeBSD.
> >
>
> Can you apply this and re-run the test?
>
> https://people.freebsd.org/~mjg/witness.diff
>
> It bumps the counters to be "high enough" but also starts tracking usage.
> If you get
> the message again, bump the values even higher.
>
> Once you get a complete poudriere run which did not result in the problem,
> do:
> $ sysctl debug.witness.list_used debug.witness.list_max_used
>
> to dump the actual usage.
This is a nice little patch. Can we commit to head? Even better
would be if LOCK_CHILDCOUNT could be a tunable. On my largish system,
here's what I get shortly after boot:
debug.witness.list_max_used: 8432
debug.witness.list_used: 8420
-Alan