witness_lock_list_get: witness exhausted
First, shame on me: I obviously missed the reply to my earlier report and never reported back on the proposed patch.

https://lists.freebsd.org/pipermail/freebsd-current/2018-January/068136.html

I am still running current. The guest has 160GB of RAM and 64 vCPUs assigned under ESXi and mainly runs poudriere for amd64+arm builds. I again noticed "witness_lock_list_get: witness exhausted" on the console (which I don't pay much attention to).

UNAME:

FreeBSD poudriere.gopai.com 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259872-f948cb717f50: Wed Dec 28 13:13:43 EST 2022 mi...@poudriere.gopai.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

It seems that LOCK_NCHILDREN and LOCK_CHILDCOUNT in /usr/src/sys/kern/subr_witness.c were never increased, due to my lack of reporting.

If it's just a matter of raising LOCK_CHILDCOUNT to 4096 and LOCK_NCHILDREN to 20 without adding the sysctl knobs, I can do that as well; I am not kernel savvy.

I will be sure to test an updated patch should one be made available. Otherwise, please advise whether I should increase these values or ignore the warnings.

Regards,

Michael Jung
Re: witness_lock_list_get: witness exhausted
Just take it and change as you see fit, I don't have time to work on it.

On 10/4/21, Alan Somers wrote:
> On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote:
>> [...]
>> Can you apply this and re-run the test?
>>
>> https://people.freebsd.org/~mjg/witness.diff
>>
>> It bumps the counters to be "high enough" but also starts tracking usage.
>> If you get the message again, bump the values even higher.
>>
>> Once you get a complete poudriere run which did not result in the
>> problem, do:
>> $ sysctl debug.witness.list_used debug.witness.list_max_used
>>
>> to dump the actual usage.
>
> This is a nice little patch. Can we commit to head? Even better
> would be if LOCK_CHILDCOUNT could be a tunable. On my largish system,
> here's what I get shortly after boot:
>
> debug.witness.list_max_used: 8432
> debug.witness.list_used: 8420
>
> -Alan

-- 
Mateusz Guzik
Re: witness_lock_list_get: witness exhausted
On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote:
> [...]
> Can you apply this and re-run the test?
>
> https://people.freebsd.org/~mjg/witness.diff
>
> It bumps the counters to be "high enough" but also starts tracking usage.
> If you get the message again, bump the values even higher.
>
> Once you get a complete poudriere run which did not result in the problem,
> do:
> $ sysctl debug.witness.list_used debug.witness.list_max_used
>
> to dump the actual usage.

This is a nice little patch. Can we commit to head? Even better would be if LOCK_CHILDCOUNT could be a tunable. On my largish system, here's what I get shortly after boot:

debug.witness.list_max_used: 8432
debug.witness.list_used: 8420

-Alan
Re: witness_lock_list_get: witness exhausted
On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung <mi...@mikej.com> wrote:
> On 2018-01-08 13:39, John Baldwin wrote:
>> [...]
>> Probably the '2048' (max number of concurrent threads) needs to scale with
>> MAXCPU. 2048 threads is probably a bit low on big x86 boxes.
>
> Thank you for your explanation. We are expanding our ESXi cluster and even
> though with standard edition I can only assign 64 vCPUs to a guest and as
> much RAM as I want, I do like to help with edge cases if I can make them
> occur, pushing boundaries as I can towards additional improvements in
> FreeBSD.

Can you apply this and re-run the test?

https://people.freebsd.org/~mjg/witness.diff

It bumps the counters to be "high enough" but also starts tracking usage. If you get the message again, bump the values even higher.

Once you get a complete poudriere run which did not result in the problem, do:
$ sysctl debug.witness.list_used debug.witness.list_max_used

to dump the actual usage.

-- 
Mateusz Guzik
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
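The usage tracking described above (a current count plus a maximum, reported as debug.witness.list_used and debug.witness.list_max_used) can be pictured as a counter with a high-water mark. The following is a minimal userspace sketch of that idea only. It is not the contents of witness.diff, and every name in it (pool_stats, pool_stats_get, pool_stats_put) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for a fixed-size pool of entries. */
struct pool_stats {
	size_t capacity;	/* entries preallocated */
	size_t used;		/* entries currently handed out */
	size_t max_used;	/* highest value 'used' has reached */
};

void
pool_stats_init(struct pool_stats *ps, size_t capacity)
{
	ps->capacity = capacity;
	ps->used = 0;
	ps->max_used = 0;
}

/* Account for one allocation; returns 1 on success, 0 if exhausted. */
int
pool_stats_get(struct pool_stats *ps)
{
	if (ps->used == ps->capacity)
		return (0);
	ps->used++;
	if (ps->used > ps->max_used)
		ps->max_used = ps->used;
	return (1);
}

/* Account for one release; max_used is deliberately left untouched. */
void
pool_stats_put(struct pool_stats *ps)
{
	if (ps->used > 0)
		ps->used--;
}
```

Reading max_used after a full workload shows how large the pool would have needed to be, which is the figure the sysctl output is meant to provide.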
Re: witness_lock_list_get: witness exhausted
On 2018-01-08 13:39, John Baldwin wrote:
> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>> [...]
>
> This just means that WITNESS stopped working because it ran out of
> pre-allocated objects. In particular the objects used to track how
> many locks are held by how many threads:
>
> /*
>  * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
>  * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should
>  * probably be safe for the most part, but it's still a SWAG.
>  */
> #define LOCK_NCHILDREN 5
> #define LOCK_CHILDCOUNT 2048
>
> Probably the '2048' (max number of concurrent threads) needs to scale with
> MAXCPU. 2048 threads is probably a bit low on big x86 boxes.

Thank you for your explanation. We are expanding our ESXi cluster and even though with standard edition I can only assign 64 vCPUs to a guest and as much RAM as I want, I do like to help with edge cases if I can make them occur, pushing boundaries as I can towards additional improvements in FreeBSD.

--mikej
Re: witness_lock_list_get: witness exhausted
On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
> Hi!
>
> I've recently up'd my processor count on our poudriere box and have
> started noticing the error
> "witness_lock_list_get: witness exhausted" on the console. The kernel
> *DOES NOT* crash but I
> thought the report may be useful to someone.
>
> $ uname -a
> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov
> 19 18:41:20 EST 2017
> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
>
> The machine is pretty busy running four poudriere build instances.
>
> last pid: 76584; load averages: 115.07, 115.96, 98.30
>
> up 6+07:32:59 14:44:03
> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle
> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
> 25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>
> Let me know what additional information I might supply.

This just means that WITNESS stopped working because it ran out of pre-allocated objects. In particular the objects used to track how many locks are held by how many threads:

/*
 * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
 * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should
 * probably be safe for the most part, but it's still a SWAG.
 */
#define LOCK_NCHILDREN 5
#define LOCK_CHILDCOUNT 2048

Probably the '2048' (max number of concurrent threads) needs to scale with MAXCPU. 2048 threads is probably a bit low on big x86 boxes.

-- 
John Baldwin
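To illustrate what "ran out of pre-allocated objects" means here, the following is a minimal userspace sketch of a fixed pool handed out through a free list, where the first failed allocation turns checking off instead of crashing. It assumes a simplified layout: only the two macro values come from the message above, and the structure and function names (lle_sketch, pool_init, pool_get, pool_put, pool_checking_enabled) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Values quoted from subr_witness.c in the message above. */
#define LOCK_NCHILDREN	5	/* lock instances per list entry */
#define LOCK_CHILDCOUNT	2048	/* list entries preallocated */

/* Hypothetical, simplified stand-in for a lock list entry. */
struct lle_sketch {
	struct lle_sketch *next;
	const void *children[LOCK_NCHILDREN];	/* held locks */
	unsigned int count;			/* slots in use */
};

static struct lle_sketch pool[LOCK_CHILDCOUNT];
static struct lle_sketch *free_list;
static int checking_enabled;

/* Thread every preallocated entry onto the free list. */
void
pool_init(void)
{
	free_list = NULL;
	for (size_t i = 0; i < LOCK_CHILDCOUNT; i++) {
		pool[i].next = free_list;
		pool[i].count = 0;
		free_list = &pool[i];
	}
	checking_enabled = 1;
}

int
pool_checking_enabled(void)
{
	return (checking_enabled);
}

/* Take one entry; on exhaustion, warn once and disable checking. */
struct lle_sketch *
pool_get(void)
{
	struct lle_sketch *e;

	if (!checking_enabled)
		return (NULL);
	e = free_list;
	if (e == NULL) {
		checking_enabled = 0;
		printf("%s: pool exhausted\n", __func__);
		return (NULL);
	}
	free_list = e->next;
	e->next = NULL;
	e->count = 0;
	return (e);
}

/* Return an entry to the free list. */
void
pool_put(struct lle_sketch *e)
{
	e->next = free_list;
	free_list = e;
}
```

In this sketch nothing is ever allocated after pool_init(), so the only remedies for exhaustion are a larger LOCK_CHILDCOUNT or a pool size derived from the machine's size, which is the scaling suggested above.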
witness_lock_list_get: witness exhausted
Hi!

I've recently up'd my processor count on our poudriere box and have started noticing the error "witness_lock_list_get: witness exhausted" on the console. The kernel *DOES NOT* crash but I thought the report may be useful to someone.

$ uname -a
FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017 mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

The machine is pretty busy running four poudriere build instances.

last pid: 76584; load averages: 115.07, 115.96, 98.30 up 6+07:32:59 14:44:03
763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle
Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
     25G Compressed, 32G Uncompressed, 1.24:1 Ratio

Let me know what additional information I might supply.

Thanks!

--mikej

Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017
    mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: Intel(R) Xeon(R) CPU E7- 4850 @ 2.00GHz (1995.00-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x206f2 Family=0x6 Model=0x2f Stepping=2
  Features=0x1fa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0x83b82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1
  Structured Extended Features=0x2
  TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
real memory = 96636764160 (92160 MB)
avail memory = 93873786880 (89525 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 60 CPUs
FreeBSD/SMP: 6 package(s) x 10 core(s)
random: unblocking device.
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-23 on motherboard
SMP: AP CPU #46 Launched!
SMP: AP CPU #16 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #51 Launched!
SMP: AP CPU #20 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #48 Launched!
SMP: AP CPU #23 Launched!
SMP: AP CPU #27 Launched!
SMP: AP CPU #12 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #55 Launched!
SMP: AP CPU #17 Launched!
SMP: AP CPU #52 Launched!
SMP: AP CPU #18 Launched!
SMP: AP CPU #39 Launched!
SMP: AP CPU #11 Launched!
SMP: AP CPU #41 Launched!
SMP: AP CPU #58 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #26 Launched!
SMP: AP CPU #37 Launched!
SMP: AP CPU #45 Launched!
SMP: AP CPU #40 Launched!
SMP: AP CPU #35 Launched!
SMP: AP CPU #53 Launched!
SMP: AP CPU #50 Launched!
SMP: AP CPU #44 Launched!
SMP: AP CPU #34 Launched!
SMP: AP CPU #30 Launched!
SMP: AP CPU #31 Launched!
SMP: AP CPU #47 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #28 Launched!
SMP: AP CPU #49 Launched!
SMP: AP CPU #54 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #57 Launched!
SMP: AP CPU #29 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #24 Launched!
SMP: AP CPU #33 Launched!
SMP: AP CPU #38 Launched!
SMP: AP CPU #59 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #42 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #43 Launched!
SMP: AP CPU #14 Launched!
SMP: AP CPU #19 Launched!
SMP: AP CPU #10 Launched!
SMP: AP CPU #25 Launched!
SMP: AP CPU #56 Launched!
SMP: AP CPU #36 Launched!
SMP: AP CPU #22 Launched!
SMP: AP CPU #32 Launched!
SMP: AP CPU #21 Launched!
random: entropy device external interface
netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x80f95c20, 0) error 19
kbd1 at kbdmux0
nexus0
vtvga0: on motherboard
cryptosoft0: on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
hpet0: iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
cpu0: numa-domain 0 on acpi0
cpu1: numa-domain 5 on acpi0
cpu2: numa-domain 5 on acpi0
cpu3: numa-domain 5 on acpi0
cpu4: numa-domain 5 on acpi0
cpu5: numa-domain 5 on acpi0
cpu6: numa-domain 5 on acpi0
cpu7: numa-domain 5 on acpi0
cpu8: numa-domain 5 on acpi0
cpu9: numa-domain 5 on acpi0
cpu10: numa-domain 5 on acpi0
cpu11: numa-domain 4 on acpi0
cpu12: numa-domain 4 on acpi0
cpu13: numa-domain 4 on acpi0
cpu14: numa-domain 4 on acpi0
cpu15: numa-domain 4 on acpi0
cpu16: numa-domain 4 on acpi0
cpu17: numa-domain 4 on
Re: witness_lock_list_get: witness exhausted
On Tuesday, August 16, 2011 5:59:53 pm Peter Jeremy wrote:
> I'm getting the above message when running Peter Holm's stress test
> with INCARNATIONS=150 on a 16-core sparc. Does this mean
> LOCK_CHILDCOUNT is too low or does it indicate a leak in witness
> lock_list_entry's somewhere?

Most likely the former.

> The comment on LOCK_CHILDCOUNT indicates it is dimensioned to allow
> 2048 threads to hold 5 locks each but it's not clear how many threads
> will get started at a given INCARNATIONS count.

Also, each lle holds a bucket of instances, so the same count would only let 1024 threads hold 6 locks each perhaps (depending on the bucket size.. it was originally 4 locks per lle).

-- 
John Baldwin
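The bucket arithmetic above can be worked through in a few lines. This is only a sketch of the reasoning, assuming each thread uses whole list entries and that one entry holds a fixed number of lock instances; the function names (entries_per_thread, max_threads) are hypothetical.

```c
#include <stddef.h>

/*
 * Entries one thread needs to hold 'nlocks' locks when each entry
 * has room for 'bucket' lock instances (rounded up).
 */
size_t
entries_per_thread(size_t nlocks, size_t bucket)
{
	if (bucket == 0)
		return (0);
	return ((nlocks + bucket - 1) / bucket);
}

/*
 * Threads a pool of 'pool_entries' can serve if every thread holds
 * 'nlocks' locks. Returns 0 for invalid input (nlocks or bucket of 0).
 */
size_t
max_threads(size_t pool_entries, size_t nlocks, size_t bucket)
{
	size_t per = entries_per_thread(nlocks, bucket);

	if (per == 0)
		return (0);
	return (pool_entries / per);
}
```

With 2048 entries and a bucket of 5, max_threads(2048, 5, 5) gives 2048 threads, while a sixth lock per thread spills into a second entry and max_threads(2048, 6, 5) gives 1024, matching the figures in the reply. With the original bucket size of 4, five locks per thread already needs two entries each.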
witness_lock_list_get: witness exhausted
I'm getting the above message when running Peter Holm's stress test with INCARNATIONS=150 on a 16-core sparc. Does this mean LOCK_CHILDCOUNT is too low or does it indicate a leak in witness lock_list_entry's somewhere?

The comment on LOCK_CHILDCOUNT indicates it is dimensioned to allow 2048 threads to hold 5 locks each but it's not clear how many threads will get started at a given INCARNATIONS count.

-- 
Peter Jeremy