Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mkimg okay; "cp -p" generates the bad context]
On 2017-Apr-27, at 10:26 PM, Mark Millard wrote:

> [As the text does not really follow from the earlier text
> I'd sent directly, I'm top posting a hypothesis about where
> so much active memory came from to be so low in available
> memory to get a reclaim_pv_chunk attempt.]
>
> My hypothesis is that the "mdconfig -d"s from vm_copy_base
> (from /usr/src/release/tools/vmimage.subr) did not actually
> release the memory resources involved (from vnode-backed
> mdconfig use):

I watched with "vmstat 1", and "mdconfig -d" did release
memory (including RAM) each time.

> . . .
> From the prior top report for the failure,
> partially repeated here:
>
> . . .
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
>
>   PID USERNAME  THR PRI NICE  SIZE    RES  SWAP STATE C  TIME   CPU COMMAND
> 48988 root        4  31    0  651M 27048K    0K RUN   0  0:03 87.60%
> xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
> . . .
>
> The combination of 1618M Mem:Active but 34M Swap:Used, and
> 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP, just seems very
> odd, like it should not happen. The 17M Mem:Inact is odd for
> the context as well. (Mem:Wired, Mem:Buf, and Mem:Free look
> normal.)
>
> An alternate hypothesis would be that the memory "leak" is
> from mkimg not having its memory use cleaned up. This
> happens after vm_copy_base but before the cp/xz sequence and
> is what produces vm.raw.

mkimg also released its memory (including RAM) each time.

But the later "cp -p" of the large vm.raw that I was producing
did not leave the free memory that is expected once it
finished.

I worked around the issue with the following (just a personal
workaround that also helps in other respects):

Index: Makefile.vm
===================================================================
--- Makefile.vm (revision 317015)
+++ Makefile.vm (working copy)
@@ -119,15 +119,20 @@
 vm-install:
 .if defined(WITH_VMIMAGES) && !empty(WITH_VMIMAGES)
 	mkdir -p ${DESTDIR}/vmimages
-. for FORMAT in ${VMFORMATS}
-	cp -p ${VMBASE}.${FORMAT} \
-	    ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
-. endfor
 . if defined(WITH_COMPRESSED_VMIMAGES) && !empty(WITH_COMPRESSED_VMIMAGES)
 . for FORMAT in ${VMFORMATS}
-	# Don't keep the originals. There is a copy in ${.OBJDIR} if needed.
-	${XZ_CMD} ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+	# Trade off the "cp -p" property for less memory, I/O, and time use.
+	# Also: -3 got > 30 MiByte/sec effective (source file) for a 32 GiByte, mostly-zero image
+	# on a Pine64+ 2GB and used about 600 MiBytes of "active virtual memory". The roughly 16
+	# minutes was vastly faster than the "cp -p" --and the sha512 and sha256 are vastly
+	# faster on the compressed file as well.
+	${XZ_CMD} -T 0 -3 --stdout --memlimit-compress=50% -v ${VMBASE}.${FORMAT} > ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}.xz
 . endfor
+. else
+. for FORMAT in ${VMFORMATS}
+	cp -p ${VMBASE}.${FORMAT} \
+	    ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+. endfor
 . endif
 	cd ${DESTDIR}/vmimages && sha512 ${OSRELEASE}* > \
 	    ${DESTDIR}/vmimages/CHECKSUM.SHA512

and using WITH_COMPRESSED_VMIMAGES to avoid a "cp -p" based
copy of the large file.

Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
>
> On 2017-Apr-27, at 4:22 PM, Mark Millard wrote:
>
>> Unfortunately for this FYI the attempt to produce a dump
>> failed, and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>>
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>>
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>  pc = 0x00605cc0  lr = 0x000869cc
>>  sp = 0x83ba4f00  fp = 0x83ba5110
>>
>> db_trace_self_wrapper() at vpanic+0x164
>>  pc = 0x000869cc  lr = 0x0031d464
>>  sp = 0x83ba5120  fp = 0x83ba5190
>>
>> vpanic() at panic+0x4c
>>  pc = 0x0031d464  lr = 0x0031d2fc
>>  sp = 0x83ba51a0  fp = 0x83ba5220
>>
>> panic() at reclaim_pv_chunk+0x10
>>  pc = 0x0031d2fc  lr = 0x0061a234
>>  sp = 0x83ba5230  fp = 0x83ba5230
>>
>> reclaim_pv_chunk() at get_pv_entry+0x240
>>  pc = 0x0061a234  lr = 0x00616184
>>  sp = 0x83ba5240  fp = 0x83ba5260
>>
>> get_pv_entry() at pmap_enter+0x694
>>  pc = 0x00616184  lr = 0x006156a0
>>  sp =
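For clarity on the Makefile.vm workaround quoted above: the idea is to stream-compress the image rather than write a second uncompressed copy first. A minimal sketch of that compress-or-copy choice as a plain /bin/sh function; the file and directory names below are hypothetical stand-ins for ${VMBASE}.${FORMAT} and ${DESTDIR}/vmimages, and the xz flags mirror the ones in the patch:

```shell
#!/bin/sh
# Sketch of the patched vm-install behavior: either stream-compress the
# image with xz (no second uncompressed copy is ever written) or fall
# back to the original "cp -p". Names are hypothetical stand-ins.

install_image() {
    src=$1 dstdir=$2 compress=$3
    mkdir -p "$dstdir"
    if [ "$compress" = "yes" ]; then
        # -T 0: use all cores; -3: modest per-thread memory;
        # --stdout: read the input and write only the compressed result.
        xz -T 0 -3 --stdout --memlimit-compress=50% "$src" \
            > "$dstdir/$(basename "$src").xz"
    else
        # Original behavior: "cp -p" preserves times and modes.
        cp -p "$src" "$dstdir/$(basename "$src")"
    fi
}

# Demo of the fallback (copy) path:
printf 'demo' > /tmp/vm.raw.demo
install_image /tmp/vm.raw.demo /tmp/vmimages.demo no
```

Whether capping --memlimit-compress at 50% of RAM is right depends on the machine; on a 2 GB board it keeps the multi-threaded encoder from competing with the rest of the build.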
Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mdconfig -d not effectively releasing memory?]
[As the text does not really follow from the earlier text
I'd sent directly, I'm top posting a hypothesis about where
so much active memory came from to be so low in available
memory to get a reclaim_pv_chunk attempt.]

My hypothesis is that the "mdconfig -d"s from vm_copy_base
(from /usr/src/release/tools/vmimage.subr) did not actually
release the memory resources involved (from vnode-backed
mdconfig use):

vm_copy_base() {
	# Creates a new UFS root filesystem and copies the contents of the
	# current root filesystem into it. This produces a "clean" disk
	# image without any remnants of files which were created temporarily
	# during image-creation and have since been deleted (e.g., downloaded
	# package archives).
	mkdir -p ${DESTDIR}/old
	mdold=$(mdconfig -f ${VMBASE})
	mount /dev/${mdold} ${DESTDIR}/old

	truncate -s ${VMSIZE} ${VMBASE}.tmp
	mkdir -p ${DESTDIR}/new
	mdnew=$(mdconfig -f ${VMBASE}.tmp)
	newfs -L rootfs /dev/${mdnew}
	mount /dev/${mdnew} ${DESTDIR}/new

	tar -cf- -C ${DESTDIR}/old . | tar -xUf- -C ${DESTDIR}/new

	umount_loop /dev/${mdold}
	rmdir ${DESTDIR}/old
	mdconfig -d -u ${mdold}

	umount_loop /dev/${mdnew}
	rmdir ${DESTDIR}/new
	tunefs -n enable /dev/${mdnew}
	mdconfig -d -u ${mdnew}
	mv ${VMBASE}.tmp ${VMBASE}
}

Without such prior mdconfig activity, the "cp -p" and the
following "xz -T 0" do not get the large Mem:Active figure in
top: "xz -T 0" gets more like 781M Mem:Active, with an
xz:SIZE of 791M and xz:RES < 800M (varying). Zero xz:SWAP.
xz also gets all the cores going, so well over 300% in top
always (4 cores) instead of < 100%. In this context both the
cp and the xz finish just fine. In other words: no low-memory
problem without first having the vnode-backed mdconfig use.

From the prior top report for the failure, partially
repeated here:

. . .
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out

  PID USERNAME  THR PRI NICE  SIZE    RES  SWAP STATE C  TIME   CPU COMMAND
48988 root        4  31    0  651M 27048K    0K RUN   0  0:03 87.60%
xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
. . .

The combination of 1618M Mem:Active but 34M Swap:Used, and
651M xz:SIZE but 27M xz:RES and 0K xz:SWAP, just seems very
odd, like it should not happen. The 17M Mem:Inact is odd for
the context as well. (Mem:Wired, Mem:Buf, and Mem:Free look
normal.)

An alternate hypothesis would be that the memory "leak" is
from mkimg not having its memory use cleaned up. This happens
after vm_copy_base but before the cp/xz sequence and is what
produces vm.raw.

For reference, here is what worked just fine after the
post-panic reboot, using the already-existing vm.raw (sparse)
file as a starting place:

# cp -p vm.raw /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# ls -lTt *raw*
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 vm.raw
-rw-r--r--  1 root  wheel  34359746560 Apr 27 18:37:29 2017 vm.raw.nested_bsd
-rw-r--r--  1 root  wheel  27917287424 Apr 27 18:34:45 2017 raw.img

# du -sm *raw*
1762	raw.img
1583	vm.raw
1583	vm.raw.nested_bsd

(Before the .xz replaces the .raw:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 33820032
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
32777	/usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

(After xz:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 258208
-rw-r--r--  1 root  wheel  264275808 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
253	/usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

(Mem:Active returned to 10M when xz finished.)

Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
>
> On 2017-Apr-27, at 4:22 PM, Mark Millard wrote:
>
>> Unfortunately for this FYI the attempt to produce a dump
>> failed, and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>>
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>>
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>  pc = 0x00605cc0  lr = 0x000869cc
>>  sp =
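The ls -lT vs. du -sm numbers above are what identify vm.raw as sparse: ls reports the apparent size (~32 GiByte) while du reports the blocks actually allocated (~1.6 GiByte). A minimal, portable sketch of the same apparent-vs-allocated check, using a small hypothetical demo file in place of vm.raw:

```shell
#!/bin/sh
# Sketch: detect a sparse file by comparing apparent size to allocated
# blocks. /tmp/sparse.demo is a hypothetical stand-in for vm.raw.

f=/tmp/sparse.demo
rm -f "$f"
truncate -s 1M "$f"                            # 1 MiByte apparent size, no data written
apparent=$(wc -c < "$f")                       # logical (apparent) size in bytes
allocated_kb=$(du -k "$f" | awk '{print $1}')  # KiBytes actually allocated on disk
echo "apparent=${apparent} bytes allocated=${allocated_kb}K"
# For a sparse file, allocated stays far below apparent -- the same
# relationship as vm.raw's du ~1583M vs. ls ~32G in the listing above.
```

This also explains why "cp -p" of such an image is so much more I/O than the du figure suggests: a plain read of the file sees the holes as zero bytes at the full apparent size.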
Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [another example]
[Another example panic. Again no dump. But I have what a
top -PCwaopid froze at this time.]

On 2017-Apr-27, at 4:22 PM, Mark Millard wrote:

> Unfortunately for this FYI the attempt to produce a dump
> failed, and so all the information that I have is what I
> first captured from the console output: a backtrace.
>
> The context was head -r317015 on a Pine64+ 2GB. At the
> time I was experimenting with trying to build a vm.raw
> from my own build of FreeBSD. The (root) file system
> is on a USB SSD off of a powered USB hub.
>
> panic: ARM64TODO: reclaim_pv_chunk
> cpuid = 1
> time = 1493332968
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
>  pc = 0x00605cc0  lr = 0x000869cc
>  sp = 0x83ba4f00  fp = 0x83ba5110
>
> db_trace_self_wrapper() at vpanic+0x164
>  pc = 0x000869cc  lr = 0x0031d464
>  sp = 0x83ba5120  fp = 0x83ba5190
>
> vpanic() at panic+0x4c
>  pc = 0x0031d464  lr = 0x0031d2fc
>  sp = 0x83ba51a0  fp = 0x83ba5220
>
> panic() at reclaim_pv_chunk+0x10
>  pc = 0x0031d2fc  lr = 0x0061a234
>  sp = 0x83ba5230  fp = 0x83ba5230
>
> reclaim_pv_chunk() at get_pv_entry+0x240
>  pc = 0x0061a234  lr = 0x00616184
>  sp = 0x83ba5240  fp = 0x83ba5260
>
> get_pv_entry() at pmap_enter+0x694
>  pc = 0x00616184  lr = 0x006156a0
>  sp = 0x83ba5270  fp = 0x83ba5300
>
> pmap_enter() at vm_fault_hold+0x28c
>  pc = 0x006156a0  lr = 0x005b9740
>  sp = 0x83ba5310  fp = 0x83ba5460
>
> vm_fault_hold() at vm_fault+0x70
>  pc = 0x005b9740  lr = 0x005b9464
>  sp = 0x83ba5470  fp = 0x83ba54a0
>
> vm_fault() at data_abort+0xe0
>  pc = 0x005b9464  lr = 0x0061ad94
>  sp = 0x83ba54b0  fp = 0x83ba5560
>
> data_abort() at handle_el1h_sync+0x70
>  pc = 0x0061ad94  lr = 0x00607870
>  sp = 0x83ba5570  fp = 0x83ba5680
>
> handle_el1h_sync() at kern_select+0x9fc
>  pc = 0x00607870  lr = 0x0037db3c
>  sp = 0x83ba5690  fp = 0x83ba58f0
>
> kern_select() at sys_select+0x5c
>  pc = 0x0037db3c  lr = 0x0037dc58
>  sp = 0x83ba5900  fp = 0x83ba5930
>
> sys_select() at do_el0_sync+0xa48
>  pc = 0x0037dc58  lr = 0x0061b91c
>  sp = 0x83ba5940  fp = 0x83ba5a70
>
> do_el0_sync() at handle_el0_sync+0x6c
>  pc = 0x0061b91c  lr = 0x006079e8
>  sp = 0x83ba5a80  fp = 0x83ba5b90
>
> handle_el0_sync() at 0x4948c
>  pc = 0x006079e8  lr = 0x0004948c
>  sp = 0x83ba5ba0  fp = 0xd960

This time I got to record from top:

(swap is on a swap partition)
(pid 48988's SIZE vs. RES and SWAP might be interesting)
(as might the Active figure)

last pid: 48988;  load averages: 0.64, 0.44, 0.38  up 0+04:21:01  19:19:50
32 processes: 2 running, 30 sleeping
CPU 0: 13.2% user, 0.0% nice, 23.2% system, 0.3% interrupt, 63.3% idle
CPU 1:  4.6% user, 0.0% nice, 23.9% system, 0.0% interrupt, 71.5% idle
CPU 2:  2.1% user, 0.0% nice, 23.2% system, 0.0% interrupt, 74.8% idle
CPU 3:  3.3% user, 0.0% nice, 23.8% system, 0.0% interrupt, 72.8% idle
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out

  PID USERNAME  THR PRI NICE   SIZE    RES  SWAP STATE  C  TIME   CPU COMMAND
48988 root        4  31    0   651M 27048K    0K RUN    0  0:03 87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
11983 root        1  22    0  5068K     0K    0K wait   3  0:00  0.00% make vm-image vm-install DESTDIR=/usr/obj/DESTDIRs/vmimage-aarch64 ()
11981 root        1  42    0  7320K     0K 1516K wait   1  0:00  0.00% sh /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install
11980 root        1  20    0  6656K  1548K    0K select 0  0:02  0.00% [script]
11977 root        1  30    0  7320K     0K 1516K wait   3  0:00  0.00% /bin/sh /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install DEST
 2694 root        1  20    0  8804K  2072K    0K CPU2   2  0:07  0.17% top -PCwaopid
  827 root        1  20    0  7320K     0K  360K wait   0  0:00  0.00% su ()
  826 markmi      1  22    0 10372K     0K 1532K wait   3  0:00  0.00% su ()
  820 markmi      1  24    0  7320K     0K
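An aside on the vm_copy_base sequence quoted in an earlier message: its umount/rmdir/"mdconfig -d" cleanup only runs on the success path, so a failing tar or newfs would leave the md devices (and their vnode-backed resources) attached. A hedged sketch of the usual EXIT-trap pattern for that kind of cleanup, demonstrated here with mktemp as a stand-in for an md unit so it runs anywhere:

```shell
#!/bin/sh
# Sketch: guarantee cleanup even when an intermediate step fails.
# The temp file stands in for an attached md device; "rm -f" stands in
# for "mdconfig -d -u". Not FreeBSD-specific, runs on any POSIX sh.

leftover=$(sh -c '
    tmp=$(mktemp) || exit 1
    # The trap fires when this shell exits -- on success AND on failure.
    trap "rm -f \"$tmp\"" EXIT
    echo "$tmp"
    false                      # simulate a failing image-build step
') || true

# The inner shell has exited, so its EXIT trap already removed the file.
test -e "$leftover" && echo "stand-in leaked" || echo "cleanup ran"
```

In vm_copy_base itself the analogous trap body would be the umount_loop/"mdconfig -d -u" pairs for ${mdold} and ${mdnew}; that is a sketch of the pattern, not the script's current behavior.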
FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015
Unfortunately for this FYI the attempt to produce a dump
failed and so all the information that I have is what I
first captured from the console output: a backtrace.

The context was head -r317015 on a Pine64+ 2GB. At the
time I was experimenting with trying to build a vm.raw
from my own build of FreeBSD. The (root) file system
is on a USB SSD off of a powered USB hub.

panic: ARM64TODO: reclaim_pv_chunk
cpuid = 1
time = 1493332968
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
 pc = 0x00605cc0  lr = 0x000869cc
 sp = 0x83ba4f00  fp = 0x83ba5110

db_trace_self_wrapper() at vpanic+0x164
 pc = 0x000869cc  lr = 0x0031d464
 sp = 0x83ba5120  fp = 0x83ba5190

vpanic() at panic+0x4c
 pc = 0x0031d464  lr = 0x0031d2fc
 sp = 0x83ba51a0  fp = 0x83ba5220

panic() at reclaim_pv_chunk+0x10
 pc = 0x0031d2fc  lr = 0x0061a234
 sp = 0x83ba5230  fp = 0x83ba5230

reclaim_pv_chunk() at get_pv_entry+0x240
 pc = 0x0061a234  lr = 0x00616184
 sp = 0x83ba5240  fp = 0x83ba5260

get_pv_entry() at pmap_enter+0x694
 pc = 0x00616184  lr = 0x006156a0
 sp = 0x83ba5270  fp = 0x83ba5300

pmap_enter() at vm_fault_hold+0x28c
 pc = 0x006156a0  lr = 0x005b9740
 sp = 0x83ba5310  fp = 0x83ba5460

vm_fault_hold() at vm_fault+0x70
 pc = 0x005b9740  lr = 0x005b9464
 sp = 0x83ba5470  fp = 0x83ba54a0

vm_fault() at data_abort+0xe0
 pc = 0x005b9464  lr = 0x0061ad94
 sp = 0x83ba54b0  fp = 0x83ba5560

data_abort() at handle_el1h_sync+0x70
 pc = 0x0061ad94  lr = 0x00607870
 sp = 0x83ba5570  fp = 0x83ba5680

handle_el1h_sync() at kern_select+0x9fc
 pc = 0x00607870  lr = 0x0037db3c
 sp = 0x83ba5690  fp = 0x83ba58f0

kern_select() at sys_select+0x5c
 pc = 0x0037db3c  lr = 0x0037dc58
 sp = 0x83ba5900  fp = 0x83ba5930

sys_select() at do_el0_sync+0xa48
 pc = 0x0037dc58  lr = 0x0061b91c
 sp = 0x83ba5940  fp = 0x83ba5a70

do_el0_sync() at handle_el0_sync+0x6c
 pc = 0x0061b91c  lr = 0x006079e8
 sp = 0x83ba5a80  fp = 0x83ba5b90

handle_el0_sync() at 0x4948c
 pc = 0x006079e8  lr = 0x0004948c
 sp = 0x83ba5ba0  fp = 0xd960

===
Mark Millard
markmi at dsl-only.net
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current