Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mkimg okay; "cp -p" generates the bad context]

2017-04-30 Thread Mark Millard
On 2017-Apr-27, at 10:26 PM, Mark Millard  wrote:

> [As this text does not really follow from the earlier text
> I'd sent directly, I'm top-posting a hypothesis about where
> so much active memory came from, leaving available memory
> so low that a reclaim_pv_chunk attempt occurred.]
> 
> My hypothesis is that the "mdconfig -d"s from vm_copy_base
> (from /usr/src/release/tools/vmimage.subr ) did not actually
> release the memory resources involved (from vnode backed
> mdconfig use):

I watched with "vmstat 1" and "mdconfig -d" did release memory
(including RAM) each time.

> . . .
> From the prior top report for the failure,
> partially repeated here:
> 
> . . .
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
> 
>  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    CPU COMMAND
> 48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
> . . .
> 
> The combination 1618M Mem:Active but 34M Swap:Used
> and 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP just
> seems very odd, like it should not happen. The 17M
> Mem:Inact is odd for the context as well. (Mem:Wired,
> Mem:Buf, and Mem:Free look normal.)
> 
> An alternate hypothesis would be that the memory "leak"
> is from mkimg not having its memory use cleaned up.
> This happens after vm_copy_base but before the cp/xz
> sequence and is what produces vm.raw.

mkimg also released its memory (including RAM)
each time.

But the later "cp -p" of the large vm.raw that
I was producing did not leave behind the free
memory that is expected after it finished.
I worked around the issue with the following
(just a personal workaround, though it helps in
other respects as well):

Index: Makefile.vm
===================================================================
--- Makefile.vm (revision 317015)
+++ Makefile.vm (working copy)
@@ -119,15 +119,20 @@
 vm-install:
 .if defined(WITH_VMIMAGES) && !empty(WITH_VMIMAGES)
	mkdir -p ${DESTDIR}/vmimages
-. for FORMAT in ${VMFORMATS}
-	cp -p ${VMBASE}.${FORMAT} \
-		${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
-. endfor
 . if defined(WITH_COMPRESSED_VMIMAGES) && !empty(WITH_COMPRESSED_VMIMAGES)
 .  for FORMAT in ${VMFORMATS}
-	# Don't keep the originals.  There is a copy in ${.OBJDIR} if needed.
-	${XZ_CMD} ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+	# Trade off the "cp -p" property for less memory, I/O, and time use.
+	# Also: -3 got > 30 MiByte/sec effective (source file) for a 32 GiByte,
+	# mostly-zero image on a Pine64+ 2GB and used about 600 MiBytes of
+	# "active virtual memory". The roughly 16 minutes was vastly faster than
+	# the "cp -p" -- and the sha512 and sha256 are vastly faster on the
+	# compressed file as well.
+	${XZ_CMD} -T 0 -3 --stdout --memlimit-compress=50% -v ${VMBASE}.${FORMAT} > ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}.xz
 .  endfor
+. else
+.  for FORMAT in ${VMFORMATS}
+	cp -p ${VMBASE}.${FORMAT} \
+		${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+.  endfor
 . endif
	cd ${DESTDIR}/vmimages && sha512 ${OSRELEASE}* > \
		${DESTDIR}/vmimages/CHECKSUM.SHA512

and using WITH_COMPRESSED_VMIMAGES to avoid a "cp -p"
based copy of the large file.
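For anyone wanting to try the streamed-compression approach outside the Makefile, here is a minimal standalone sketch (the file names are placeholders, not the release paths):

```shell
# Create a small stand-in for the image (the real vm.raw is much larger).
dd if=/dev/zero of=/tmp/vm-test.raw bs=1k count=64 2>/dev/null

# Stream-compress instead of copying first: xz reads the source and
# writes the .xz directly, capping compressor memory at 50% of RAM.
xz -T 0 -3 --stdout --memlimit-compress=50% /tmp/vm-test.raw > /tmp/vm-test.raw.xz

# Verify the archive round-trips.
xz --decompress --stdout /tmp/vm-test.raw.xz | cmp - /tmp/vm-test.raw && echo OK
```

This avoids ever materializing an uncompressed copy under ${DESTDIR}/vmimages, which is the point of the workaround above.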


Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard  wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
> 
> On 2017-Apr-27, at 4:22 PM, Mark Millard  wrote:
> 
>> Unfortunately for this FYI the attempt to produce a dump
>> failed and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>> 
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>> 
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>   pc = 0x00605cc0  lr = 0x000869cc
>>   sp = 0x83ba4f00  fp = 0x83ba5110
>> 
>> db_trace_self_wrapper() at vpanic+0x164
>>   pc = 0x000869cc  lr = 0x0031d464
>>   sp = 0x83ba5120  fp = 0x83ba5190
>> 
>> vpanic() at panic+0x4c
>>   pc = 0x0031d464  lr = 0x0031d2fc
>>   sp = 0x83ba51a0  fp = 0x83ba5220
>> 
>> panic() at reclaim_pv_chunk+0x10
>>   pc = 0x0031d2fc  lr = 0x0061a234
>>   sp = 0x83ba5230  fp = 0x83ba5230
>> 
>> reclaim_pv_chunk() at get_pv_entry+0x240
>>   pc = 0x0061a234  lr = 0x00616184
>>   sp = 0x83ba5240  fp = 0x83ba5260
>> 
>> get_pv_entry() at pmap_enter+0x694
>>   pc = 0x00616184  lr = 0x006156a0
>>   sp = 

Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mdconfig -d not effectively releasing memory?]

2017-04-27 Thread Mark Millard
[As this text does not really follow from the earlier text
I'd sent directly, I'm top-posting a hypothesis about where
so much active memory came from, leaving available memory
so low that a reclaim_pv_chunk attempt occurred.]

My hypothesis is that the "mdconfig -d"s from vm_copy_base
(from /usr/src/release/tools/vmimage.subr ) did not actually
release the memory resources involved (from vnode backed
mdconfig use):

vm_copy_base() {
	# Creates a new UFS root filesystem and copies the contents of the
	# current root filesystem into it.  This produces a "clean" disk
	# image without any remnants of files which were created temporarily
	# during image-creation and have since been deleted (e.g., downloaded
	# package archives).

	mkdir -p ${DESTDIR}/old
	mdold=$(mdconfig -f ${VMBASE})
	mount /dev/${mdold} ${DESTDIR}/old

	truncate -s ${VMSIZE} ${VMBASE}.tmp
	mkdir -p ${DESTDIR}/new
	mdnew=$(mdconfig -f ${VMBASE}.tmp)
	newfs -L rootfs /dev/${mdnew}
	mount /dev/${mdnew} ${DESTDIR}/new

	tar -cf- -C ${DESTDIR}/old . | tar -xUf- -C ${DESTDIR}/new

	umount_loop /dev/${mdold}
	rmdir ${DESTDIR}/old
	mdconfig -d -u ${mdold}

	umount_loop /dev/${mdnew}
	rmdir ${DESTDIR}/new
	tunefs -n enable /dev/${mdnew}
	mdconfig -d -u ${mdnew}
	mv ${VMBASE}.tmp ${VMBASE}
}
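The tar pipeline in the middle of vm_copy_base is the actual copy step between the two mounted memory disks. The same pattern can be exercised on ordinary directories (the /tmp paths below are illustrative, not from the release scripts):

```shell
# Build a small source tree to copy.
rm -rf /tmp/treecopy
mkdir -p /tmp/treecopy/old/sub /tmp/treecopy/new
echo "hello" > /tmp/treecopy/old/file.txt
echo "world" > /tmp/treecopy/old/sub/nested.txt

# Copy the whole tree, as vm_copy_base does between the two mounts:
# one tar archives to stdout, the other extracts from stdin.
tar -cf - -C /tmp/treecopy/old . | tar -xf - -C /tmp/treecopy/new

diff -r /tmp/treecopy/old /tmp/treecopy/new && echo "trees match"
```

(vm_copy_base additionally passes -U to the extracting tar, which unlinks existing files before writing.)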

Without such prior mdconfig activity the
"cp -p" and following "xz -T 0" do not get the
large Mem:Active figure in top, "xz -T 0" getting
more like 781M Mem:Active with an xz:SIZE of 791M and
xz:RES < 800M (varying). Zero xz:SWAP. xz also gets
all the cores going, so well over 300% in top always
(4 cores) instead of < 100%. In this context both
the cp and the xz finish just fine.

In other words: no low memory problem without
first having the vnode backed mdconfig use.

From the prior top report for the failure,
partially repeated here:

. . .
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    CPU COMMAND
48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
. . .

The combination 1618M Mem:Active but 34M Swap:Used
and 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP just
seems very odd, like it should not happen. The 17M
Mem:Inact is odd for the context as well. (Mem:Wired,
Mem:Buf, and Mem:Free look normal.)

An alternate hypothesis would be that the memory "leak"
is from mkimg not having its memory use cleaned up.
This happens after vm_copy_base but before the cp/xz
sequence and is what produces vm.raw.

For reference of what worked just fine after the
post-panic reboot, using the already existing
vm.raw (sparse) file as a starting place:

# cp -p vm.raw /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# ls -lTt *raw*
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 vm.raw
-rw-r--r--  1 root  wheel  34359746560 Apr 27 18:37:29 2017 vm.raw.nested_bsd
-rw-r--r--  1 root  wheel  27917287424 Apr 27 18:34:45 2017 raw.img

# du -sm *raw*
1762raw.img
1583vm.raw
1583vm.raw.nested_bsd
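The gap above between the ls sizes (about 32 GiBytes) and the du figures (under 2 GiBytes) is because the images are sparse files. A small demonstration of the effect (the file name is a placeholder):

```shell
# Make a 100 MiByte sparse file: only metadata is written, no data blocks.
truncate -s 100M /tmp/sparse-demo.img

# Apparent size vs. blocks actually allocated:
ls -l /tmp/sparse-demo.img   # apparent size: 104857600 bytes
du -k /tmp/sparse-demo.img   # allocated blocks: typically (near) 0
```

This also explains why "cp -p" of vm.raw touches far more data than du suggests: reading the file produces the zeros for all the unallocated holes.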

(Before the .xz replaces the .raw:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 33820032
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
32777   /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

(After xz:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 258208
-rw-r--r--  1 root  wheel  264275808 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
253     /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

(Mem:Active returned to 10M when xz finished.)


Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard  wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
> 
> On 2017-Apr-27, at 4:22 PM, Mark Millard  wrote:
> 
>> Unfortunately for this FYI the attempt to produce a dump
>> failed and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>> 
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>> 
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>pc = 0x00605cc0  lr = 0x000869cc
>>sp = 

Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [another example]

2017-04-27 Thread Mark Millard
[Another example panic. Again no dump. But I have what
a top -PCwaopid froze at this time.]

On 2017-Apr-27, at 4:22 PM, Mark Millard  wrote:

> Unfortunately for this FYI the attempt to produce a dump
> failed and so all the information that I have is what I
> first captured from the console output: a backtrace.
> 
> The context was head -r317015 on a Pine64+ 2GB. At the
> time I was experimenting with trying to build a vm.raw
> from my own build of FreeBSD. The (root) file system
> is on a USB SSD off of a powered USB hub.
> 
> panic: ARM64TODO: reclaim_pv_chunk
> cpuid = 1
> time = 1493332968
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
> pc = 0x00605cc0  lr = 0x000869cc
> sp = 0x83ba4f00  fp = 0x83ba5110
> 
> db_trace_self_wrapper() at vpanic+0x164
> pc = 0x000869cc  lr = 0x0031d464
> sp = 0x83ba5120  fp = 0x83ba5190
> 
> vpanic() at panic+0x4c
> pc = 0x0031d464  lr = 0x0031d2fc
> sp = 0x83ba51a0  fp = 0x83ba5220
> 
> panic() at reclaim_pv_chunk+0x10
> pc = 0x0031d2fc  lr = 0x0061a234
> sp = 0x83ba5230  fp = 0x83ba5230
> 
> reclaim_pv_chunk() at get_pv_entry+0x240
> pc = 0x0061a234  lr = 0x00616184
> sp = 0x83ba5240  fp = 0x83ba5260
> 
> get_pv_entry() at pmap_enter+0x694
> pc = 0x00616184  lr = 0x006156a0
> sp = 0x83ba5270  fp = 0x83ba5300
> 
> pmap_enter() at vm_fault_hold+0x28c
> pc = 0x006156a0  lr = 0x005b9740
> sp = 0x83ba5310  fp = 0x83ba5460
> 
> vm_fault_hold() at vm_fault+0x70
> pc = 0x005b9740  lr = 0x005b9464
> sp = 0x83ba5470  fp = 0x83ba54a0
> 
> vm_fault() at data_abort+0xe0
> pc = 0x005b9464  lr = 0x0061ad94
> sp = 0x83ba54b0  fp = 0x83ba5560
> 
> data_abort() at handle_el1h_sync+0x70
> pc = 0x0061ad94  lr = 0x00607870
> sp = 0x83ba5570  fp = 0x83ba5680
> 
> handle_el1h_sync() at kern_select+0x9fc
> pc = 0x00607870  lr = 0x0037db3c
> sp = 0x83ba5690  fp = 0x83ba58f0
> 
> kern_select() at sys_select+0x5c
> pc = 0x0037db3c  lr = 0x0037dc58
> sp = 0x83ba5900  fp = 0x83ba5930
> 
> sys_select() at do_el0_sync+0xa48
> pc = 0x0037dc58  lr = 0x0061b91c
> sp = 0x83ba5940  fp = 0x83ba5a70
> 
> do_el0_sync() at handle_el0_sync+0x6c
> pc = 0x0061b91c  lr = 0x006079e8
> sp = 0x83ba5a80  fp = 0x83ba5b90
> 
> handle_el0_sync() at 0x4948c
> pc = 0x006079e8  lr = 0x0004948c
> sp = 0x83ba5ba0  fp = 0xd960


This time I got to record from top:
(swap is on a swap partition)
(pid 48988's SIZE vs. RES and SWAP might be interesting)
(as might the Active figure)

last pid: 48988;  load averages:  0.64,  0.44,  0.38    up 0+04:21:01  19:19:50
32 processes:  2 running, 30 sleeping
CPU 0: 13.2% user,  0.0% nice, 23.2% system,  0.3% interrupt, 63.3% idle
CPU 1:  4.6% user,  0.0% nice, 23.9% system,  0.0% interrupt, 71.5% idle
CPU 2:  2.1% user,  0.0% nice, 23.2% system,  0.0% interrupt, 74.8% idle
CPU 3:  3.3% user,  0.0% nice, 23.8% system,  0.0% interrupt, 72.8% idle
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME    CPU COMMAND
48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
11983 root          1  22    0  5068K     0K     0K wait    3   0:00   0.00% make vm-image vm-install DESTDIR=/usr/obj/DESTDIRs/vmimage-aarch64 ()
11981 root          1  42    0  7320K     0K  1516K wait    1   0:00   0.00% sh /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install 
11980 root          1  20    0  6656K  1548K     0K select  0   0:02   0.00% [script]
11977 root          1  30    0  7320K     0K  1516K wait    3   0:00   0.00% /bin/sh /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install DEST
 2694 root          1  20    0  8804K  2072K     0K CPU2    2   0:07   0.17% top -PCwaopid
  827 root          1  20    0  7320K     0K   360K wait    0   0:00   0.00% su ()
  826 markmi        1  22    0 10372K     0K  1532K wait    3   0:00   0.00% su ()
  820 markmi        1  24    0  7320K     0K  

FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015

2017-04-27 Thread Mark Millard
Unfortunately for this FYI the attempt to produce a dump
failed and so all the information that I have is what I
first captured from the console output: a backtrace.

The context was head -r317015 on a Pine64+ 2GB. At the
time I was experimenting with trying to build a vm.raw
from my own build of FreeBSD. The (root) file system
is on a USB SSD off of a powered USB hub.

panic: ARM64TODO: reclaim_pv_chunk
cpuid = 1
time = 1493332968
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
 pc = 0x00605cc0  lr = 0x000869cc
 sp = 0x83ba4f00  fp = 0x83ba5110

db_trace_self_wrapper() at vpanic+0x164
 pc = 0x000869cc  lr = 0x0031d464
 sp = 0x83ba5120  fp = 0x83ba5190

vpanic() at panic+0x4c
 pc = 0x0031d464  lr = 0x0031d2fc
 sp = 0x83ba51a0  fp = 0x83ba5220

panic() at reclaim_pv_chunk+0x10
 pc = 0x0031d2fc  lr = 0x0061a234
 sp = 0x83ba5230  fp = 0x83ba5230

reclaim_pv_chunk() at get_pv_entry+0x240
 pc = 0x0061a234  lr = 0x00616184
 sp = 0x83ba5240  fp = 0x83ba5260

get_pv_entry() at pmap_enter+0x694
 pc = 0x00616184  lr = 0x006156a0
 sp = 0x83ba5270  fp = 0x83ba5300

pmap_enter() at vm_fault_hold+0x28c
 pc = 0x006156a0  lr = 0x005b9740
 sp = 0x83ba5310  fp = 0x83ba5460

vm_fault_hold() at vm_fault+0x70
 pc = 0x005b9740  lr = 0x005b9464
 sp = 0x83ba5470  fp = 0x83ba54a0

vm_fault() at data_abort+0xe0
 pc = 0x005b9464  lr = 0x0061ad94
 sp = 0x83ba54b0  fp = 0x83ba5560

data_abort() at handle_el1h_sync+0x70
 pc = 0x0061ad94  lr = 0x00607870
 sp = 0x83ba5570  fp = 0x83ba5680

handle_el1h_sync() at kern_select+0x9fc
 pc = 0x00607870  lr = 0x0037db3c
 sp = 0x83ba5690  fp = 0x83ba58f0

kern_select() at sys_select+0x5c
 pc = 0x0037db3c  lr = 0x0037dc58
 sp = 0x83ba5900  fp = 0x83ba5930

sys_select() at do_el0_sync+0xa48
 pc = 0x0037dc58  lr = 0x0061b91c
 sp = 0x83ba5940  fp = 0x83ba5a70

do_el0_sync() at handle_el0_sync+0x6c
 pc = 0x0061b91c  lr = 0x006079e8
 sp = 0x83ba5a80  fp = 0x83ba5b90

handle_el0_sync() at 0x4948c
 pc = 0x006079e8  lr = 0x0004948c
 sp = 0x83ba5ba0  fp = 0xd960


===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"