Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-13 Thread Justin Hibbits
Hi Mark,

On Wed, 13 May 2020 01:43:23 -0700
Mark Millard  wrote:

> [I'm adding a reference to an old arm64/aarch64 bug that had
> pages turning to zero, in case this 32-bit powerpc issue is
> somewhat analogous.]
> 
> On 2020-May-13, at 00:29, Mark Millard  wrote:
> 
> > [stress alone is sufficient to have the jemalloc asserts fail
> > in stress itself; no multi-socket G4 is needed, and there is
> > no need to involve nfsd, mountd, rpcbind, or the like. This is
> > not a claim that I know all the problems are the same, just
> > that a jemalloc-reported failure happens in this simpler
> > context and zeroed pages are involved.]
> > 
> > Reminder: head -r360311 based context.
> > 
> > 
> > First I show a single CPU/core PowerMac G4 context failing
> > in stress. (I actually did this later, but it is the
> > simpler context.) I simply moved the media from the
> > 2-socket G4 to this slower, single-cpu/core one.
> > 
> > cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
> > cpu0: Features 9c00
> > cpu0: HID0 8094c0a4
> > real memory  = 1577857024 (1504 MB)
> > avail memory = 1527508992 (1456 MB)
> > 
> > # stress -m 1 --vm-bytes 1792M
> > stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> > : 
> > /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
> > Failed assertion: "slab == extent_slab_get(extent)"
> > stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
> > stress: WARN: [1024] (417) now reaping child worker processes
> > stress: FAIL: [1024] (451) failed run completed in 243s
> > 
> > (Note: 1792 is the biggest it allowed with M.)
> > 
> > The following still pages in and out and fails:
> > 
> > # stress -m 1 --vm-bytes 1290M
> > stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> > :
> > /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258:
> > Failed assertion: "slab == extent_slab_get(extent)" . . .
> > 
> > By contrast, the following had no problem for as
> > long as I let it run --and did not page in or out:
> > 
> > # stress -m 1 --vm-bytes 1280M
> > stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> > 
...
> The following was a fix for a "pages magically
> turn into zeros" problem on arm64/aarch64. The
> original 32-bit powerpc context did not seem a
> match to me -- but the stress-test behavior that
> I've just observed seems closer from an
> external-test point of view: swapping is involved.
> 
> Maybe this will suggest something to someone who
> knows what they are doing.
> 
> (Note: dsl-only.net closed down, so the E-mail
> address reference is no longer valid.)
> 
> Author: kib
> Date: Mon Apr 10 15:32:26 2017
> New Revision: 316679
> URL: https://svnweb.freebsd.org/changeset/base/316679
> 
> 
> Log:
>  Do not lose dirty bits for removing PROT_WRITE on arm64.
> 
>  Arm64 pmap interprets accessed writable ptes as modified, since
>  ARMv8.0 does not track Dirty Bit Modifier in hardware. If writable
> bit is removed, page must be marked as dirty for MI VM.
> 
>  This change is most important for COW, where fork caused losing
>  content of the dirty pages which were not yet scanned by pagedaemon.
> 
>  Reviewed by: alc, andrew
>  Reported and tested by:  Mark Millard 
>  PR:  217138, 217239
>  Sponsored by:The FreeBSD Foundation
>  MFC after:   2 weeks
> 
> Modified:
>  head/sys/arm64/arm64/pmap.c
> 
> Modified: head/sys/arm64/arm64/pmap.c
> ==============================================================================
> --- head/sys/arm64/arm64/pmap.c	Mon Apr 10 12:35:58 2017	(r316678)
> +++ head/sys/arm64/arm64/pmap.c	Mon Apr 10 15:32:26 2017	(r316679)
> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sv
>  	    sva += L3_SIZE) {
>  		l3 = pmap_load(l3p);
>  		if (pmap_l3_valid(l3)) {
> +			if ((l3 & ATTR_SW_MANAGED) &&
> +			    pmap_page_dirty(l3)) {
> +				vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
> +				    ~ATTR_MASK));
> +			}
>  			pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>  			PTE_SYNC(l3p);
>  			/* XXX: Use pmap_invalidate_range */
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
> 

Thanks for this reference.  I took a quick look at the 3 pmap
implementations we have (I haven't checked the new radix pmap yet), and
it looks like only mmu_oea.c (the 32-bit AIM pmap, for G3 and G4) is
missing vm_page_dirty() calls in its pmap_protect() implementation,
analogous to the change you posted right above.  Given this, I think
it's safe to say that this missing piece is needed.  We'll work on a fix
for this; looking at moea64_protect(), there may be additional work
needed to support this as well, so it may take a few days.
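
For concreteness, a rough sketch of the shape such a change could take,
by analogy with the arm64 diff above.  This is not the committed fix;
the struct and macro names used here (pvo_vaddr, pvo_pte.pte.pte_lo,
PVO_MANAGED, PTE_CHG, PTE_RPGN) are my reading of mmu_oea.c and
sys/powerpc/include/pte.h and may not match exactly:

/*
 * Hedged sketch only: before moea_protect() strips write access from a
 * managed mapping, fold the hardware changed (C) bit into the MI layer,
 * the way the arm64 change does with vm_page_dirty().
 */
static void
moea_sync_dirty_before_ro(struct pvo_entry *pvo)
{

	if ((pvo->pvo_vaddr & PVO_MANAGED) != 0 &&
	    (pvo->pvo_pte.pte.pte_lo & PTE_CHG) != 0)
		vm_page_dirty(PHYS_TO_VM_PAGE(pvo->pvo_pte.pte.pte_lo &
		    PTE_RPGN));
}

moea_protect() would call something like this (or open-code it) on each
pvo before downgrading the PP bits to read-only; moea64_protect() would
need the analogous treatment with its own pvo/pte layout.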

- Justin

Re: zfs deadlock on r360452 relating to busy vm page

2020-05-13 Thread Mark Johnston
On Wed, May 13, 2020 at 10:45:24AM +0300, Andriy Gapon wrote:
> On 13/05/2020 10:35, Andriy Gapon wrote:
> > On 13/05/2020 01:47, Bryan Drewery wrote:
> >> Trivial repro:
> >>
> >> dd if=/dev/zero of=blah & tail -F blah
> >> ^C
> >> load: 0.21  cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
> >> #0 0x80bce615 at mi_switch+0x155
> >> #1 0x80c1cfea at sleepq_switch+0x11a
> >> #2 0x80b57f0a at _cv_wait+0x15a
> >> #3 0x829ddab6 at rangelock_enter+0x306
> >> #4 0x829ecd3f at zfs_freebsd_getpages+0x14f
> >> #5 0x810e3ab9 at VOP_GETPAGES_APV+0x59
> >> #6 0x80f349e7 at vnode_pager_getpages+0x37
> >> #7 0x80f2a93f at vm_pager_get_pages+0x4f
> >> #8 0x80f054b0 at vm_fault+0x780
> >> #9 0x80f04bde at vm_fault_trap+0x6e
> >> #10 0x8106544e at trap_pfault+0x1ee
> >> #11 0x81064a9c at trap+0x44c
> >> #12 0x8103a978 at calltrap+0x8
> > 
> > In r329363 I re-worked zfs_getpages and introduced range locking to it.
> > At the time I believed that it was safe and maybe it was, please see the 
> > commit
> > message.
> > There, indeed, have been many performance / concurrency improvements to the 
> > VM
> > system and r358443 is one of them.
> 
> Thinking more about it, it could be r352176.
> I think that vm_page_grab_valid (and later vm_page_grab_valid_unlocked) are 
> not
> equivalent to the code that they replaced.
> The original code would check the valid field before any locking, and it would
> attempt locking / busying only if the page was invalid.  The object was
> required to be locked, though.
> The new code tries to busy the page in any case.
> 
> > I am not sure how to resolve the problem best.  Maybe someone who knows the
> > latest VM code better than me can comment on my assumptions stated in the 
> > commit
> > message.

The general trend has been to use the page busy lock as the single point
of synchronization for per-page state.  As you noted, updates to the
valid bits were previously interlocked by the object lock, but this is
coarse-grained and hurts concurrency.  I think you are right that the
range locking in getpages was ok before the recent change, but it seems
preferable to try and address this in ZFS.
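
To make the suspected interaction concrete: the shape of the problem is
a lock-order inversion.  The fault path busies the page and then blocks
in rangelock_enter() inside zfs_getpages(), while another path already
holds the range lock and waits for the page's busy state to clear.  A
generic user-space illustration (plain pthreads, purely illustrative --
not ZFS or VM code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t range_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_busy = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a path that holds the range lock and then needs the page. */
static void *
writer_like(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&range_lock);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&page_busy);
	pthread_mutex_unlock(&page_busy);
	pthread_mutex_unlock(&range_lock);
	return (NULL);
}

/* Stand-in for the fault path: page busied first, range lock second. */
static void *
fault_like(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&page_busy);
	sleep(1);
	pthread_mutex_lock(&range_lock);
	pthread_mutex_unlock(&range_lock);
	pthread_mutex_unlock(&page_busy);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, writer_like, NULL);
	pthread_create(&t2, NULL, fault_like, NULL);
	pthread_join(t1, NULL);		/* with the sleeps, this blocks forever */
	pthread_join(t2, NULL);
	puts("no deadlock this run");
	return (0);
}

Avoiding it generally means making both paths agree on one acquisition
order, or not holding both locks at once in one of the paths.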

> > In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking 
> > in
> > this corner of the code because of a similar deadlock a long time ago.

Do they just not implement readahead?  Can you explain exactly what the
range lock accomplishes here - is it entirely to ensure that znode block
size remains stable?


Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-13 Thread Mark Millard
[I'm adding a reference to an old arm64/aarch64 bug that had
pages turning to zero, in case this 32-bit powerpc issue is
somewhat analogous.]

On 2020-May-13, at 00:29, Mark Millard  wrote:

> [stress alone is sufficient to have the jemalloc asserts fail
> in stress itself; no multi-socket G4 is needed, and there is
> no need to involve nfsd, mountd, rpcbind, or the like. This is
> not a claim that I know all the problems are the same, just
> that a jemalloc-reported failure happens in this simpler
> context and zeroed pages are involved.]
> 
> Reminder: head -r360311 based context.
> 
> 
> First I show a single CPU/core PowerMac G4 context failing
> in stress. (I actually did this later, but it is the
> simpler context.) I simply moved the media from the
> 2-socket G4 to this slower, single-cpu/core one.
> 
> cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
> cpu0: Features 9c00
> cpu0: HID0 8094c0a4
> real memory  = 1577857024 (1504 MB)
> avail memory = 1527508992 (1456 MB)
> 
> # stress -m 1 --vm-bytes 1792M
> stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> : 
> /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
> Failed assertion: "slab == extent_slab_get(extent)"
> stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
> stress: WARN: [1024] (417) now reaping child worker processes
> stress: FAIL: [1024] (451) failed run completed in 243s
> 
> (Note: 1792 is the biggest it allowed with M.)
> 
> The following still pages in and out and fails:
> 
> # stress -m 1 --vm-bytes 1290M
> stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> : 
> /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
> Failed assertion: "slab == extent_slab_get(extent)"
> . . .
> 
> By contrast, the following had no problem for as
> long as I let it run --and did not page in or out:
> 
> # stress -m 1 --vm-bytes 1280M
> stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
> 
> 
> 
> 
> The 2 socket PowerMac G4 context with 2048 MiByte of RAM . . .
> 
> stress -m 1 --vm-bytes 1792M
> 
> did not (quickly?) fail or page. 1792
> is as large as it would allow with M.
> 
> The following also did not (quickly?) fail
> (and were not paging):
> 
> stress -m 2 --vm-bytes 896M
> stress -m 4 --vm-bytes 448M
> stress -m 8 --vm-bytes 224M
> 
> (Only 1 example was run at a time.)
> 
> But the following all did quickly fail (and were
> paging):
> 
> stress -m 8 --vm-bytes 225M
> stress -m 4 --vm-bytes 449M
> stress -m 2 --vm-bytes 897M
> 
> (Only 1 example was run at a time.)
> 
> I'll note that when I exited an su process
> I ended up with a:
> 
> : /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: 
> Failed assertion: "ret == sz_index2size_compute(index)"
> Abort trap (core dumped)
> 
> and a matching su.core file. It appears
> that stress's activity leads to other
> processes also seeing examples of the
> zeroed-page(s) problem (probably su had
> paged some or had been fully swapped
> out):
> 
> (gdb) bt
> #0  thr_kill () at thr_kill.S:4
> #1  0x503821d0 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
> #2  0x502e1d20 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
> #3  0x502d6144 in sz_index2size_lookup (index=) at 
> /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
> #4  sz_index2size (index=) at 
> /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
> #5  ifree (tsd=0x5008b018, ptr=0x50041460, tcache=0x5008b138, 
> slow_path=) at jemalloc_jemalloc.c:2583
> #6  0x502d5cec in __je_free_default (ptr=0x50041460) at 
> jemalloc_jemalloc.c:2784
> #7  0x502d62d4 in __free (ptr=0x50041460) at jemalloc_jemalloc.c:2852
> #8  0x501050cc in openpam_destroy_chain (chain=0x50041480) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:113
> #9  0x50105094 in openpam_destroy_chain (chain=0x500413c0) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #10 0x50105094 in openpam_destroy_chain (chain=0x50041320) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #11 0x50105094 in openpam_destroy_chain (chain=0x50041220) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #12 0x50105094 in openpam_destroy_chain (chain=0x50041120) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #13 0x50105094 in openpam_destroy_chain (chain=0x50041100) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
> #14 0x50105014 in openpam_clear_chains (policy=0x5064) at 
> /usr/src/contrib/openpam/lib/libpam/openpam_load.c:130
> #15 0x50101230 in pam_end (pamh=0x5060, status=) at 
> /usr/src/contrib/openpam/lib/libpam/pam_end.c:83
> #16 0x1001225c in main (argc=, argv=0x0) at 
> /usr/src/usr.bin/su/su.c:477
> 
> (gdb) print/x __je_sz_size2index_tab
> $1 = {0x0 }
> 
> 
> Notes:
> 
> Given that the original problem did not involve
> paging to the swap partition, maybe just making
> it to the laundry queue or some such is sufficient,
> something that is also involved when the swap
> space is 

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-13 Thread Mark Millard
[stress alone is sufficient to have the jemalloc asserts fail
in stress itself; no multi-socket G4 is needed, and there is
no need to involve nfsd, mountd, rpcbind, or the like. This is
not a claim that I know all the problems are the same, just
that a jemalloc-reported failure happens in this simpler
context and zeroed pages are involved.]

Reminder: head -r360311 based context.


First I show a single CPU/core PowerMac G4 context failing
in stress. (I actually did this later, but it is the
simpler context.) I simply moved the media from the
2-socket G4 to this slower, single-cpu/core one.

cpu0: Motorola PowerPC 7400 revision 2.9, 466.42 MHz
cpu0: Features 9c00
cpu0: HID0 8094c0a4
real memory  = 1577857024 (1504 MB)
avail memory = 1527508992 (1456 MB)

# stress -m 1 --vm-bytes 1792M
stress: info: [1024] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
: 
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
Failed assertion: "slab == extent_slab_get(extent)"
stress: FAIL: [1024] (415) <-- worker 1025 got signal 6
stress: WARN: [1024] (417) now reaping child worker processes
stress: FAIL: [1024] (451) failed run completed in 243s

(Note: 1792 is the biggest it allowed with M.)

The following still pages in and out and fails:

# stress -m 1 --vm-bytes 1290M
stress: info: [1163] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
: 
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
Failed assertion: "slab == extent_slab_get(extent)"
. . .

By contrast, the following had no problem for as
long as I let it run --and did not page in or out:

# stress -m 1 --vm-bytes 1280M
stress: info: [1181] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd




The 2 socket PowerMac G4 context with 2048 MiByte of RAM . . .

stress -m 1 --vm-bytes 1792M

did not (quickly?) fail or page. 1792
is as large as it would allow with M.

The following also did not (quickly?) fail
(and were not paging):

stress -m 2 --vm-bytes 896M
stress -m 4 --vm-bytes 448M
stress -m 8 --vm-bytes 224M

(Only 1 example was run at a time.)

But the following all did quickly fail (and were
paging):

stress -m 8 --vm-bytes 225M
stress -m 4 --vm-bytes 449M
stress -m 2 --vm-bytes 897M

(Only 1 example was run at a time.)

I'll note that when I exited an su process
I ended up with a:

: /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: 
Failed assertion: "ret == sz_index2size_compute(index)"
Abort trap (core dumped)

and a matching su.core file. It appears
that stress's activity leads to other
processes also seeing examples of the
zeroed-page(s) problem (probably su had
paged some or had been fully swapped
out):

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x503821d0 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x502e1d20 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x502d6144 in sz_index2size_lookup (index=) at 
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200
#4  sz_index2size (index=) at 
/usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:207
#5  ifree (tsd=0x5008b018, ptr=0x50041460, tcache=0x5008b138, 
slow_path=) at jemalloc_jemalloc.c:2583
#6  0x502d5cec in __je_free_default (ptr=0x50041460) at jemalloc_jemalloc.c:2784
#7  0x502d62d4 in __free (ptr=0x50041460) at jemalloc_jemalloc.c:2852
#8  0x501050cc in openpam_destroy_chain (chain=0x50041480) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:113
#9  0x50105094 in openpam_destroy_chain (chain=0x500413c0) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#10 0x50105094 in openpam_destroy_chain (chain=0x50041320) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#11 0x50105094 in openpam_destroy_chain (chain=0x50041220) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#12 0x50105094 in openpam_destroy_chain (chain=0x50041120) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#13 0x50105094 in openpam_destroy_chain (chain=0x50041100) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:111
#14 0x50105014 in openpam_clear_chains (policy=0x5064) at 
/usr/src/contrib/openpam/lib/libpam/openpam_load.c:130
#15 0x50101230 in pam_end (pamh=0x5060, status=) at 
/usr/src/contrib/openpam/lib/libpam/pam_end.c:83
#16 0x1001225c in main (argc=, argv=0x0) at 
/usr/src/usr.bin/su/su.c:477

(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 }


Notes:

Given that the original problem did not involve
paging to the swap partition, maybe just making
it to the laundry queue or some such is sufficient,
something that is also involved when the swap
space is partially in use (according to top).  Or
sitting in the inactive queue for a long time, if
that has some special status.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-13 Thread Mark Millard
[Yet another new kind of experiment. But it looks
like I can now cause problems in fairly short order,
on demand. Finally! And with that I have much better
evidence for whether the kernel or the user-space
process is making the zeroed memory appear in, for
example, nfsd.]

I've managed to get:

: 
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
Failed assertion: "slab == extent_slab_get(extent)"
: 
/usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: 
Failed assertion: "slab == extent_slab_get(extent)"

and eventually:

[1]   Segmentation fault (core dumped) stress -m 2 --vm-bytes 1700M

from a user program (stress) while another machine attempted an nfs
mount during the stress activity:

# mount -onoatime,soft ...:/ /mnt && umount /mnt && rpcinfo -s ...
[tcp] ...:/: RPCPROG_MNT: RPC: Timed out

(To get a failure I may have to run the commands
multiple times. Timing relative to stress's
activity seems to matter.)

That failure led to:

# ls -ldT /*.core*
-rw---  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core

# ls -ldT *.core*
-rw---  1 root  wheel  2682880 May 12 20:00:26 2020 stress.core

(Note: which of nfsd, mountd, or rpcbind fails need not
be repeatable. stress.core seems to be written twice,
probably because of the -m 2 in use.)

The context that let me do this was to first run (on the 2-socket
G4 with a full 2048 MiByte RAM configuration):

stress -m 2 --vm-bytes 1700M &

Note that the stress command keeps the memory busy
and causes paging to the swap/page space. I've not
tried to make it just fit without paging, or just
barely page, or the like. The original context did not
involve paging or low RAM, so I do not expect paging
to be required, but it can be involved.

The stress program backtrace is different:

4827		return (tls_get_addr_slow(dtvp, index, offset));
4828	}
(gdb) bt -full
#0  0x41831b04 in tls_get_addr_common (dtvp=0x4186c010, index=2, 
offset=4294937444) at /usr/src/libexec/rtld-elf/rtld.c:4824
dtv = 0x0
#1  0x4182bfcc in __tls_get_addr (ti=) at 
/usr/src/libexec/rtld-elf/powerpc/reloc.c:848
tp = 
p = 
#2  0x41a83464 in __get_locale () at 
/usr/src/lib/libc/locale/xlocale_private.h:199
No locals.
#3  fprintf (fp=0x41b355f8, fmt=0x1804cbc "%s: FAIL: [%lli] (%d) ") at 
/usr/src/lib/libc/stdio/fprintf.c:57
ap = {{gpr = 2 '\002', fpr = 0 '\000', reserved = 20731, 
overflow_arg_area = 0xdb78, reg_save_area = 0xdae8}}
ret = 
#4  0x01802348 in main (argc=, argv=) at 
stress.c:415
status = 
ret = 6
do_dryrun = 0
retval = 0
children = 1
do_backoff = 
do_hdd_bytes = 
do_hdd = 
do_vm_keep = 0
do_vm_hang = -1
do_vm_stride = 4096
do_vm_bytes = 1782579200
do_vm = 108174317627375616
do_io = 
do_cpu = 
do_timeout = 108176117243859333
starttime = 1589338322
i = 
forks = 
pid = 6140
stoptime = 
runtime = 

Apparently the asserts did not stop the code
and it ran until a failure occurred (via
dtv=0x0).

Stress uses a mutex stored on a page that gets
the "turns into zeros" problem, preventing
the mprotect(ADDR,1,1) type of test because
stress will write on the page. (I've not tried
to find a minimal form of stress run.)

The same sort of globals are again
zeroed, for example:

(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 }


Another attempt lost rpcbind instead of
mountd:

# ls -ldT /*.core
-rw---  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core
-rw---  1 root  wheel  3170304 May 12 20:03:00 2020 /rpcbind.core


I again find that when I use gdb 3 times
to:

attach ???
x/x __je_sz_size2index_tab
print (int)mprotect(ADDRESS,1,1)
quit

for each of rpcbind, mountd, and nfsd master that
those processes no longer fail during the
mount/umount/rpcinfo (or are far less likely to).

But it turns out that later, when I "service nfsd
stop", nfsd does hit the zeroed-memory-based assert
and dumps core. (I'd done a bunch of the mount/umount/
rpcinfo sequences before the stop.)

That the failure happens during the SIGUSR1-based shutdown
leads me to wonder whether killing off some child
process(es) is involved in the problem.

There was *no* evidence of a signal for an attempted
write to the page from the user process. It appears
that the kernel is doing something that changes what
the process sees -- rather than the user-space program
stomping on its own memory content.
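
The write-detection idea behind the gdb/mprotect trick can also be
expressed as a small user-space sketch (illustrative only: the real
test was done by attaching gdb and calling mprotect() on the page
holding __je_sz_size2index_tab; the pattern, sleep, and handler here
are just stand-ins):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void
segv_handler(int sig)
{
	(void)sig;
	/* A user-space store to the protected page would land here. */
	write(STDOUT_FILENO, "caught user-space write\n", 24);
	_exit(1);
}

int
main(void)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p;

	/* Stand-in for the page holding the watched globals. */
	if (posix_memalign((void **)&p, pagesz, pagesz) != 0)
		return (1);
	memset(p, 0xa5, pagesz);

	signal(SIGSEGV, segv_handler);
	/* Like "print (int)mprotect(ADDRESS,1,1)" from gdb. */
	if (mprotect(p, pagesz, PROT_READ) != 0)
		return (1);

	sleep(60);	/* run the stress workload elsewhere meanwhile */

	/* No signal seen: did the contents change anyway? */
	printf("first byte is now 0x%02x\n", p[0]);
	return (0);
}

If the page's contents change with no SIGSEGV delivered, the change did
not come from a user-space store in that process, which is the point of
the observation above.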

I've no clue how to track down the kernel activity
that changes what the process sees on some page(s)
of memory.

(Prior testing with a debug kernel did not report
problems, despite getting an example failure. So
that seems insufficient.)

At least a procedure is now known that does not
involve waiting hours or days.


The procedure (adjusted for how much RAM is present
and number of cpus/cores?) could be appropriate to
run in other contexts than the 32-bit powerpc 

Re: zfs deadlock on r360452 relating to busy vm page

2020-05-13 Thread Andriy Gapon
On 13/05/2020 10:35, Andriy Gapon wrote:
> On 13/05/2020 01:47, Bryan Drewery wrote:
>> Trivial repro:
>>
>> dd if=/dev/zero of=blah & tail -F blah
>> ^C
>> load: 0.21  cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
>> #0 0x80bce615 at mi_switch+0x155
>> #1 0x80c1cfea at sleepq_switch+0x11a
>> #2 0x80b57f0a at _cv_wait+0x15a
>> #3 0x829ddab6 at rangelock_enter+0x306
>> #4 0x829ecd3f at zfs_freebsd_getpages+0x14f
>> #5 0x810e3ab9 at VOP_GETPAGES_APV+0x59
>> #6 0x80f349e7 at vnode_pager_getpages+0x37
>> #7 0x80f2a93f at vm_pager_get_pages+0x4f
>> #8 0x80f054b0 at vm_fault+0x780
>> #9 0x80f04bde at vm_fault_trap+0x6e
>> #10 0x8106544e at trap_pfault+0x1ee
>> #11 0x81064a9c at trap+0x44c
>> #12 0x8103a978 at calltrap+0x8
> 
> In r329363 I re-worked zfs_getpages and introduced range locking to it.
> At the time I believed that it was safe and maybe it was, please see the 
> commit
> message.
> There, indeed, have been many performance / concurrency improvements to the VM
> system and r358443 is one of them.

Thinking more about it, it could be r352176.
I think that vm_page_grab_valid (and later vm_page_grab_valid_unlocked) are not
equivalent to the code that they replaced.
The original code would check the valid field before any locking, and it would
attempt locking / busying only if the page was invalid.  The object was
required to be locked, though.
The new code tries to busy the page in any case.

> I am not sure how to resolve the problem best.  Maybe someone who knows the
> latest VM code better than me can comment on my assumptions stated in the 
> commit
> message.
> 
> In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking in
> this corner of the code because of a similar deadlock a long time ago.
> 
>> On 5/12/2020 3:13 PM, Bryan Drewery wrote:
 panic: deadlres_td_sleep_q: possible deadlock detected for 
 0xfe25eefa2e00 (find), blocked for 1802392 ticks
> ...
 (kgdb) backtrace
 #0  sched_switch (td=0xfe255eac, flags=) at 
 /usr/src/sys/kern/sched_ule.c:2147
 #1  0x80bce615 in mi_switch (flags=260) at 
 /usr/src/sys/kern/kern_synch.c:542
 #2  0x80c1cfea in sleepq_switch (wchan=0xf810fb57dd48, pri=0) 
 at /usr/src/sys/kern/subr_sleepqueue.c:625
 #3  0x80b57f0a in _cv_wait (cvp=0xf810fb57dd48, 
 lock=0xf80049a99040) at /usr/src/sys/kern/kern_condvar.c:146
 #4  0x82434ab6 in rangelock_enter_reader (rl=0xf80049a99018, 
 new=0xf8022cadb100) at 
 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:429
 #5  rangelock_enter (rl=0xf80049a99018, off=, 
 len=, type=) at 
 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:477
 #6  0x82443d3f in zfs_getpages (vp=, 
 ma=0xfe259f204b18, count=, rbehind=0xfe259f204ac4, 
 rahead=0xfe259f204ad0) at 
 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4500
 #7  zfs_freebsd_getpages (ap=) at 
 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4567
 #8  0x810e3ab9 in VOP_GETPAGES_APV (vop=0x8250a1e0 
 , a=0xfe259f2049f0) at vnode_if.c:2644
 #9  0x80f349e7 in VOP_GETPAGES (vp=, m=>>> out>, count=, rbehind=, rahead=) at 
 ./vnode_if.h:1171
 #10 vnode_pager_getpages (object=, m=, 
 count=, rbehind=, rahead=) at 
 /usr/src/sys/vm/vnode_pager.c:743
 #11 0x80f2a93f in vm_pager_get_pages (object=0xf806cb637c60, 
 m=0xfe259f204b18, count=1, rbehind=, 
 rahead=) at /usr/src/sys/vm/vm_pager.c:305
 #12 0x80f054b0 in vm_fault_getpages (fs=, nera=0, 
 behindp=, aheadp=) at 
 /usr/src/sys/vm/vm_fault.c:1163
 #13 vm_fault (map=, vaddr=, 
 fault_type=, fault_flags=, m_hold=>>> out>) at /usr/src/sys/vm/vm_fault.c:1394
 #14 0x80f04bde in vm_fault_trap (map=0xfe25653949e8, 
 vaddr=, fault_type=, fault_flags=0, 
 signo=0xfe259f204d04, ucode=0xfe259f204d00) at 
 /usr/src/sys/vm/vm_fault.c:589
 #15 0x8106544e in trap_pfault (frame=0xfe259f204d40, 
 usermode=, signo=, ucode=) at 
 /usr/src/sys/amd64/amd64/trap.c:821
 #16 0x81064a9c in trap (frame=0xfe259f204d40) at 
 /usr/src/sys/amd64/amd64/trap.c:340
 #17 
 #18 0x002034fc in ?? ()
> ...
 (kgdb) thread
 [Current thread is 8 (Thread 101255)]
 (kgdb) backtrace
 #0  sched_switch (td=0xfe25c8e9bc00, flags=) at 
 /usr/src/sys/kern/sched_ule.c:2147
 #1  0x80bce615 in mi_switch (flags=260) at 
 /usr/src/sys/kern/kern_synch.c:542
 #2  0x80c1cfea in sleepq_switch (wchan=0xfe001cbca850, pri=84) 
 at /usr/src/sys/kern/subr_sleepqueue.c:625
 #3  0x80f1de50 in _vm_page_busy_sleep (obj=, 
 

Re: zfs deadlock on r360452 relating to busy vm page

2020-05-13 Thread Andriy Gapon
On 13/05/2020 01:47, Bryan Drewery wrote:
> Trivial repro:
> 
> dd if=/dev/zero of=blah & tail -F blah
> ^C
> load: 0.21  cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
> #0 0x80bce615 at mi_switch+0x155
> #1 0x80c1cfea at sleepq_switch+0x11a
> #2 0x80b57f0a at _cv_wait+0x15a
> #3 0x829ddab6 at rangelock_enter+0x306
> #4 0x829ecd3f at zfs_freebsd_getpages+0x14f
> #5 0x810e3ab9 at VOP_GETPAGES_APV+0x59
> #6 0x80f349e7 at vnode_pager_getpages+0x37
> #7 0x80f2a93f at vm_pager_get_pages+0x4f
> #8 0x80f054b0 at vm_fault+0x780
> #9 0x80f04bde at vm_fault_trap+0x6e
> #10 0x8106544e at trap_pfault+0x1ee
> #11 0x81064a9c at trap+0x44c
> #12 0x8103a978 at calltrap+0x8

In r329363 I re-worked zfs_getpages and introduced range locking to it.
At the time I believed that it was safe and maybe it was, please see the commit
message.
There, indeed, have been many performance / concurrency improvements to the VM
system and r358443 is one of them.
I am not sure how to resolve the problem best.  Maybe someone who knows the
latest VM code better than me can comment on my assumptions stated in the commit
message.

In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking in
this corner of the code because of a similar deadlock a long time ago.

> On 5/12/2020 3:13 PM, Bryan Drewery wrote:
>>> panic: deadlres_td_sleep_q: possible deadlock detected for 
>>> 0xfe25eefa2e00 (find), blocked for 1802392 ticks
...
>>> (kgdb) backtrace
>>> #0  sched_switch (td=0xfe255eac, flags=) at 
>>> /usr/src/sys/kern/sched_ule.c:2147
>>> #1  0x80bce615 in mi_switch (flags=260) at 
>>> /usr/src/sys/kern/kern_synch.c:542
>>> #2  0x80c1cfea in sleepq_switch (wchan=0xf810fb57dd48, pri=0) 
>>> at /usr/src/sys/kern/subr_sleepqueue.c:625
>>> #3  0x80b57f0a in _cv_wait (cvp=0xf810fb57dd48, 
>>> lock=0xf80049a99040) at /usr/src/sys/kern/kern_condvar.c:146
>>> #4  0x82434ab6 in rangelock_enter_reader (rl=0xf80049a99018, 
>>> new=0xf8022cadb100) at 
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:429
>>> #5  rangelock_enter (rl=0xf80049a99018, off=, 
>>> len=, type=) at 
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:477
>>> #6  0x82443d3f in zfs_getpages (vp=, 
>>> ma=0xfe259f204b18, count=, rbehind=0xfe259f204ac4, 
>>> rahead=0xfe259f204ad0) at 
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4500
>>> #7  zfs_freebsd_getpages (ap=) at 
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4567
>>> #8  0x810e3ab9 in VOP_GETPAGES_APV (vop=0x8250a1e0 
>>> , a=0xfe259f2049f0) at vnode_if.c:2644
>>> #9  0x80f349e7 in VOP_GETPAGES (vp=, m=>> out>, count=, rbehind=, rahead=) at 
>>> ./vnode_if.h:1171
>>> #10 vnode_pager_getpages (object=, m=, 
>>> count=, rbehind=, rahead=) at 
>>> /usr/src/sys/vm/vnode_pager.c:743
>>> #11 0x80f2a93f in vm_pager_get_pages (object=0xf806cb637c60, 
>>> m=0xfe259f204b18, count=1, rbehind=, rahead=) 
>>> at /usr/src/sys/vm/vm_pager.c:305
>>> #12 0x80f054b0 in vm_fault_getpages (fs=, nera=0, 
>>> behindp=, aheadp=) at 
>>> /usr/src/sys/vm/vm_fault.c:1163
>>> #13 vm_fault (map=, vaddr=, 
>>> fault_type=, fault_flags=, m_hold=>> out>) at /usr/src/sys/vm/vm_fault.c:1394
>>> #14 0x80f04bde in vm_fault_trap (map=0xfe25653949e8, 
>>> vaddr=, fault_type=, fault_flags=0, 
>>> signo=0xfe259f204d04, ucode=0xfe259f204d00) at 
>>> /usr/src/sys/vm/vm_fault.c:589
>>> #15 0x8106544e in trap_pfault (frame=0xfe259f204d40, 
>>> usermode=, signo=, ucode=) at 
>>> /usr/src/sys/amd64/amd64/trap.c:821
>>> #16 0x81064a9c in trap (frame=0xfe259f204d40) at 
>>> /usr/src/sys/amd64/amd64/trap.c:340
>>> #17 
>>> #18 0x002034fc in ?? ()
...
>>> (kgdb) thread
>>> [Current thread is 8 (Thread 101255)]
>>> (kgdb) backtrace
>>> #0  sched_switch (td=0xfe25c8e9bc00, flags=) at 
>>> /usr/src/sys/kern/sched_ule.c:2147
>>> #1  0x80bce615 in mi_switch (flags=260) at 
>>> /usr/src/sys/kern/kern_synch.c:542
>>> #2  0x80c1cfea in sleepq_switch (wchan=0xfe001cbca850, pri=84) 
>>> at /usr/src/sys/kern/subr_sleepqueue.c:625
>>> #3  0x80f1de50 in _vm_page_busy_sleep (obj=, 
>>> m=0xfe001cbca850, pindex=, wmesg=, 
>>> allocflags=21504, locked=false) at /usr/src/sys/vm/vm_page.c:1094
>>> #4  0x80f241f7 in vm_page_grab_sleep (object=0xf806cb637c60, 
>>> m=, pindex=, wmesg=, 
>>> allocflags=21504, locked=>> address 0x0>) at /usr/src/sys/vm/vm_page.c:4326
>>> #5  vm_page_acquire_unlocked (object=0xf806cb637c60, pindex=1098, 
>>> prev=, mp=0xfe2717fc6730, allocflags=21504) at 
>>> /usr/src/sys/vm/vm_page.c:4469
>>> #6  0x80f24c61 in vm_page_grab_valid_unlocked 
>>> (mp=0xfe2717fc6730, 

lkpi: print stack trace in WARN_ON ?

2020-05-13 Thread Andriy Gapon


Just to get a bigger exposure: https://reviews.freebsd.org/D24779
I think that this is a good idea and, if I am not mistaken, it should match the
Linux behavior.
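
For reference, the shape of the change would be roughly as below (a
sketch only, not the actual patch in D24779; the existing linuxkpi
macro body is paraphrased and kdb_backtrace() is the assumed mechanism
for printing the trace):

#define	WARN_ON(cond) ({					\
	bool __ret = !!(cond);					\
	if (__ret) {						\
		printf("WARNING %s failed at %s:%d\n",		\
		    __XSTRING(cond), __FILE__, __LINE__);	\
		kdb_backtrace();	/* proposed addition */	\
	}							\
	unlikely(__ret);					\
})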

-- 
Andriy Gapon