Re: ZFS deadlock
On 30.11.2019 15:48, Eugene Grosbein wrote:
> Hi!
>
> I have RAIDZ1 with five GELI-encrypted SSDs da[2-6].eli (non-boot pool).
>
> I've exported the pool, destroyed da2.eli, then successfully imported the
> pool back in degraded state. Then I've mounted some file systems
> successfully, but "zfs mount" for the next one hung on [tx->tx_sync_done_cv]
> for 4400 seconds and counting.
>
> # procstat -kk -L 55464
>   PID    TID COMM TDNAME KSTACK
> 55464 102422 zfs  -      mi_switch+0xeb sleepq_wait+0x2c _cv_wait+0x16e
> txg_wait_synced+0xa5 dmu_tx_assign+0x48 zfs_rmnode+0x122
> zfs_freebsd_reclaim+0x4e VOP_RECLAIM_APV+0x80 vgonel+0x213 vrecycle+0x46
> zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x80 vinactive+0xf0 vputx+0x2c3
> zfs_unlinked_drain+0x1b8 zfsvfs_setup+0x5e zfs_mount+0x5f5 vfs_domount+0x573
>
> It looks like a deadlock to me.
>
> What can I do to resolve this? FreeBSD 11.3-STABLE/amd64 r354667.

"zfs mount" has just completed successfully, after trim -f /dev/da2 (3.5 TB)
finished in 76 minutes.
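(For context: trim(8) on FreeBSD deletes, i.e. TRIMs, the device's whole
extent. A minimal sketch of that operation, assuming the DIOCGMEDIASIZE and
DIOCGDELETE ioctls from <sys/disk.h>; this is illustrative, not a copy of
trim(8) itself:)

/* Sketch: TRIM an entire FreeBSD disk device, roughly what trim(8)
 * does. Assumes DIOCGMEDIASIZE and DIOCGDELETE from <sys/disk.h>. */
#include <sys/disk.h>
#include <sys/ioctl.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	int fd;
	off_t mediasize;
	off_t range[2];		/* { offset, length } in bytes */

	if (argc != 2)
		errx(1, "usage: %s /dev/daN", argv[0]);
	if ((fd = open(argv[1], O_WRONLY)) < 0)
		err(1, "open %s", argv[1]);
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");
	range[0] = 0;
	range[1] = mediasize;
	/* This can run for a long time on a large SSD; the report
	 * above saw 76 minutes for 3.5 TB. */
	if (ioctl(fd, DIOCGDELETE, range) < 0)
		err(1, "DIOCGDELETE");
	printf("deleted %jd bytes\n", (intmax_t)mediasize);
	return (0);
}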
ZFS deadlock
Hi!

I have RAIDZ1 with five GELI-encrypted SSDs da[2-6].eli (non-boot pool).

I've exported the pool, destroyed da2.eli, then successfully imported the pool
back in degraded state. Then I've mounted some file systems successfully, but
"zfs mount" for the next one hung on [tx->tx_sync_done_cv] for 4400 seconds
and counting.

# procstat -kk -L 55464
  PID    TID COMM TDNAME KSTACK
55464 102422 zfs  -      mi_switch+0xeb sleepq_wait+0x2c _cv_wait+0x16e
txg_wait_synced+0xa5 dmu_tx_assign+0x48 zfs_rmnode+0x122 zfs_freebsd_reclaim+0x4e
VOP_RECLAIM_APV+0x80 vgonel+0x213 vrecycle+0x46 zfs_freebsd_inactive+0xd
VOP_INACTIVE_APV+0x80 vinactive+0xf0 vputx+0x2c3 zfs_unlinked_drain+0x1b8
zfsvfs_setup+0x5e zfs_mount+0x5f5 vfs_domount+0x573

It looks like a deadlock to me.

What can I do to resolve this? FreeBSD 11.3-STABLE/amd64 r354667.
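(For readers following the stack: zfs_unlinked_drain runs at mount time to
finish removing files that were unlinked before the export; each removal needs
a transaction, and dmu_tx_assign can end up in txg_wait_synced, sleeping on
tx_sync_done_cv until the sync thread completes a transaction group. A purely
illustrative userland model of that wait, using pthreads; the names are
borrowed from the trace, this is not the real DMU code:)

/* Illustrative model of a txg_wait_synced()-style wait. If the
 * "sync thread" stalls on device I/O and never advances the synced
 * txg, the waiter sleeps forever -- with no timeout, which in the
 * kernel analogue shows up as an unkillable "D"-state thread. */
#include <pthread.h>

static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t tx_sync_done_cv = PTHREAD_COND_INITIALIZER;
static unsigned long tx_synced_txg;

void
txg_wait_synced_model(unsigned long txg)
{
	pthread_mutex_lock(&tx_lock);
	while (tx_synced_txg < txg)
		/* Blocks until the sync thread broadcasts. */
		pthread_cond_wait(&tx_sync_done_cv, &tx_lock);
	pthread_mutex_unlock(&tx_lock);
}

void
txg_sync_done_model(unsigned long txg)
{
	/* Called by the sync thread once a txg is on stable storage. */
	pthread_mutex_lock(&tx_lock);
	tx_synced_txg = txg;
	pthread_cond_broadcast(&tx_sync_done_cv);
	pthread_mutex_unlock(&tx_lock);
}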
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/18/2016 13:30, Andriy Gapon wrote:
> On 14/11/2016 14:00, Henri Hennebert wrote:
>> On 11/14/2016 12:45, Andriy Gapon wrote:
>>> Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
>>> happens to be the first field of 'struct faultstate'. So, could you please
>>> go to the frame and print '*m' and '*(struct faultstate *)m'?
>>
>> (kgdb) fr 4
>> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
>> 753        msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
>> (kgdb) print *m
>> $1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = 18446735281353604048}}, listq = {tqe_next = 0x0, tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, phys_addr = 3389358080, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
>
> If I interpret this correctly the page is in the 'exclusive busy' state.
> Unfortunately, I can't tell much beyond that. But I am confident that this
> is the root cause of the lock-up.
>
>> (kgdb) print *(struct faultstate *)m
>> $2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m = 0xf800dc5d85c0, first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry = 0x0, lookup_still_valid = -546779784, vp = 0x601aa}
>> (kgdb)
>
> I was wrong on this one as 'm' is actually a pointer, so the above is not
> correct. Maybe 'info reg' in frame 5 would give a clue about the value of
> 'fs'.

(kgdb) fr 5
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
1086        vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax    0x0                 0
rbx    0xf800b62e9c78      -8793036514184
rcx    0x0                 0
rdx    0x0                 0
rsi    0x0                 0
rdi    0x0                 0
rbp    0xfe0101836810      0xfe0101836810
rsp    0xfe01018367e0      0xfe01018367e0
r8     0x0                 0
r9     0x0                 0
r10    0x0                 0
r11    0x0                 0
r12    0xf800b642aa00      -879303520
r13    0xf800df68cd40      -8792344834752
r14    0xf800b62e9c60      -8793036514208
r15    0x809c51bc          -2137239108
rip    0x8089dd4d          0x8089dd4d
eflags 0x0                 0
cs     0x0                 0
ss     0x0                 0
ds     0x0                 0
es     0x0                 0
fs     0x0                 0
gs     0x0                 0

I don't know what to do from here.

> I am not sure how to proceed from here. The only thing I can think of is a
> lock order reversal between the vnode lock and the page busying quasi-lock.
> But examining the code I can not spot it. Another possibility is a leak of
> a busy page, but that's hard to debug. How hard is it to reproduce the
> problem?

After 7 days all seems normal; only one copy of innd:

[root@avoriaz ~]# ps xa | grep inn
 1193  -  Is   0:01.40 /usr/local/news/bin/innd -r
13498  -  IN   0:00.01 /usr/local/news/bin/innfeed
 1194 v0- IW   0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60

I will try to stop and restart innd. All continues to look good:

[root@avoriaz ~]# ps xa | grep inn
31673  -  Ss   0:00.02 /usr/local/news/bin/innd
31694  -  SN   0:00.01 /usr/local/news/bin/innfeed
31674  0  S    0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60

I think reproducing it is just a matter of waiting until it occurs by
itself...

One thing here: the deadlock has occurred at least 5 times since 10.0R, and
always with the directory /usr/local/news/bin.

> Maybe Konstantin would have some ideas or suggestions.
Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 14/11/2016 14:00, Henri Hennebert wrote:
> On 11/14/2016 12:45, Andriy Gapon wrote:
>> Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
>> happens to be the first field of 'struct faultstate'. So, could you please
>> go to the frame and print '*m' and '*(struct faultstate *)m'?
>
> (kgdb) fr 4
> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
> 753        msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
> (kgdb) print *m
> $1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = 18446735281353604048}}, listq = {tqe_next = 0x0, tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, phys_addr = 3389358080, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}

If I interpret this correctly the page is in the 'exclusive busy' state.
Unfortunately, I can't tell much beyond that. But I am confident that this is
the root cause of the lock-up.

> (kgdb) print *(struct faultstate *)m
> $2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m = 0xf800dc5d85c0, first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry = 0x0, lookup_still_valid = -546779784, vp = 0x601aa}
> (kgdb)

I was wrong on this one as 'm' is actually a pointer, so the above is not
correct. Maybe 'info reg' in frame 5 would give a clue about the value of
'fs'.

I am not sure how to proceed from here. The only thing I can think of is a
lock order reversal between the vnode lock and the page busying quasi-lock.
But examining the code I can not spot it. Another possibility is a leak of a
busy page, but that's hard to debug. How hard is it to reproduce the problem?

Maybe Konstantin would have some ideas or suggestions.

-- 
Andriy Gapon
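(The 'exclusive busy' reading comes from busy_lock = 6: in the vm_page
busy-state encoding of that era, one bit marks exclusive busy and another
marks sleeping waiters. A small decoder; the bit values below are assumed
from FreeBSD 11's sys/vm/vm_page.h and inlined here for a standalone build:)

/* Decode a vm_page busy_lock word as printed by kgdb. Bit values
 * assumed from FreeBSD 11 sys/vm/vm_page.h. */
#include <stdio.h>
#include <stdlib.h>

#define VPB_BIT_SHARED      0x01
#define VPB_BIT_EXCLUSIVE   0x02
#define VPB_BIT_WAITERS     0x04

int
main(int argc, char **argv)
{
	/* Defaults to the value from the kgdb output above. */
	unsigned long v = (argc > 1) ? strtoul(argv[1], NULL, 0) : 6;

	printf("busy_lock %#lx:%s%s%s\n", v,
	    (v & VPB_BIT_SHARED) ? " shared" : "",
	    (v & VPB_BIT_EXCLUSIVE) ? " exclusive" : "",
	    (v & VPB_BIT_WAITERS) ? " waiters" : "");
	return (0);
}

(For busy_lock = 6 this prints "exclusive waiters", matching the
interpretation above: the page is exclusively busied and someone is already
sleeping on it.)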
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/14/2016 12:45, Andriy Gapon wrote:

On 14/11/2016 11:35, Henri Hennebert wrote:

On 11/14/2016 10:07, Andriy Gapon wrote:

Hmm, I've just noticed another interesting thread:

Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=0x809c51bc "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at /usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

This thread is another program from the news system:

668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

I strongly suspect that this is the thread that we were looking for. I think
that it has the vnode lock in the shared mode while trying to fault in a page.

--clip--

Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
happens to be the first field of 'struct faultstate'. So, could you please go
to the frame and print '*m' and '*(struct faultstate *)m'?

(kgdb) fr 4
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753        msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0}, pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v = 18446735281353604048}}, listq = {tqe_next = 0x0, tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11, phys_addr = 3389358080, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m = 0xf800dc5d85c0, first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry = 0x0, lookup_still_valid = -546779784, vp = 0x601aa}
(kgdb)
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 14/11/2016 11:35, Henri Hennebert wrote:
>
> On 11/14/2016 10:07, Andriy Gapon wrote:
>> Hmm, I've just noticed another interesting thread:
>> Thread 668 (Thread 101245):
>> #0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x805614b1 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=0x809c51bc "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:229
>> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
>> #5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
>> #6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
>> #7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:273
>> #8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:741
>> #9  0x808d3386 in trap (frame=0xfe0101836c00) at /usr/src/sys/amd64/amd64/trap.c:333
>> #10 0x808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
>
> This thread is another program from the news system:
> 668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>
>> I strongly suspect that this is the thread that we were looking for.
>> I think that it has the vnode lock in the shared mode while trying to
>> fault in a page.
>>
>> Could you please check that by going to frame 6 and printing 'fs' and
>> '*fs.vp'? It'd be interesting to understand why this thread is waiting
>> here. So, please also print '*fs.m' and '*fs.object'.
>
> No luck :-(
> (kgdb) fr 6
> #6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
> 495        vm_page_sleep_if_busy(fs.m, "vmpfw");
> (kgdb) print fs
> Cannot access memory at address 0x1fa0
> (kgdb)

Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
happens to be the first field of 'struct faultstate'. So, could you please go
to the frame and print '*m' and '*(struct faultstate *)m'?

-- 
Andriy Gapon
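(A side note on the '*(struct faultstate *)m' trick: it relies on having the
address of the struct's first member, but the 'm' visible in frame 5 is a
copy of fs.m, i.e. the pointer's value. Casting that value reinterprets the
vm_page itself as a faultstate -- which is exactly why the $2 output earlier
in this digest starts with m = 0xf800dc5d85b0, the vm_page's own first
field, and why the trick is retracted there. A minimal illustration, with a
simplified stand-in layout, not the real kernel structs:)

/* Why '*(struct faultstate *)m' misfired: the cast needs &fs.m,
 * not the value of m. Simplified stand-in types for illustration. */
#include <stdio.h>

struct page { int id; };

struct faultstate {
	struct page *m;		/* first member, as in the kernel */
	int pindex;
};

int
main(void)
{
	struct page pg = { 42 };
	struct faultstate fs = { &pg, 7 };
	struct page *m = fs.m;	/* local copy, as in frame 5 */

	/* Valid: a pointer to the first member aliases the struct. */
	struct faultstate *good = (struct faultstate *)(void *)&fs.m;
	/* Bogus: this points at pg, not at fs; dereferencing its
	 * fields reads whatever happens to live inside the page. */
	struct faultstate *bad = (struct faultstate *)(void *)m;

	printf("good->pindex = %d\n", good->pindex);	/* prints 7 */
	printf("bad = %p, &fs = %p (not the same object)\n",
	    (void *)bad, (void *)&fs);
	return (0);
}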
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/14/2016 10:07, Andriy Gapon wrote:

On 13/11/2016 15:28, Henri Hennebert wrote:

On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:

[snip]

Could you please show 'info local' in frame 14? I expected that the 'nd'
variable would be defined there and that it may contain some useful
information.

No luck there:

(kgdb) fr 14
#14 0x80636838 in kern_statat (td=0xf80009ba0500, flag=<value optimized out>, fd=-100, path=0x0, pathseg=<value optimized out>, sbp=<value optimized out>, hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160
2160        if ((error = namei(&nd)) != 0)
(kgdb) info local
rights = <value optimized out>
nd = <value optimized out>
error = <value optimized out>
sb = <value optimized out>
(kgdb)

I also tried to get information from the execve of the other threads.

For tid 101250:

(kgdb) fr 10
#10 0x80508ccc in sys_execve (td=0xf800b6429000, uap=0xfe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
218        error = kern_execve(td, &args, NULL);
(kgdb) print *uap
$4 = {fname_l_ = 0xfe010184fb80 "`\220\217\002\b", fname = 0x8028f9060, fname_r_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv = 0x7fffb660, argv_r_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv = 0x7fffdc08, envv_r_ = 0xfe010184fb98 ""}
(kgdb)

For tid 101243:

(kgdb) f 15
#15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
218        error = kern_execve(td, &args, NULL);
(kgdb) print *uap
$5 = {fname_l_ = 0xfe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0, fname_r_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv = 0x7fffb660, argv_r_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv = 0x7fffdc08, envv_r_ = 0xfe010182cb98 ""}
(kgdb)

I think that you see garbage in those structures because they contain pointers
to userland data.

Hmm, I've just noticed another interesting thread:

Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=0x809c51bc "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at /usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

This thread is another program from the news system:

668 Thread 101245 (PID=49124: innfeed) sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

I strongly suspect that this is the thread that we were looking for. I think
that it has the vnode lock in the shared mode while trying to fault in a page.

Could you please check that by going to frame 6 and printing 'fs' and
'*fs.vp'? It'd be interesting to understand why this thread is waiting here.
So, please also print '*fs.m' and '*fs.object'.

No luck :-(

(kgdb) fr 6
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
495        vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0x1fa0
(kgdb)

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 13/11/2016 15:28, Henri Hennebert wrote:
> On 11/13/2016 11:06, Andriy Gapon wrote:
>> On 12/11/2016 14:40, Henri Hennebert wrote:
>>> I attach it
>>
>> Thank you!
>> So, these two threads are trying to get the lock in the exclusive mode:
>>
>> Thread 687 (Thread 101243):
>> #0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
>>
>> Thread 681 (Thread 101147):
>> #0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf80065f4e500, uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
>
> These 2 threads are innd processes. In core.txt.4:
>
>    8 14789 29165   0  24  4 40040 6612 zfs    DN- 0:00.00 [innd]
>    8 29165     1   0  20  0 42496 6888 select Ds- 0:01.33 [innd]
>    8 49778 29165   0  24  4 40040 6900 zfs    DN- 0:00.00 [innd]
>    8 82034 29165   0  24  4   132    0 zfs    DN- 0:00.00 [innd]
>
> the corresponding info threads are:
>
> 687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
> 681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
> 669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000,
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/13/2016 14:28, Henri Hennebert wrote:

These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0  24  4 40040 6612 zfs    DN- 0:00.00 [innd]
   8 29165     1   0  20  0 42496 6888 select Ds- 0:01.33 [innd]
   8 49778 29165   0  24  4 40040 6900 zfs    DN- 0:00.00 [innd]
   8 82034 29165   0  24  4   132    0 zfs    DN- 0:00.00 [innd]

the corresponding info threads are:

687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
665 Thread 101262 (PID=29165: innd) sched_switch (td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

In case it may help, I had a look at innd. These processes use two execv
calls: one to execute /bin/sh and the other to execute itself:

/*
**  Re-exec ourselves.
*/
static const char *
CCxexec(char *av[])
{
    char    *innd;
    char    *p;
    int     i;

    if (CCargv == NULL)
        return "1 no argv!";

    innd = concatpath(innconf->pathbin, "innd");
    /* Get the pathname. */
    p = av[0];
    if (*p == '\0' || strcmp(p, "innd") == 0)
        CCargv[0] = innd;
    else
        return "1 Bad value";

#ifdef DO_PERL
    PLmode(Mode, OMshutdown, av[0]);
#endif
#ifdef DO_PYTHON
    PYmode(Mode, OMshutdown, av[0]);
#endif
    JustCleanup();
    syslog(L_NOTICE, "%s execv %s", LogName, CCargv[0]);

    /* Close all fds to protect possible fd leaking across successive innds. */
    for (i = 3; i < 30; i++)
        close(i);

    execv(CCargv[0], CCargv);
    syslog(L_FATAL, "%s cant execv %s %m", LogName, CCargv[0]);
    _exit(1);
    /* NOTREACHED */
    return "1 Exit failed";
}

The culprit may be /usr/local/news/bin/innd; remember that find is locked in
/usr/local/news/bin.

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:

I attach it

Thank you!

So, these two threads are trying to get the lock in the exclusive mode:

Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500, uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396

These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0  24  4 40040 6612 zfs    DN- 0:00.00 [innd]
   8 29165     1   0  20  0 42496 6888 select Ds- 0:01.33 [innd]
   8 49778 29165   0  24  4 40040 6900 zfs    DN- 0:00.00 [innd]
   8 82034 29165   0  24  4   132    0 zfs    DN- 0:00.00 [innd]

the corresponding info threads are:

687 Thread 101243 (PID=49778: innd) sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
681 Thread 101147 (PID=14789: innd) sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
669 Thread 101250 (PID=82034: innd) sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
665 Thread 101262 (PID=29165: innd) sched_switch (td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973

So your missing thread must be 101250:

(kgdb) tid 101250
[Switching to thread 669 (Thread 101250)]#0  sched_switch (td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973        cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
(kgdb) bt
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 12/11/2016 14:40, Henri Hennebert wrote:
> I attach it

Thank you!

So, these two threads are trying to get the lock in the exclusive mode:

Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500, uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs", pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864, td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>, ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at /usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500, uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396

And the original stuck thread wants to get the lock in the shared mode. And
there should be another thread that already holds the lock in the shared mode,
but I am not able to identify it. I wonder if the original thread could be
trying to get the lock recursively...

It would be interesting to get more details from thread 101112. You can switch
to it using the 'tid' command, you can use 'fr' to select frames, and 'info
local' and 'info args' to see which variables are available (not optimized
out); then you can print any that look interesting. It would be nice to get a
file path and a directory vnode where the lookup is called.

Thank you.

-- 
Andriy Gapon
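(The rule at work here -- a new shared request must queue behind an
already-queued exclusive waiter, even though the lock is currently only held
shared -- is what makes the real blocker hard to find: the thread to blame is
a waiter, not a holder. A deterministic toy model of that behavior in C with
pthreads; this is a sketch of the policy, not the kernel's lockmgr:)

/* Toy model of the lockmgr fairness rule discussed above. */
#include <pthread.h>
#include <stdbool.h>

struct toy_lk {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	int sharers;		/* current shared holders */
	int excl_waiters;	/* queued exclusive waiters */
	bool exclusive;		/* exclusively held? */
};

void
toy_lock_shared(struct toy_lk *lk)
{
	pthread_mutex_lock(&lk->mtx);
	/* Key point: a shared request also waits while exclusive
	 * waiters are queued, so writers cannot be starved. This is
	 * why the stuck thread slept even though the vnode lock was
	 * only held shared. */
	while (lk->exclusive || lk->excl_waiters > 0)
		pthread_cond_wait(&lk->cv, &lk->mtx);
	lk->sharers++;
	pthread_mutex_unlock(&lk->mtx);
}

void
toy_lock_excl(struct toy_lk *lk)
{
	pthread_mutex_lock(&lk->mtx);
	lk->excl_waiters++;
	while (lk->exclusive || lk->sharers > 0)
		pthread_cond_wait(&lk->cv, &lk->mtx);
	lk->excl_waiters--;
	lk->exclusive = true;
	pthread_mutex_unlock(&lk->mtx);
}

void
toy_unlock(struct toy_lk *lk)
{
	pthread_mutex_lock(&lk->mtx);
	if (lk->exclusive)
		lk->exclusive = false;
	else
		lk->sharers--;
	pthread_cond_broadcast(&lk->cv);
	pthread_mutex_unlock(&lk->mtx);
}

(With one sharer holding the lock and one exclusive waiter queued, a second
shared request blocks -- exactly the LK_SHARE + LK_EXCLUSIVE_WAITERS state
decoded later in this thread.)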
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/11/2016 16:50, Henri Hennebert wrote:
>
> On 11/11/2016 12:24, Andriy Gapon wrote:
>>
>> At this stage I would try to get a system crash dump for post-mortem
>> analysis. There are a few ways to do that. You can enter ddb and then run
>> 'dump' and 'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`.
>> In either case, please double-check that your system has a dump device
>> configured.
>>
> It takes some time to upload the dump...
>
> You can find it at
>
> http://tignes.restart.be/Xfer/

Could you please open the dump in kgdb and execute the following commands?

set logging on
set logging redirect on
set pagination off
thread apply all bt
quit

After that you should get a gdb.txt file in the current directory. I would
like to see it.

Thank you.

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/11/2016 12:24, Andriy Gapon wrote:

At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that. You can enter ddb and then run 'dump' and
'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`. In either
case, please double-check that your system has a dump device configured.

It takes some time to upload the dump...

You can find it at

http://tignes.restart.be/Xfer/

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 21:41, Henri Hennebert wrote:
> On 11/10/2016 19:40, Andriy Gapon wrote:
>> On 10/11/2016 19:55, Henri Hennebert wrote:
>>>
>>> On 11/10/2016 18:33, Andriy Gapon wrote:
>>>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>>>> On 11/10/2016 16:54, Andriy Gapon wrote:
>>>>>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>>>>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>>>>>> Interesting. I can not spot any suspicious thread that would hold
>>>>>>>> the vnode lock. Could you please run kgdb (just like that, no
>>>>>>>> arguments), then execute the 'bt' command and then select a frame
>>>>>>>> where _vn_lock is called with the 'fr N' command. Then please
>>>>>>>> 'print *vp' and share the result.
>>>>>>>
>>>>>>> I think I'm missing something in your request:
>>>>>>
>>>>>> Oh, sorry! The very first step should be 'tid 101112' to switch to
>>>>>> the correct context.
>>>>>
>>>>> (kgdb) fr 7
>>>>> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,
>>>>
>>>> "value optimized out" - not good
>>>>
>>>>> file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>>>>> 859     vnode_if.h: No such file or directory.
>>>>>         in vnode_if.h
>>>>> (kgdb) print *vp
>>>>
>>>> I am not sure if this output is valid, because of the message above.
>>>> Could you please try to navigate to nearby frames and see if vp itself
>>>> has a valid value there. If you can find such a frame please do *vp
>>>> there.
>>>
>>> Does this seem better?
>>
>> Yes!
>>
>>> (kgdb) fr 8
>>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
>>> 2523        if ((error = vn_lock(vp, flags)) != 0) {
>>> (kgdb) print *vp
>>> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
>>> (kgdb)
>>
>> flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
>> lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE
>>
>> So, here's what we have here: this thread tries to get a shared lock on the
>> vnode, the vnode is already locked in shared mode, but there is an exclusive
>> waiter (or, perhaps, multiple waiters). So, this thread can not get the lock
>> because of the exclusive waiter. And I do not see an easy way to identify
>> that waiter.
>>
>> In the procstat output that you provided earlier there was no other thread
>> in vn_lock. Hmm, I see this:
>> procstat: sysctl: kern.proc.kstack: 14789: Device busy
>> procstat: sysctl: kern.proc.kstack: 82034: Device busy
>>
>> Could you please check what those two processes are (if they are still
>> running)? Perhaps try procstat for each of the pids several times.

At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that. You can enter ddb and then run 'dump' and
'reset' commands. Or you can just do `sysctl debug.kdb.panic=1`. In either
case, please double-check that your system has a dump device configured.

> This 2 processes are the 2
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 19:40, Andriy Gapon wrote:

On 10/11/2016 19:55, Henri Hennebert wrote:

On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting. I can not spot any suspicious thread that would hold the vnode
lock. Could you please run kgdb (just like that, no arguments), then execute
the 'bt' command and then select a frame where _vn_lock is called with the
'fr N' command. Then please 'print *vp' and share the result.

I think I'm missing something in your request:

Oh, sorry! The very first step should be 'tid 101112' to switch to the correct
context.

(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,

"value optimized out" - not good

file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
859     vnode_if.h: No such file or directory.
        in vnode_if.h
(kgdb) print *vp

I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there. If you can find such a frame please do *vp there.

Does this seem better?

Yes!

(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
2523        if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)

flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode, the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters). So, this thread can not get the lock
because of the exclusive waiter. And I do not see an easy way to identify that
waiter.

In the procstat output that you provided earlier there was no other thread in
vn_lock. Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still
running)? Perhaps try procstat for each of the pids several times.

These 2 processes are the two instances of the innd daemon (news server),
which seems in accordance with the directory /usr/local/news/bin.

[root@avoriaz ~]# procstat 14789
  PID  PPID  PGID   SID  TSID THR LOGIN WCHAN EMUL          COMM
14789 29165 29165 29165     0   1 root  zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat 82034
  PID  PPID  PGID   SID  TSID THR LOGIN WCHAN EMUL          COMM
82034 29165 29165 29165     0   1 root  zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM FD T V FLAGS REF OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM FD T V FLAGS REF OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM FD T V FLAGS
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 19:55, Henri Hennebert wrote:
>
> On 11/10/2016 18:33, Andriy Gapon wrote:
>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>> On 11/10/2016 16:54, Andriy Gapon wrote:
>>>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>>>> Interesting. I can not spot any suspicious thread that would hold the
>>>>>> vnode lock. Could you please run kgdb (just like that, no arguments),
>>>>>> then execute the 'bt' command and then select a frame where _vn_lock
>>>>>> is called with the 'fr N' command. Then please 'print *vp' and share
>>>>>> the result.
>>>>>
>>>>> I think I'm missing something in your request:
>>>>
>>>> Oh, sorry! The very first step should be 'tid 101112' to switch to the
>>>> correct context.
>>>
>>> (kgdb) fr 7
>>> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,
>>
>> "value optimized out" - not good
>>
>>> file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>>> 859     vnode_if.h: No such file or directory.
>>>         in vnode_if.h
>>> (kgdb) print *vp
>>
>> I am not sure if this output is valid, because of the message above.
>> Could you please try to navigate to nearby frames and see if vp itself has
>> a valid value there. If you can find such a frame please do *vp there.
>
> Does this seem better?

Yes!

> (kgdb) fr 8
> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
> 2523        if ((error = vn_lock(vp, flags)) != 0) {
> (kgdb) print *vp
> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
> (kgdb)

flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode; the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters). So, this thread can not get the lock
because of the exclusive waiter. And I do not see an easy way to identify that
waiter.

In the procstat output that you provided earlier there was no other thread in
vn_lock. Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still
running)? Perhaps try procstat for each of the pids several times.

-- 
Andriy Gapon
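(The two decodes above can be reproduced mechanically. A small standalone
helper; the constant values below are assumed from FreeBSD 11's sys/lockmgr.h
and kern/kern_lock.c and inlined here rather than taken from the headers:)

/* Decode the vn_lock() flags and lk_lock words from the kgdb output.
 * Constant values assumed from FreeBSD 11; verify against your tree. */
#include <stdio.h>

/* vn_lock() request flags */
#define LK_SHARED            0x200000
#define LK_VNHELD            0x004000
#define LK_NODDLKTREAT       0x002000

/* lk_lock state bits; the sharer count lives above the low bits */
#define LK_SHARE             0x01
#define LK_SHARED_WAITERS    0x02
#define LK_EXCLUSIVE_WAITERS 0x04
#define LK_SHARERS_SHIFT     4

int
main(void)
{
	unsigned long flags = 2121728;	/* from frame 8 above */
	unsigned long lk = 23;		/* v_lock.lk_lock above */

	printf("flags %#lx:%s%s%s\n", flags,
	    (flags & LK_SHARED) ? " LK_SHARED" : "",
	    (flags & LK_VNHELD) ? " LK_VNHELD" : "",
	    (flags & LK_NODDLKTREAT) ? " LK_NODDLKTREAT" : "");
	printf("lk_lock %#lx:%s%s%s, %lu sharer(s)\n", lk,
	    (lk & LK_SHARE) ? " LK_SHARE" : "",
	    (lk & LK_SHARED_WAITERS) ? " LK_SHARED_WAITERS" : "",
	    (lk & LK_EXCLUSIVE_WAITERS) ? " LK_EXCLUSIVE_WAITERS" : "",
	    lk >> LK_SHARERS_SHIFT);
	return (0);
}

(For 23 = 0x17 this reports one sharer plus both waiter bits, matching the
LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE reading.)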
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting. I can not spot any suspicious thread that would hold the vnode
lock. Could you please run kgdb (just like that, no arguments), then execute
the 'bt' command and then select a frame where _vn_lock is called with the
'fr N' command. Then please 'print *vp' and share the result.

I think I'm missing something in your request:

Oh, sorry! The very first step should be 'tid 101112' to switch to the correct
context.

(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,

"value optimized out" - not good

file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
859     vnode_if.h: No such file or directory.
        in vnode_if.h
(kgdb) print *vp

I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there. If you can find such a frame please do *vp there.

Does this seem better?

(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
2523        if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data = 0xf80049c1f420, v_mount = 0xf800093aa660, v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd = 0x0, v_lock = {lock_object = {lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0x8099e9e0 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xf80049c2c068, v_actfreelist = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xf80049c2c188}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 18:12, Henri Hennebert wrote:
> On 11/10/2016 16:54, Andriy Gapon wrote:
>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>> Interesting. I can not spot any suspicious thread that would hold the
>>>> vnode lock. Could you please run kgdb (just like that, no arguments),
>>>> then execute the 'bt' command and then select a frame where _vn_lock is
>>>> called with the 'fr N' command. Then please 'print *vp' and share the
>>>> result.
>>>
>>> I think I'm missing something in your request:
>>
>> Oh, sorry! The very first step should be 'tid 101112' to switch to the
>> correct context.
>
> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,

"value optimized out" - not good

> file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
> 859     vnode_if.h: No such file or directory.
>         in vnode_if.h
> (kgdb) print *vp

I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there. If you can find such a frame please do *vp there.

> $1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000, v_data = 0x0, v_mount = 0xf80009a41010, v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev = 0x0}, v_cache_src = {lh_first = 0xfe010186d768}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd = 0xf8000284f000, v_lock = {lock_object = {lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness = 0xf800068bd480}, lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048, lk_pri = 0}, v_interlock = {lock_object = {lo_name = 0x18af8 <Bad address>, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "", lo_flags = 1, lo_data = 0, lo_witness = 0x400ff}, rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068, bo_synclist = {le_next = 0x813be535, le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0x0}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0xf80088ac8d00, tqh_last = 0xf8003cc5b600}, bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, bo_numoutput = 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88, tqh_last = 0x19cc}, rl_currdep = 0x3f8}, v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 2369, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_hash = 0, v_type = VNON}
> (kgdb)
>
> Thanks for your time
>
> Henri

-- 
Andriy Gapon
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting. I can not spot any suspicious thread that would hold the vnode
lock. Could you please run kgdb (just like that, no arguments), then execute
the 'bt' command and then select a frame where _vn_lock is called with the
'fr N' command. Then please 'print *vp' and share the result.

I think I'm missing something in your request:

Oh, sorry! The very first step should be 'tid 101112' to switch to the correct
context.

(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728, file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
859     vnode_if.h: No such file or directory.
        in vnode_if.h
(kgdb) print *vp
$1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000, v_data = 0x0, v_mount = 0xf80009a41010, v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev = 0x0}, v_cache_src = {lh_first = 0xfe010186d768}, v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd = 0xf8000284f000, v_lock = {lock_object = {lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness = 0xf800068bd480}, lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048, lk_pri = 0}, v_interlock = {lock_object = {lo_name = 0x18af8 <Bad address>, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "", lo_flags = 1, lo_data = 0, lo_witness = 0x400ff}, rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068, bo_synclist = {le_next = 0x813be535, le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0x0}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0xf80088ac8d00, tqh_last = 0xf8003cc5b600}, bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, bo_numoutput = 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88, tqh_last = 0x19cc}, rl_currdep = 0x3f8}, v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 2369, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_hash = 0, v_type = VNON}
(kgdb)

Thanks for your time

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 17:20, Henri Hennebert wrote:
> On 11/10/2016 15:00, Andriy Gapon wrote:
>> Interesting. I can not spot any suspicious thread that would hold the vnode
>> lock. Could you please run kgdb (just like that, no arguments), then execute
>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>> command. Then please 'print *vp' and share the result.
>>
> I think I miss something in your request:

Oh, sorry! The very first step should be 'tid 101112' to switch to the correct
context.

> [root@avoriaz ~]# kgdb
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
>
> --- clip ---
>
> Loaded symbols for /boot/kernel/accf_data.ko
> Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/daemon_saver.ko
> #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500,
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> 1973    cpuid = PCPU_GET(cpuid);
> (kgdb) bt
> #0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500,
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> #1 0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at
> /usr/src/sys/kern/kern_tc.c:2121
> #2 0x80555227 in timekeep_push_vdso () at
> /usr/src/sys/kern/kern_sharedpage.c:174
> #3 0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
> #4 0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>)
> at /usr/src/sys/kern/kern_clock.c:589
> #5 0x808fac74 in handleevents (now=<value optimized out>, fake=0) at
> /usr/src/sys/kern/kern_clocksource.c:223
> #6 0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>)
> at /usr/src/sys/kern/kern_clocksource.c:352
> #7 0xf800b6429a00 in ?? ()
> #8 0x81051080 in vm_page_array ()
> #9 0x81051098 in vm_page_queue_free_mtx ()
> #10 0xfe0101818920 in ?? ()
> #11 0x805399c0 in __mtx_lock_sleep (c=<value optimized out>, tid=Error
> accessing memory address 0xffac: Bad address.
> ) at /usr/src/sys/kern/kern_mutex.c:590
> Previous frame inner to this frame (corrupt stack?)
> Current language: auto; currently minimal
> (kgdb) q
> [root@avoriaz ~]#
>
> I don't find the requested frame
>
> Henri

-- Andriy Gapon
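Condensed, the kgdb sequence being described above is the following; the
thread ID and frame number are the ones from this thread, so substitute the
TID of your own stuck thread and whichever frame shows _vn_lock:

# kgdb
(kgdb) tid 101112     # switch to the stuck kernel thread first
(kgdb) bt             # find the frame where _vn_lock is called
(kgdb) fr 7           # select that frame
(kgdb) print *vp      # dump the vnode the thread is blocked on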
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 15:00, Andriy Gapon wrote: On 10/11/2016 12:30, Henri Hennebert wrote: On 11/10/2016 11:21, Andriy Gapon wrote: On 09/11/2016 15:58, Eric van Gyzen wrote: On 11/09/2016 07:48, Henri Hennebert wrote: I encounter a strange deadlock on FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260: Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 This system is exclusively running on zfs. After 3 or 4 days, `periodic daily` is locked in the directory /usr/local/news/bin [root@avoriaz ~]# ps xa|grep find 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam 462 1 S+ 0:00.00 grep find [root@avoriaz ~]# procstat -f 85656 PID COMMFD T V FLAGSREF OFFSET PRO NAME 85656 find text v r r--- - - - /usr/bin/find 85656 find cwd v d r--- - - - /usr/local/news/bin 85656 find root v d r--- - - - / 85656 find 0 v c r--- 3 0 - /dev/null 85656 find 1 p - rw-- 1 0 - - 85656 find 2 v r -w-- 7 17 - - 85656 find 3 v d r--- 1 0 - /home/root 85656 find 4 v d r--- 1 0 - /home/root 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin [root@avoriaz ~]# If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' After a reset and reboot I can access /usr/local/news/bin. I delete this directory and reinstall the package `portupgrade -fu news/inn` 5 days later `periodic daily`is locked on the same directory :-o Any idea? I can't help with the deadlock, but someone who _can_ help will probably ask for the output of "procstat -kk PID" with the PID of the "find" process. In fact, it's procstat -kk -a. With just one thread we would see that a thread is blocked on something, but we won't see why that something can not be acquired. I attach the result, Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. I Think I miss something in your request: [root@avoriaz ~]# kgdb GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done. done. --- clip --- Loaded symbols for /boot/kernel/accf_data.ko Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done. done. 
Loaded symbols for /boot/kernel/daemon_saver.ko
#0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0 sched_switch (td=0xf8001131da00, newtd=0xf800762a8500,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1 0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at /usr/src/sys/kern/kern_tc.c:2121
#2 0x80555227 in timekeep_push_vdso () at /usr/src/sys/kern/kern_sharedpage.c:174
#3 0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
#4 0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>) at /usr/src/sys/kern/kern_clock.c:589
#5 0x808fac74 in handleevents (now=<value optimized out>, fake=0) at /usr/src/sys/kern/kern_clocksource.c:223
#6 0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352
#7 0xf800b6429a00 in ?? ()
#8 0x81051080 in vm_page_array ()
#9 0x81051098 in vm_page_queue_free_mtx ()
#10 0xfe0101818920 in ?? ()
#11 0x805399c0 in __mtx_lock_sleep (c=<value optimized out>, tid=Error accessing memory address 0xffac: Bad address.
) at /usr/src/sys/kern/kern_mutex.c:590
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb) q
[root@avoriaz ~]#

I don't find the requested frame

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 10/11/2016 12:30, Henri Hennebert wrote: > On 11/10/2016 11:21, Andriy Gapon wrote: >> On 09/11/2016 15:58, Eric van Gyzen wrote: >>> On 11/09/2016 07:48, Henri Hennebert wrote: I encounter a strange deadlock on FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260: Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 This system is exclusively running on zfs. After 3 or 4 days, `periodic daily` is locked in the directory /usr/local/news/bin [root@avoriaz ~]# ps xa|grep find 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam 462 1 S+ 0:00.00 grep find [root@avoriaz ~]# procstat -f 85656 PID COMMFD T V FLAGSREF OFFSET PRO NAME 85656 find text v r r--- - - - /usr/bin/find 85656 find cwd v d r--- - - - /usr/local/news/bin 85656 find root v d r--- - - - / 85656 find 0 v c r--- 3 0 - /dev/null 85656 find 1 p - rw-- 1 0 - - 85656 find 2 v r -w-- 7 17 - - 85656 find 3 v d r--- 1 0 - /home/root 85656 find 4 v d r--- 1 0 - /home/root 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin [root@avoriaz ~]# If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' After a reset and reboot I can access /usr/local/news/bin. I delete this directory and reinstall the package `portupgrade -fu news/inn` 5 days later `periodic daily`is locked on the same directory :-o Any idea? >>> >>> I can't help with the deadlock, but someone who _can_ help will probably >>> ask for >>> the output of "procstat -kk PID" with the PID of the "find" process. >> >> In fact, it's procstat -kk -a. With just one thread we would see that a >> thread >> is blocked on something, but we won't see why that something can not be >> acquired. >> >> > I attach the result, Interesting. I can not spot any suspicious thread that would hold the vnode lock. Could you please run kgdb (just like that, no arguments), then execute 'bt' command and then select a frame when _vn_lock is called with 'fr N' command. Then please 'print *vp' and share the result. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/10/2016 11:21, Andriy Gapon wrote: On 09/11/2016 15:58, Eric van Gyzen wrote: On 11/09/2016 07:48, Henri Hennebert wrote: I encounter a strange deadlock on FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260: Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 This system is exclusively running on zfs. After 3 or 4 days, `periodic daily` is locked in the directory /usr/local/news/bin [root@avoriaz ~]# ps xa|grep find 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam 462 1 S+ 0:00.00 grep find [root@avoriaz ~]# procstat -f 85656 PID COMMFD T V FLAGSREF OFFSET PRO NAME 85656 find text v r r--- - - - /usr/bin/find 85656 find cwd v d r--- - - - /usr/local/news/bin 85656 find root v d r--- - - - / 85656 find 0 v c r--- 3 0 - /dev/null 85656 find 1 p - rw-- 1 0 - - 85656 find 2 v r -w-- 7 17 - - 85656 find 3 v d r--- 1 0 - /home/root 85656 find 4 v d r--- 1 0 - /home/root 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin [root@avoriaz ~]# If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' After a reset and reboot I can access /usr/local/news/bin. I delete this directory and reinstall the package `portupgrade -fu news/inn` 5 days later `periodic daily`is locked on the same directory :-o Any idea? I can't help with the deadlock, but someone who _can_ help will probably ask for the output of "procstat -kk PID" with the PID of the "find" process. In fact, it's procstat -kk -a. With just one thread we would see that a thread is blocked on something, but we won't see why that something can not be acquired. I attach the result, Henri [root@avoriaz ~]# procstat -kk -a PIDTID COMM TDNAME KSTACK 0 10 kernel swapper mi_switch+0xd2 sleepq_timedwait+0x3a _sleep+0x281 swapper+0x464 btext+0x2c 0 19 kernel kqueue_ctx taskq mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100012 kernel aiod_kick taskq mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100013 kernel thread taskq mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100018 kernel firmware taskq mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100022 kernel acpi_task_0 mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100023 kernel acpi_task_1 mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100024 kernel acpi_task_2 mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100025 kernel em0 que mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100026 kernel em0 txq mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100027 kernel em1 taskqmi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100060 kernel mca taskqmi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100061 kernel system_taskq_0 mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100062 kernel system_taskq_1 mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100063 kernel dbu_evictmi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100072 kernel CAM taskqmi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 0 100086 kernel if_config_tqg_0 mi_switch+0xd2 sleepq_wait+0x3a msleep_spin_sbt+0x1bd gtaskqueue_thread_loop+0x113 fork_exit+0x85 fork_trampoline+0xe 0 100087 kernel if_io_tqg_0 mi_switch+0xd2 sleepq_wait+0x3a
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 09/11/2016 15:58, Eric van Gyzen wrote: > On 11/09/2016 07:48, Henri Hennebert wrote: >> I encounter a strange deadlock on >> >> FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 >> r308260: >> Fri Nov 4 02:51:33 CET 2016 >> r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 >> >> This system is exclusively running on zfs. >> >> After 3 or 4 days, `periodic daily` is locked in the directory >> /usr/local/news/bin >> >> [root@avoriaz ~]# ps xa|grep find >> 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) >> -prune >> -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam >> 462 1 S+ 0:00.00 grep find >> [root@avoriaz ~]# procstat -f 85656 >> PID COMMFD T V FLAGSREF OFFSET PRO NAME >> 85656 find text v r r--- - - - /usr/bin/find >> 85656 find cwd v d r--- - - - /usr/local/news/bin >> 85656 find root v d r--- - - - / >> 85656 find 0 v c r--- 3 0 - /dev/null >> 85656 find 1 p - rw-- 1 0 - - >> 85656 find 2 v r -w-- 7 17 - - >> 85656 find 3 v d r--- 1 0 - /home/root >> 85656 find 4 v d r--- 1 0 - /home/root >> 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin >> [root@avoriaz ~]# >> >> If I try `ls /usr/local/news/bin` it is also locked. >> >> After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' >> >> After a reset and reboot I can access /usr/local/news/bin. >> >> I delete this directory and reinstall the package `portupgrade -fu news/inn` >> >> 5 days later `periodic daily`is locked on the same directory :-o >> >> Any idea? > > I can't help with the deadlock, but someone who _can_ help will probably ask > for > the output of "procstat -kk PID" with the PID of the "find" process. In fact, it's procstat -kk -a. With just one thread we would see that a thread is blocked on something, but we won't see why that something can not be acquired. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 19:23, Thierry Thomas wrote: Le mer. 9 nov. 16 à 15:03:49 +0100, Henri Hennebertécrivait : [root@avoriaz ~]# procstat -kk 85656 PIDTID COMM TDNAME KSTACK 85656 101112 find -mi_switch+0xd2 sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb It looks similar to the problem reportes in PR 205163 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163 May be causes by too small values for some vfs.zfs.arc*. Could you please list sysctl for vfs.zfs.arc_max and others? Regards, [root@avoriaz ~]# sysctl vfs.zfs vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 vfs.zfs.trim.enabled: 1 vfs.zfs.vol.unmap_enabled: 1 vfs.zfs.vol.recursive: 0 vfs.zfs.vol.mode: 1 vfs.zfs.version.zpl: 5 vfs.zfs.version.spa: 5000 vfs.zfs.version.acl: 1 vfs.zfs.version.ioctl: 6 vfs.zfs.debug: 0 vfs.zfs.super_owner: 0 vfs.zfs.sync_pass_rewrite: 2 vfs.zfs.sync_pass_dont_compress: 5 vfs.zfs.sync_pass_deferred_free: 2 vfs.zfs.zio.exclude_metadata: 0 vfs.zfs.zio.use_uma: 1 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_replay_disable: 0 vfs.zfs.min_auto_ashift: 9 vfs.zfs.max_auto_ashift: 13 vfs.zfs.vdev.trim_max_pending: 1 vfs.zfs.vdev.bio_delete_disable: 0 vfs.zfs.vdev.bio_flush_disable: 0 vfs.zfs.vdev.write_gap_limit: 4096 vfs.zfs.vdev.read_gap_limit: 32768 vfs.zfs.vdev.aggregation_limit: 131072 vfs.zfs.vdev.trim_max_active: 64 vfs.zfs.vdev.trim_min_active: 1 vfs.zfs.vdev.scrub_max_active: 2 vfs.zfs.vdev.scrub_min_active: 1 vfs.zfs.vdev.async_write_max_active: 10 vfs.zfs.vdev.async_write_min_active: 1 vfs.zfs.vdev.async_read_max_active: 3 vfs.zfs.vdev.async_read_min_active: 1 vfs.zfs.vdev.sync_write_max_active: 10 vfs.zfs.vdev.sync_write_min_active: 10 vfs.zfs.vdev.sync_read_max_active: 10 vfs.zfs.vdev.sync_read_min_active: 10 vfs.zfs.vdev.max_active: 1000 vfs.zfs.vdev.async_write_active_max_dirty_percent: 60 vfs.zfs.vdev.async_write_active_min_dirty_percent: 30 vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1 vfs.zfs.vdev.mirror.non_rotating_inc: 0 vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576 vfs.zfs.vdev.mirror.rotating_seek_inc: 5 vfs.zfs.vdev.mirror.rotating_inc: 0 vfs.zfs.vdev.trim_on_init: 1 vfs.zfs.vdev.cache.bshift: 16 vfs.zfs.vdev.cache.size: 0 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.vdev.metaslabs_per_vdev: 200 vfs.zfs.txg.timeout: 5 vfs.zfs.space_map_blksz: 4096 vfs.zfs.spa_slop_shift: 5 vfs.zfs.spa_asize_inflation: 24 vfs.zfs.deadman_enabled: 1 vfs.zfs.deadman_checktime_ms: 5000 vfs.zfs.deadman_synctime_ms: 100 vfs.zfs.debug_flags: 0 vfs.zfs.recover: 0 vfs.zfs.spa_load_verify_data: 1 vfs.zfs.spa_load_verify_metadata: 1 vfs.zfs.spa_load_verify_maxinflight: 1 vfs.zfs.ccw_retry_interval: 300 vfs.zfs.check_hostid: 1 vfs.zfs.mg_fragmentation_threshold: 85 vfs.zfs.mg_noalloc_threshold: 0 vfs.zfs.condense_pct: 200 vfs.zfs.metaslab.bias_enabled: 1 vfs.zfs.metaslab.lba_weighting_enabled: 1 vfs.zfs.metaslab.fragmentation_factor_enabled: 1 vfs.zfs.metaslab.preload_enabled: 1 vfs.zfs.metaslab.preload_limit: 3 vfs.zfs.metaslab.unload_delay: 8 vfs.zfs.metaslab.load_pct: 50 vfs.zfs.metaslab.min_alloc_size: 33554432 vfs.zfs.metaslab.df_free_pct: 4 vfs.zfs.metaslab.df_alloc_threshold: 131072 vfs.zfs.metaslab.debug_unload: 0 vfs.zfs.metaslab.debug_load: 0 vfs.zfs.metaslab.fragmentation_threshold: 70 vfs.zfs.metaslab.gang_bang: 16777217 vfs.zfs.free_bpobj_enabled: 1 
vfs.zfs.free_max_blocks: 18446744073709551615 vfs.zfs.no_scrub_prefetch: 0 vfs.zfs.no_scrub_io: 0 vfs.zfs.resilver_min_time_ms: 3000 vfs.zfs.free_min_time_ms: 1000 vfs.zfs.scan_min_time_ms: 1000 vfs.zfs.scan_idle: 50 vfs.zfs.scrub_delay: 4 vfs.zfs.resilver_delay: 2 vfs.zfs.top_maxinflight: 32 vfs.zfs.zfetch.array_rd_sz: 1048576 vfs.zfs.zfetch.max_distance: 8388608 vfs.zfs.zfetch.min_sec_reap: 2 vfs.zfs.zfetch.max_streams: 8 vfs.zfs.prefetch_disable: 1 vfs.zfs.delay_scale: 50 vfs.zfs.delay_min_dirty_percent: 60 vfs.zfs.dirty_data_sync: 67108864 vfs.zfs.dirty_data_max_percent: 10 vfs.zfs.dirty_data_max_max: 4294967296 vfs.zfs.dirty_data_max: 373664153 vfs.zfs.max_recordsize: 1048576 vfs.zfs.mdcomp_disable: 0 vfs.zfs.nopwrite_enabled: 1 vfs.zfs.dedup.prefetch: 1 vfs.zfs.l2c_only_size: 0 vfs.zfs.mfu_ghost_data_lsize: 24202240 vfs.zfs.mfu_ghost_metadata_lsize: 136404992 vfs.zfs.mfu_ghost_size: 160607232 vfs.zfs.mfu_data_lsize: 449569280 vfs.zfs.mfu_metadata_lsize: 102724608 vfs.zfs.mfu_size: 714202624 vfs.zfs.mru_ghost_data_lsize: 874834432 vfs.zfs.mru_ghost_metadata_lsize: 387692032 vfs.zfs.mru_ghost_size: 1262526464 vfs.zfs.mru_data_lsize: 151275008 vfs.zfs.mru_metadata_lsize: 13547008 vfs.zfs.mru_size: 322614272 vfs.zfs.anon_data_lsize: 0 vfs.zfs.anon_metadata_lsize: 0 vfs.zfs.anon_size: 2916352 vfs.zfs.l2arc_norw: 1
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:
> [root@avoriaz ~]# procstat -kk 85656
>   PID    TID COMM             TDNAME           KSTACK
> 85656 101112 find             -                mi_switch+0xd2
> sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c
> VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679
> vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572
> kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163

Maybe caused by too small values for some vfs.zfs.arc* tunables. Could you
please list sysctl for vfs.zfs.arc_max and others?

Regards,
-- Th. Thomas.
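The handful of values being asked about can be pulled in one command; these
sysctl names are as they appear on FreeBSD 11.x:

# sysctl vm.kmem_size vm.kmem_size_max vfs.zfs.arc_max vfs.zfs.arc_min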
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 14:58, Eric van Gyzen wrote:
On 11/09/2016 07:48, Henri Hennebert wrote:
I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64

This system is exclusively running on zfs. After 3 or 4 days, `periodic daily`
is locked in the directory /usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656 - D 0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462 1 S+ 0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMM FD T V FLAGS REF OFFSET PRO NAME
85656 find text v r r--- - - - /usr/bin/find
85656 find cwd v d r--- - - - /usr/local/news/bin
85656 find root v d r--- - - - /
85656 find 0 v c r--- 3 0 - /dev/null
85656 find 1 p - rw-- 1 0 - -
85656 find 2 v r -w-- 7 17 - -
85656 find 3 v d r--- 1 0 - /home/root
85656 find 4 v d r--- 1 0 - /home/root
85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now`
the system remains locked after the line '0 0 0 0 0 0'. After a reset and
reboot I can access /usr/local/news/bin. I delete this directory and reinstall
the package `portupgrade -fu news/inn`. 5 days later `periodic daily` is
locked on the same directory :-o

Any idea?

I can't help with the deadlock, but someone who _can_ help will probably ask
for the output of "procstat -kk PID" with the PID of the "find" process.

Eric

[root@avoriaz ~]# procstat -kk 85656
  PID    TID COMM             TDNAME           KSTACK
85656 101112 find             -                mi_switch+0xd2
sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c
VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679
vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572
kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

Henri
Re: Freebsd 11.0 RELEASE - ZFS deadlock
On 11/09/2016 07:48, Henri Hennebert wrote: > I encounter a strange deadlock on > > FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 > r308260: > Fri Nov 4 02:51:33 CET 2016 > r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64 > > This system is exclusively running on zfs. > > After 3 or 4 days, `periodic daily` is locked in the directory > /usr/local/news/bin > > [root@avoriaz ~]# ps xa|grep find > 85656 - D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune > -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam > 462 1 S+ 0:00.00 grep find > [root@avoriaz ~]# procstat -f 85656 > PID COMMFD T V FLAGSREF OFFSET PRO NAME > 85656 find text v r r--- - - - /usr/bin/find > 85656 find cwd v d r--- - - - /usr/local/news/bin > 85656 find root v d r--- - - - / > 85656 find 0 v c r--- 3 0 - /dev/null > 85656 find 1 p - rw-- 1 0 - - > 85656 find 2 v r -w-- 7 17 - - > 85656 find 3 v d r--- 1 0 - /home/root > 85656 find 4 v d r--- 1 0 - /home/root > 85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin > [root@avoriaz ~]# > > If I try `ls /usr/local/news/bin` it is also locked. > > After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0' > > After a reset and reboot I can access /usr/local/news/bin. > > I delete this directory and reinstall the package `portupgrade -fu news/inn` > > 5 days later `periodic daily`is locked on the same directory :-o > > Any idea? I can't help with the deadlock, but someone who _can_ help will probably ask for the output of "procstat -kk PID" with the PID of the "find" process. Eric ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Freebsd 11.0 RELEASE - ZFS deadlock
I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov 4 02:51:33 CET 2016 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ amd64

This system is exclusively running on zfs. After 3 or 4 days, `periodic daily`
is locked in the directory /usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656 - D 0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462 1 S+ 0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMM FD T V FLAGS REF OFFSET PRO NAME
85656 find text v r r--- - - - /usr/bin/find
85656 find cwd v d r--- - - - /usr/local/news/bin
85656 find root v d r--- - - - /
85656 find 0 v c r--- 3 0 - /dev/null
85656 find 1 p - rw-- 1 0 - -
85656 find 2 v r -w-- 7 17 - -
85656 find 3 v d r--- 1 0 - /home/root
85656 find 4 v d r--- 1 0 - /home/root
85656 find 5 v d rn-- 1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked. After `shutdown -r now`
the system remains locked after the line '0 0 0 0 0 0'. After a reset and
reboot I can access /usr/local/news/bin. I delete this directory and reinstall
the package `portupgrade -fu news/inn`. 5 days later `periodic daily` is
locked on the same directory :-o

Any idea?

Henri
Re: ZFS deadlock on rrl->rr_ -- look familiar to anyone?
on 29/01/2013 05:21 Garrett Wollman said the following:
When I restarted mountd, it hung waiting on rrl->rr_, but the system may
already have been deadlocked at that point. procstat reported:
87678 104365 mountd - mi_switch sleepq_wait _cv_wait rrw_enter zfs_root
lookup namei vfs_donmount sys_nmount amd64_syscall Xfast_syscall
...

If it happens again procstat -kk -a

-- Andriy Gapon
ZFS deadlock on rrl->rr_ -- look familiar to anyone?
I just had a big fileserver deadlock in an odd way. I was investigating a
user's problem, and decided for various reasons to restart mountd. It had been
complaining like this:

Jan 28 21:06:43 nfs-prod-1 mountd[1108]: can't delete exports for
/usr/local/.zfs/snapshot/monthly-2013-01: Invalid argument

for a while, which is odd because /usr/local was never exported. When I
restarted mountd, it hung waiting on rrl->rr_, but the system may already have
been deadlocked at that point. procstat reported:

87678 104365 mountd - mi_switch sleepq_wait _cv_wait rrw_enter zfs_root
lookup namei vfs_donmount sys_nmount amd64_syscall Xfast_syscall

I was able to run shutdown, and the rc scripts eventually hung in sync(1) and
timed out. The kernel then hung trying to do the same thing, but I was able to
break into the debugger. The debugger interrupted an idle thread, which was
not particularly helpful, but I was able to quickly gather the following
information before I had to reset the machine to restore normal service.

Locked vnodes
0xfe00536383c0: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags (VI(0x200))
    lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)

db> ps
  pid  ppid  pgrp    uid  state  wmesg     wchan               cmd
87996     1 87994  65534  D      rrl->rr_  0xfe0048ff8108  df
87976     1 87726      0  D+     rrl->rr_  0xfe0048ff8108  sync
87707     1 87705  65534  D      rrl->rr_  0xfe0048ff8108  df
87700     1 87698  65534  D      rrl->rr_  0xfe0048ff8108  df
87678     1 87657      0  D+     rrl->rr_  0xfe0048ff8108  mountd
87531     1 87529  65534  D      rrl->rr_  0xfe0048ff8108  df
87387     1 87385  65534  D      rrl->rr_  0xfe0048ff8108  df
87380     1 87378  65534  D      rrl->rr_  0xfe0048ff8108  df
87103     1 87101  65534  D      rrl->rr_  0xfe0048ff8108  df
87096     1 87094  65534  D      rrl->rr_  0xfe0048ff8108  df
85193     1 85192      0  D      zio->io_  0xfe10d3e75320  zfs
   24     0     0      0  DL     sdflush   0x80e50878      [softdepflush]
   23     0     0      0  DL     vlruwt    0xfe0048c0a940  [vnlru]
   22     0     0      0  DL     rrl->rr_  0xfe0048ff8108  [syncer]
   21     0     0      0  DL     psleep    0x80e3c048      [bufdaemon]
   20     0     0      0  DL     pgzero    0x80e5a81c      [pagezero]
   19     0     0      0  DL     psleep    0x80e599e8      [vmdaemon]
   18     0     0      0  DL     psleep    0x80e599ac      [pagedaemon]
   17     0     0      0  DL     gkt:wait  0x80de6c0c      [g_mp_kt]
   16     0     0      0  DL     ipmireq   0xfe00347400b8  [ipmi0: kcs]
    9     0     0      0  DL     ccb_scan  0x80dc1360      [xpt_thrd]
    8     0     0      0  DL     waiting_  0x80e41e80      [sctp_iterator]
    7     0     0      0  DL     (threaded)                [zfskern]
   101355                 D      tx->tx_s  0xfe0050342e10  [txg_thread_enter]
   101354                 D      tx->tx_q  0xfe0050342e30  [txg_thread_enter]
   100989                 D      tx->tx_s  0xfe004fd27a10  [txg_thread_enter]
   100988                 D      tx->tx_q  0xfe004fd27a30  [txg_thread_enter]
   100593                 D      tx->tx_s  0xfe004a8c0a10  [txg_thread_enter]
   100592                 D      tx->tx_q  0xfe004a8c0a30  [txg_thread_enter]
   100216                 D      l2arc_fe  0x81228bc0      [l2arc_feed_thread]
   100215                 D      arc_recl  0x81218d20      [arc_reclaim_thread]
   15     0     0      0  DL     (threaded)                [usb]
[32 uninteresting and identical threads deleted]
    6     0     0      0  DL     mps_scan  0xfe00276816a8  [mps_scan2]
    5     0     0      0  DL     mps_scan  0xfe0027612ca8  [mps_scan1]
    4     0     0      0  DL     mps_scan  0xfe00274ef4a8  [mps_scan0]
   14     0     0      0  DL     -         0x80ded764      [yarrow]
    3     0     0      0  DL     crypto_r  0x80e4e0a0      [crypto returns]
    2     0     0      0  DL     crypto_w  0x80e4e060      [crypto]
   13     0     0      0  DL     (threaded)                [geom]
   100055                 D      -         0x80de6b90      [g_down]
   100054                 D      -         0x80de6b88      [g_up]
   100053                 D      -         0x80de6b78      [g_event]
   12     0     0      0  WL     (threaded)                [intr]
   100189                 I                                [irq1: atkbd0]
   100188                 I                                [swi0: uart uart]
   100187                 I                                [irq19: atapci1]
   100186                 I                                [irq18: atapci0+]
   100169                 I
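For anyone triaging a similar hang from DDB, the commands used above plus the
usual companions for lock diagnostics are roughly these; note that show
alllocks needs a kernel built with options WITNESS:

db> ps                  # processes with their wait channels (wmesg/wchan)
db> show lockedvnods    # locked vnodes, including which thread holds each lock
db> alltrace            # stack trace of every thread
db> show alllocks       # locks held by each thread (WITNESS kernels only)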
8.0RC1, ZFS: deadlock
Hello,

I have observed a deadlock condition when using ZFS. We are making heavy use
of zfs send/zfs receive to keep a replica of a dataset on a remote machine. It
can be done at one-minute intervals. Maybe we're doing a somewhat atypical
usage of ZFS, but, well, it seems to be a great solution for keeping
filesystem replicas once this is sorted out.

How to reproduce: Set up two systems. A dataset with heavy I/O activity is
replicated from the first to the second one. I've used a dataset containing
/usr/obj while I did a make buildworld. Replicate the dataset from the first
machine to the second one using an incremental send:

zfs send -i pool/data...@nminus1 pool/data...@n | ssh destination zfs receive -d pool

When there is read activity on the second system, reading the replicated
system, I mean, having read access while zfs receive is updating it, there can
be a deadlock. We have discovered this doing a test on a server we hope to put
in production soon, with 8 GB RAM. A Bacula backup agent was running and ZFS
deadlocked.

I have set up a couple of VMWare Fusion virtual machines in order to test
this, and it has deadlocked as well. The virtual machines have little memory,
512 MB, but I don't believe this is the actual problem. There is no complaint
about lack of memory. A running top shows processes stuck on zfsvfs:

last pid: 2051; load averages: 0.00, 0.07, 0.55  up 0+01:18:25  12:05:48
37 processes: 1 running, 36 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

 PID USERNAME THR PRI NICE  SIZE   RES STATE  C  TIME  WCPU COMMAND
1914 root       1  62    0 11932K 2564K zfsvfs 0  0:51 0.00% bsdtar
1093 borjam     1  44    0  8304K 2464K CPU1   1  0:32 0.00% top
1913 root       1  54    0 11932K 2600K rrl->r 0  0:19 0.00% bsdtar
1019 root       1  44    0 25108K 4812K select 0  0:05 0.00% sshd
2008 root       1  76    0 13600K 1904K tx->tx 0  0:04 0.00% zfs
1089 borjam     1  44    0 37040K 5216K select 1  0:04 0.00% sshd
 995 root       1  76    0  8252K 2652K pause  0  0:02 0.00% csh
 840 root       1  44    0 11044K 3828K select 1  0:02 0.00% sendmail
1086 root       1  76    0 37040K 5156K sbwait 1  0:01 0.00% sshd
 850 root       1  44    0  6920K 1612K nanslp 0  0:01 0.00% cron
 607 root       1  44    0  5992K 1540K select 1  0:01 0.00% syslogd
1090 borjam     1  76    0  8252K 2636K pause  1  0:01 0.00% csh
 990 borjam     1  44    0 37040K 5220K select 0  0:00 0.00% sshd
 985 root       1  48    0 37040K 5160K sbwait 1  0:00 0.00% sshd
 911 root       1  44    0  8252K 2608K ttyin  0  0:00 0.00% csh
 991 borjam     1  56    0  8252K 2636K pause  0  0:00 0.00% csh
 844 smmsp      1  46    0 11044K 3852K pause  0  0:00 0.00% sendmail

Interestingly, this has blocked access to all the filesystems. I cannot, for
instance, ssh into the machine anymore, even though all the system-important
filesystems are on ufs; I was just using ZFS for a test.

Any ideas on what information might be useful to collect? I have the vmware
machine right now. I've made a couple of VMWare snapshots of it, first before
breaking into DDB with the deadlock just started, the second being into DDB
(I've broken into DDB with sysctl). Also, a copy of the VMWare virtual machine
with snapshots is available on request. Your choice ;)

Borja.
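A minimal sketch of the replication loop being described, assuming the
destination was seeded with one full send beforehand; the pool/dataset name,
the rep$N snapshot naming, and the one-minute interval are illustrative, not
the actual setup:

#!/bin/sh
# one-time seed, before running the loop:
#   zfs snapshot pool/dataset@rep0
#   zfs send pool/dataset@rep0 | ssh destination zfs receive -d pool
N=0
while true; do
    PREV=$N
    N=$((N + 1))
    zfs snapshot pool/dataset@rep$N
    # send only the delta between the previous snapshot and the new one
    zfs send -i pool/dataset@rep$PREV pool/dataset@rep$N |
        ssh destination zfs receive -d pool
    # the previous snapshot is no longer needed as an incremental source
    zfs destroy pool/dataset@rep$PREV
    sleep 60
done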
Re: 8.0RC1, ZFS: deadlock
On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:
Hello,

I have observed a deadlock condition when using ZFS. We are making heavy use
of zfs send/zfs receive to keep a replica of a dataset on a remote machine. It
can be done at one-minute intervals. Maybe we're doing a somewhat atypical
usage of ZFS, but, well, it seems to be a great solution for keeping
filesystem replicas once this is sorted out.

How to reproduce: Set up two systems. A dataset with heavy I/O activity is
replicated from the first to the second one. I've used a dataset containing
/usr/obj while I did a make buildworld. Replicate the dataset from the first
machine to the second one using an incremental send:

zfs send -i pool/data...@nminus1 pool/data...@n | ssh destination zfs receive -d pool

When there is read activity on the second system, reading the replicated
system, I mean, having read access while zfs receive is updating it, there can
be a deadlock. We have discovered this doing a test on a server we hope to put
in production soon, with 8 GB RAM. A Bacula backup agent was running and ZFS
deadlocked.

Sorry, forgot to explain what was happening on the second system (the one
receiving the incremental snapshots) for the deadlock to happen. It was just
running an endless loop, copying the contents of /usr/obj to another dataset,
in order to keep the reading activity going on. That's how it has deadlocked.
On the original test system an rsync did the same trick.

Borja
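In other words, the read load on the receiving machine amounted to an endless
copy along these lines (the paths are illustrative):

# keep reading the replicated dataset while zfs receive updates it
while true; do
    tar cf - -C /pool/dataset . | tar xf - -C /pool/scratch
done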
Re: 8.0RC1, ZFS: deadlock
On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:
I have observed a deadlock condition when using ZFS. We are making a heavy
usage of zfs send/zfs receive to keep a replica of a dataset on a remote
machine. It can be done at one minute intervals. Maybe we're doing a somehow
atypical usage of ZFS, but, well, seems to be a great solution to keep
filesystem replicas once this is sorted out.

Not sure the backtraces screenshots will get through... First one is the
backtrace for the zfs command. Second one, a tar process doing a cf - . on the
dataset being replicated, sending to a pipe. Third one, the receiving tar
process, doing an xf - on a second dataset.
ZFS deadlock
Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while, then that stopped working too (was over ssh). When trying
a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have
replied... In my current conf I think my kmem/kmem_max is at 512MB (not sure
though, since I've edited my file yesterday for next reboot), with 2G of
system RAM. Normally I'd run kmem(max) at 1G (with an arcsize of 512M;
currently it is at default), but since I just got back to 2G total mem after
some hardware problems I've been running at those lows (1G total is kind of
tight with zfs).

Well, just wanted to report... The box is not totally dead yet, i.e. I can
still do Ctrl-T on the console, but that's it. I don't really know what more I
can do; I don't have KDB/DDB. I'll wait another hour or so before I hard
reboot it, unless it unlocks or anyone has any suggestions.

Thanks
-- Johan Ström Stromnet [EMAIL PROTECTED] http://www.stromnet.se/
Re: ZFS deadlock
On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote: Hello A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T: load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k Worked for a while then that stopped working too (was over ssh). When trying a local login i only got load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k I found one post like this earlier (by Xin LI), but nobody seemed to have replied... in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though, since I've edited my file yesterday for next reboot), with 2G of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is at default), but since I just got back to 2G total mem after some hardware problems I've been runnig at those lows (1G total is kindof tight with zfs..) Well, just wanted to report... The box is not totally dead yet, ie I can still do Ctrl-T on console, but thats it.. I don't really know what more I can do so.. I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it unlocks or if anyone have any suggestions. I don't think there are any suggestions left to give. Many people, including myself, have experienced this kind of problem. It's well- documented both on my Common Issues page, and the official FreeBSD ZFS Wiki. ZFS is still considered highly experimental, so if your data is at all important to you, perform backups or switch to another filesystem provider. -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: ZFS deadlock
Johan Ström wrote:
Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while, then that stopped working too (was over ssh). When trying
a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have
replied... In my current conf I think my kmem/kmem_max is at 512MB (not sure
though, since I've edited my file yesterday for next reboot), with 2G of
system RAM. Normally I'd run kmem(max) at 1G (with an arcsize of 512M;
currently it is at default), but since I just got back to 2G total mem after
some hardware problems I've been running at those lows (1G total is kind of
tight with zfs).

Well, just wanted to report... The box is not totally dead yet, i.e. I can
still do Ctrl-T on the console, but that's it. I don't really know what more I
can do; I don't have KDB/DDB. I'll wait another hour or so before I hard
reboot it, unless it unlocks or anyone has any suggestions.

The key is to increase your kmem and prevent it from being exhausted. I think
more recent OpenSolaris's ZFS code has some improvements but I do not have
spare devices at hand to test and debug :( Maybe pjd@ would get a new import
at some point? I have cc'ed him.

Cheers,
-- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: ZFS deadlock
On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:
On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3
mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k

Worked for a while, then that stopped working too (was over ssh). When trying
a local login I only got

load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to have
replied... In my current conf I think my kmem/kmem_max is at 512MB (not sure
though, since I've edited my file yesterday for next reboot), with 2G of
system RAM. Normally I'd run kmem(max) at 1G (with an arcsize of 512M;
currently it is at default), but since I just got back to 2G total mem after
some hardware problems I've been running at those lows (1G total is kind of
tight with zfs).

Well, just wanted to report... The box is not totally dead yet, i.e. I can
still do Ctrl-T on the console, but that's it. I don't really know what more I
can do; I don't have KDB/DDB. I'll wait another hour or so before I hard
reboot it, unless it unlocks or anyone has any suggestions.

I don't think there are any suggestions left to give. Many people, including
myself, have experienced this kind of problem. It's well-documented both on my
Common Issues page, and the official FreeBSD ZFS Wiki.

Ah.. I guess I was just too restrictive with the googling on
zfs:buf_hash_table.ht_locks[i].ht_lock.

ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem provider.

That I am aware of. Thanks.
Re: ZFS deadlock
On Apr 8, 2008, at 9:37 AM, LI Xin wrote: Johan Ström wrote: Hello A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T: load: 0.50 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.43 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.10 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k load: 0.11 cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k Worked for a while then that stopped working too (was over ssh). When trying a local login i only got load: 0.09 cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k I found one post like this earlier (by Xin LI), but nobody seemed to have replied... in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though, since I've edited my file yesterday for next reboot), with 2G of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is at default), but since I just got back to 2G total mem after some hardware problems I've been runnig at those lows (1G total is kindof tight with zfs..) Well, just wanted to report... The box is not totally dead yet, ie I can still do Ctrl-T on console, but thats it.. I don't really know what more I can do so.. I don't have KDB/DDB. I'll wait another hour or so before I hard reboot it, unless it unlocks or if anyone have any suggestions. The key is to increase your kmem and prevent it from being exhausted. I think more recent OpenSolaris's ZFS code has some improvements but I do not have spare devices at hand to test and debug :( Yep, never had the problem when I was running with 2G total mem, but then one stick (damn consumer crap) failed and I was left with 1G, and I started to have random problems. Going to tune kmem back up now when I got more mem again, thinking about putting in 4G too.. Maybe pjd@ would get a new import at some point? I have cc'ed him. Cheers, -- Xin LI [EMAIL PROTECTED]http://www.delphij.net/ FreeBSD - The Power to Serve! ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: ZFS deadlock
On Apr 8, 2008, at 9:40 AM, LI Xin wrote:
For your question: just reboot would be fine, you may want to tune your arc
size (to be smaller) and kmem space (to be larger), which would reduce the
chance that this would happen, or eliminate it, depending on your workload.

Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are those
reasonable on a 2G machine? I think I've read that from somewhere, but cannot
find that (arc at least) in the TuningGuide now.

This situation is not recoverable and you can trust ZFS that you will not lose
data if they are already sync'ed.

Actually, I've had a lot of hard crashes lately on this machine (bad hw) but
not a single time have I lost data (to my knowledge at least...). In that
regard, comparing to UFS, ZFS is waaay better! :)

-- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
Re: ZFS deadlock
For your question: just reboot would be fine, you may want to tune your arc
size (to be smaller) and kmem space (to be larger), which would reduce the
chance that this would happen, or eliminate it, depending on your workload.

This situation is not recoverable and you can trust ZFS that you will not lose
data if they are already sync'ed.

-- Xin LI [EMAIL PROTECTED] http://www.delphij.net/ FreeBSD - The Power to Serve!
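On 7.x these are boot-time tunables set in /boot/loader.conf. A sketch of the
tuning being suggested, with illustrative values for a machine with 2 GB of
RAM (vfs.zfs.arc_max is the tunable name used elsewhere in this thread;
adjust the numbers to your own hardware and workload):

# /boot/loader.conf -- illustrative values, not a recommendation
vm.kmem_size="1024M"        # enlarge the kernel memory arena
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"      # cap the ARC well below kmem_size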
Re: ZFS deadlock
Johan Ström wrote:
On Apr 8, 2008, at 9:40 AM, LI Xin wrote:
For your question: just reboot would be fine, you may want to tune your arc
size (to be smaller) and kmem space (to be larger), which would reduce the
chance that this would happen, or eliminate it, depending on your workload.

Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are those
reasonable on a 2G machine? I think I've read that from somewhere, but cannot
find that (arc at least) in the TuningGuide now.

Depending on your work load you are just buying more time, so reasonable is a
matter of perspective. :( I didn't see if you said you are on 32bit or 64bit?
Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you
have. If 512M arcsize crashes too soon for your tastes you can always lower it
down to 256M, or 128M, etc.
Re: ZFS deadlock
Spike Ilacqua wrote:
Depending on your work load you are just buying more time, so reasonable is a
matter of perspective. :( I didn't see if you said you are on 32bit or 64bit?
Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you
have. If 512M arcsize crashes too soon for your tastes you can always lower it
down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a 1.5G
kernel. The best uptime I ever got was 72 hours, the worst was 2, the average
about 24. Interestingly, most of the hangs were at off hours, when the system
was lightly loaded, had lots of free memory, etc. That suggests to me a slow
leak of some sort. Anyway, ZFS is not ready for production. Some people may
get lucky, but you can't count on it.

Spike

Very interesting. With 1.5G of kmem and a 64M arc_max the best uptime I had
was 5 days, worst 1 day. Also most of my crashes are off hours as well.
Another tidbit of information: running things out of /tank instead of
/tank/foo/bar/foo seems to lead to longer uptime; you might want to try that
as well.
Re: ZFS deadlock
Depending on your work load you are just buying more time, so reasonable is a
matter of perspective. :( I didn't see if you said you are on 32bit or 64bit?
Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you
have. If 512M arcsize crashes too soon for your tastes you can always lower it
down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a 1.5G
kernel. The best uptime I ever got was 72 hours, the worst was 2, the average
about 24. Interestingly, most of the hangs were at off hours, when the system
was lightly loaded, had lots of free memory, etc. That suggests to me a slow
leak of some sort. Anyway, ZFS is not ready for production. Some people may
get lucky, but you can't count on it.

Spike
Re: ZFS deadlock
So no chances of ZFS stable on FBSD7? I was actually considering Debian over
FreeBSD on a dual AMD64, but if there are no settings that will make it
stable... Nevertheless I'd be willing to help debug ZFS on that machine (Dell
T105) as soon as I receive it in a couple of weeks, as I'm in no rush to get
it into production (just tell me what to do ;) ).

- Original message -
From: Spike Ilacqua [EMAIL PROTECTED]
To: Ender [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; freebsd-stable@freebsd.org; Johan Ström [EMAIL PROTECTED]
Sent: Tuesday, April 8, 2008, 18:13:32
Subject: Re: ZFS deadlock

Depending on your work load you are just buying more time, so reasonable is a
matter of perspective. :( I didn't see if you said you are on 32bit or 64bit?
Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you
have. If 512M arcsize crashes too soon for your tastes you can always lower it
down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a 1.5G
kernel. The best uptime I ever got was 72 hours, the worst was 2, the average
about 24. Interestingly, most of the hangs were at off hours, when the system
was lightly loaded, had lots of free memory, etc. That suggests to me a slow
leak of some sort. Anyway, ZFS is not ready for production. Some people may
get lucky, but you can't count on it.

Spike
Re: ZFS deadlock
It depends a lot on your workload, I'd say. For me it's pretty stable on an
amd64 7-STABLE box that just does a little light mail and web and package
building; for others, not so much. Info on my system below if anyone's
interested.

Vince

(20:12:28 /usr/home/jhary) 0 $ more /boot/loader.conf
geom_mirror_load=YES
vm.kmem_size=768M
vm.kmem_size_max=768M
snd_emu10k1_load=YES
[EMAIL PROTECTED] (20:12:39 /usr/home/jhary) 0 $ uptime
8:12PM up 13 days, 19:16, 5 users, load averages: 1.21, 0.86, 0.44
[EMAIL PROTECTED] (20:12:50 /usr/home/jhary) 0 $ zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
data       164G  64.8G    18K  /data
data/usr   163G  64.8G   163G  /usr
data/var   306M  64.8G   306M  /var
[EMAIL PROTECTED] (20:13:00 /usr/home/jhary) 0 $ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        data      ONLINE     0     0     0
          mirror  ONLINE     0     0     0
            ad6s2 ONLINE     0     0     0
            ad4s2 ONLINE     0     0     0

errors: No known data errors

Relevant bits from dmesg:

CPU: AMD Opteron(tm) Processor 242 (1594.18-MHz K8-class CPU)
Origin = AuthenticAMD Id = 0xf5a Stepping = 10
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
usable memory = 3210489856 (3061 MB)
avail memory = 3103461376 (2959 MB)
ACPI APIC Table: A M I OEMAPIC
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1

Jisakiel wrote:
So no chances of ZFS stable on FBSD7? I was actually considering Debian over
FreeBSD on a dual AMD64, but if there are no settings that will make it
stable... Nevertheless I'd be willing to help debug ZFS on that machine (Dell
T105) as soon as I receive it in a couple of weeks, as I'm in no rush to get
it into production (just tell me what to do ;) ).

- Original message -
From: Spike Ilacqua [EMAIL PROTECTED]
To: Ender [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; freebsd-stable@freebsd.org; Johan Ström [EMAIL PROTECTED]
Sent: Tuesday, April 8, 2008, 18:13:32
Subject: Re: ZFS deadlock

Depending on your work load you are just buying more time, so reasonable is a
matter of perspective. :( I didn't see if you said you are on 32bit or 64bit?
Keep in mind the kmem max is 1.5-2G on amd64 regardless of how much memory you
have. If 512M arcsize crashes too soon for your tastes you can always lower it
down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a 1.5G
kernel. The best uptime I ever got was 72 hours, the worst was 2, the average
about 24. Interestingly, most of the hangs were at off hours, when the system
was lightly loaded, had lots of free memory, etc. That suggests to me a slow
leak of some sort. Anyway, ZFS is not ready for production. Some people may
get lucky, but you can't count on it.

Spike
Re: ZFS deadlock ?
Henri Hennebert wrote:
> Pawel Jakub Dawidek wrote:
>> On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:
>>> when I try to load zfs.ko I get:
>>>
>>> # kldload zfs
>>> link_elf: symbol kproc_create undefined
>>> kldload: can't load zfs: No such file or directory
>>>
>>> What must I add to my config to resolve this symbol/problem?
>>
>> Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().
>
> Today, after more than 10 scrubs, no deadlock. This patch is effective.

Maybe this is not related, but when I copy a 600MB file from ZFS to a UFS file system under gjournal, my system freezes completely. A break on the serial console does not drop into the debugger!

Henri
Re: ZFS deadlock ?
Pawel Jakub Dawidek wrote:
> On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:
>> Pawel Jakub Dawidek wrote:
>>> I found a deadlock too. If it's reproducible for you, can you try this patch:
>>>
>>> http://people.freebsd.org/~pjd/patches/zgd_done.patch
>>
>> I reproduced it after 30 minutes, so I tried your patch. When I try to load zfs.ko I get:
>>
>> # kldload zfs
>> link_elf: symbol kproc_create undefined
>> kldload: can't load zfs: No such file or directory
>>
>> What must I add to my config to resolve this symbol/problem?
>
> Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().

Today, after more than 10 scrubs, no deadlock. This patch is effective.

Thanks
Henri
Re: ZFS deadlock ?
On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:
> hello
>
> To push ZFS, I launched two scrubs at the same time; after ~20 seconds the system froze:
> [...]

I found a deadlock too. If it's reproducible for you, can you try this patch:

http://people.freebsd.org/~pjd/patches/zgd_done.patch

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
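For anyone wanting to try it, applying the patch and rebuilding just the module goes roughly like this (a sketch assuming a stock /usr/src and that the patch applies at -p0; adjust if not):

cd /usr/src
fetch http://people.freebsd.org/~pjd/patches/zgd_done.patch
patch -p0 < zgd_done.patch
cd sys/modules/zfs
make clean all install
kldunload zfs && kldload zfs    # or reboot if the pool is busy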
Re: ZFS deadlock ?
Pawel Jakub Dawidek wrote:
> On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:
>> hello
>>
>> To push ZFS, I launched two scrubs at the same time; after ~20 seconds the system froze:
>> [...]
>
> I found a deadlock too. If it's reproducible for you, can you try this patch:
>
> http://people.freebsd.org/~pjd/patches/zgd_done.patch

I reproduced it after 30 minutes, so I tried your patch. When I try to load zfs.ko I get:

# kldload zfs
link_elf: symbol kproc_create undefined
kldload: can't load zfs: No such file or directory

What must I add to my config to resolve this symbol/problem?

Thanks
Henri
Re: ZFS deadlock ?
On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:
> I reproduced it after 30 minutes, so I tried your patch. When I try to load zfs.ko I get:
>
> # kldload zfs
> link_elf: symbol kproc_create undefined
> kldload: can't load zfs: No such file or directory
>
> What must I add to my config to resolve this symbol/problem?

Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
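For anyone else hitting the same undefined symbol on RELENG_7, the mechanical rename can be scripted with sed; the directories below are a guess at where the patched ZFS/OpenSolaris sources live in a 7.x tree, so adjust as needed:

cd /usr/src/sys
# back up originals as *.orig, then rename the HEAD-only kproc_* API
# back to the 7.x kthread_* names in the files the patch touched
grep -rl 'kproc_' compat/opensolaris contrib/opensolaris | \
    xargs sed -i .orig -e 's/kproc_create/kthread_create/g' \
                       -e 's/kproc_exit/kthread_exit/g'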
Re: ZFS deadlock ?
Pawel Jakub Dawidek wrote:
> Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().

It loads correctly now... Moreover, no deadlock after multiple scrubs in parallel and some buildworlds to make sure... Looks fine for my config :-)

Just to give credit to ZFS: the scrub encountered 2 I/O errors without any impact on my data :)

/var/log/messages:

Nov 10 15:33:00 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=299882429
Nov 10 15:33:06 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=299882429
Nov 10 15:33:12 morzine kernel: ad6: FAILURE - READ_DMA48 timed out LBA=299882429
Nov 10 16:55:53 morzine kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Nov 10 16:55:53 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=299883325
Nov 10 16:56:06 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=299883325
Nov 10 16:56:13 morzine kernel: ad6: FAILURE - READ_DMA48 timed out LBA=299883325

ZFS is really great! I will run more tests tomorrow... and keep you posted.

Thanks
Henri
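A side note for anyone chasing similar read errors: the pool itself can report what the checksum layer caught, along these lines (pool name as above):

zpool status -v pool2    # per-vdev error counters, plus any files with unrecoverable errors
zpool clear pool2        # reset the counters once the disk has been dealt with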
ZFS deadlock ?
hello

To push ZFS, I launched two scrubs at the same time; after ~20 seconds the system froze:

zpool scrub pool0 && zpool scrub pool2

My pools:

zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da0s2   ONLINE       0     0     0
            da1s2   ONLINE       0     0     0

errors: No known data errors

  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          da0s3     ONLINE       0     0     0
          da1s3     ONLINE       0     0     0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4s3   ONLINE       0     0     0
            ad6s3   ONLINE       0     0     0

errors: No known data errors

I'm running 7.0-BETA2 with the patch http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch. Root is on pool0.

[EMAIL PROTECTED] ~]# df -h
Filesystem            Size    Used   Avail  Capacity  Mounted on
pool0                  34G     16M     34G      0%    /
devfs                 1.0K    1.0K      0B    100%    /dev
/dev/mirror/gm0s1a    496M    220M    236M     48%    /bootfs
procfs                4.0K    4.0K      0B    100%    /proc
pool0/home             35G    1.5G     34G      4%    /home
pool1                  16G    128K     16G      0%    /pool1
pool1/qemu             24G     16G    8.1G     66%    /pool1/qemu
pool1/squid            12G     39M     12G      0%    /pool1/squid
pool2                  72G      0B     72G      0%    /pool2
pool2/WorkBench        64G     21G     43G     33%    /pool2/WorkBench
pool2/backup           32G    6.8G     25G     21%    /pool2/backup
pool2/download         72G      0B     72G      0%    /pool2/download
pool2/morzine          85G     13G     72G     16%    /pool2/morzine
pool2/qemu             16G     16G    112M     99%    /pool2/qemu
pool2/sys              73G    1.2G     72G      2%    /pool2/sys
pool0/tmp              34G    384K     34G      0%    /tmp
pool0/usr              40G    5.7G     34G     14%    /usr
pool0/var              34G    116M     34G      0%    /var
pool0/var/spool        39G    5.2G     34G     13%    /var/spool
devfs                 1.0K    1.0K      0B    100%    /var/named/dev

I can break to the debugger on my serial console, and here is some information:

db> ps
  pid  ppid  pgrp   uid  state  wmesg     wchan       cmd
 3425  3424  3425     0  RVs                          cron
 3424  1161  1161     0  S      ppwait    0xc5f3      cron
 3423   589  3423     0  S+     zfs:vq-   0xc5c0b334  zpool
 3419     0     0     0  SL     vgeom:io  0xc8ba1308  [vdev:worker ad6s3]
 3418     0     0     0  SL     vgeom:io  0xda70f748  [vdev:worker ad4s3]
 3417     0     0     0  SL     zfs:(sp   0xc56da318  [spa_scrub_thread]
 3415     0     0     0  SL     vgeom:io  0xd90229c8  [vdev:worker da1s2]
 3414     0     0     0  SL     vgeom:io  0xc5892208  [vdev:worker da0s2]
 3413     0     0     0  SL     zfs:(sp   0xc56db318  [spa_scrub_thread]
 3309   998   979     8  S      nanslp    0xc0890924  sleep
 3136   995   995     8  S      select    0xc089b778  initial thread
 2610  1490  2610     0  S+     select    0xc089b778  ssh
76040     1 76016  2001  S      select    0xc089b778  initial thread
76038 76034 76016  2001  S      (threaded)            firefox-bin
 100333         S      ucond     0xcb3fc080  firefox-bin
 100327         S      ucond     0xc8ba1e80  firefox-bin
 100326         S      ucond     0xc722abc0  firefox-bin
 100323         S      ucond     0xc5b56680  firefox-bin
 100285                          0xcb466580  firefox-bin
 100156         S      select    0xc089b778  firefox-bin
 100441         S      select    0xc089b778  initial thread
76034 76030 76016  2001  S      wait      0xcca1e2a8  sh
76030     1 76016  2001  S      wait      0xcca1f000  sh
29979 29976 29979    70  Rs                           postgres
29978 29976 29978    70  Rs                           postgres
29976     1 29976    70  Ss     select    0xc089b778  postgres
25774     1 25774    25  Ss     pause     0xcf074858  sendmail
25770     1 25770     0  Ss     select    0xc089b778  sendmail
  589   587   589     0  S+     wait      0xcbe18d48  bash
  587  1311   587  2001  Ss+    wait      0xca7e52a8  su
 7234  1486  7234     0  S+     select    0xc089b778  ssh
58922     1 58922     0  Ss     kqread    0xdb632200  cupsd
54279     1 54279    53  Ss     (threaded)            named
 100207         S      select    0xc089b778  named
 100206         S      ucond     0xc76cadc0  named
 100205         S      ucond     0xd1c3c440  named
 100204         S      ucond     0xd1c3dac0  named
 00291
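If the hang is reproducible, a few more DDB commands alongside ps usually help the developers; roughly (availability depends on the kernel config, and "show alllocks" needs WITNESS compiled in):

db> alltrace          - stack traces for every thread, including the stuck zpool
db> show lockedvnods  - vnodes whose locks are currently held
db> show alllocks     - every held lock instance (WITNESS kernels only)
db> call doadump      - write a crash dump for post-mortem analysis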
Re: ZFS deadlock ?
On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

Henri,

> To push ZFS, I launched two scrubs at the same time; after ~20 seconds the system froze:
>
> zpool scrub pool0 && zpool scrub pool2

This won't start the scrubs at the same time, but one after the other. And the second will only start if the first one does not fail (exit code == 0).

--
Regards,
Richard.
/* Homo Sapiens non urinat in ventum */
Re: ZFS deadlock ?
Richard Arends wrote:
> This won't start the scrubs at the same time, but one after the other. And the second will only start if the first one does not fail (exit code == 0).

Not at all, the scrub is asynchronous, I'm sure of it.

Henri
Re: ZFS deadlock ?
On Fri, Nov 09, 2007 at 09:35:59PM +0100, Henri Hennebert wrote:
>> This won't start the scrubs at the same time, but one after the other. And the second will only start if the first one does not fail (exit code == 0).
>
> Not at all, the scrub is asynchronous, I'm sure of it.

Running 2 commands separated by && will not run them at the same time. Scrub could be asynchronous, I don't know, but that has nothing to do with the way you are running it. See:

echo sleep 1 && time sleep 2
echo sleep 2 && time sleep 2

and:

ls -l /notfound && echo yes

--
Regards,
Richard.
/* Homo Sapiens non urinat in ventum */
Re: ZFS deadlock ?
Richard Arends wrote:
> Running 2 commands separated by && will not run them at the same time. [...]

Per the man page, "zpool scrub" *begins* a scrub, which goes on in the background, so two scrubs are running simultaneously on 2 different pools.

Henri
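A quick way to see the background behavior from the shell: both commands return almost immediately, after which both pools report a scrub in flight. A minimal check, assuming pools named as above:

zpool scrub pool0 && zpool scrub pool2   # && is harmless here: each command returns as soon as its scrub is started
zpool status | grep scrub                # both pools now show "scrub in progress"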
Re: ZFS deadlock ?
On Fri, Nov 09, 2007 at 11:28:27PM +0100, Henri Hennebert wrote:
> Per the man page, "zpool scrub" *begins* a scrub, which goes on in the background, so two scrubs are running simultaneously on 2 different pools.
>
> Henri

Henri is 100% correct. zpool scrub kicks off a scrub which occurs in the background. I'm not sure I like this behavior that much, but it's not like it's my call :)

lothos# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank  1.81T   368G  1.45T   19%  ONLINE  -

lothos# time sh -c "zpool scrub tank && echo Done\?"
Done?
0.000u 0.010s 0:04.35 0.2% 116+152k 14+0io 8pf+0w

lothos# zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub in progress, 3.97% done, 0h40m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad5     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad7     ONLINE       0     0     0

errors: No known data errors

Erik
Re: ZFS deadlock ?
On Fri, Nov 09, 2007 at 11:28:27PM +0100, Henri Hennebert wrote:

Henri,

> Per the man page, "zpool scrub" *begins* a scrub, which goes on in the background, so two scrubs are running simultaneously on 2 different pools.

Okay, I see. I did not know scrub ran in the background. I stand corrected! :)

--
Regards,
Richard.
/* Homo Sapiens non urinat in ventum */