Re: ZFS deadlock

2019-11-30 Thread Eugene Grosbein
30.11.2019 15:48, Eugene Grosbein wrote:

> Hi!
> 
> I have a RAIDZ1 with five GELI-encrypted SSDs da[2-6].eli (non-boot pool).
> 
> I've exported the pool, destroyed da2.eli, then successfully imported the pool
> back in a degraded state.
> Then I mounted some file systems successfully, but zfs mount for the next one
> has been hung on [tx->tx_sync_done_cv]
> for 4400 seconds and counting.
> 
> # procstat -kk -L 55464
>   PID    TID COMM             TDNAME           KSTACK
> 55464 102422 zfs              -                mi_switch+0xeb
> sleepq_wait+0x2c _cv_wait+0x16e txg_wait_synced+0xa5 dmu_tx_assign+0x48 
> zfs_rmnode+0x122 zfs_freebsd_reclaim+0x4e VOP_RECLAIM_APV+0x80 vgonel+0x213 
> vrecycle+0x46 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x80 vinactive+0xf0 
> vputx+0x2c3 zfs_unlinked_drain+0x1b8 zfsvfs_setup+0x5e zfs_mount+0x5f5 
> vfs_domount+0x573
> 
> It looks like a deadlock to me.
> 
> What can I do to resolve this? FreeBSD 11.3-STABLE/amd64 r354667.

"zfs mount" just completed successfully after trim -f /dev/da2 (3.5TB) finished 
in 76 minutes.




ZFS deadlock

2019-11-30 Thread Eugene Grosbein
Hi!

I have a RAIDZ1 with five GELI-encrypted SSDs da[2-6].eli (non-boot pool).

I've exported the pool, destroyed da2.eli, then successfully imported the pool
back in a degraded state.
Then I mounted some file systems successfully, but zfs mount for the next one
has been hung on [tx->tx_sync_done_cv]
for 4400 seconds and counting.

# procstat -kk -L 55464
  PID    TID COMM             TDNAME           KSTACK
55464 102422 zfs              -                mi_switch+0xeb
sleepq_wait+0x2c _cv_wait+0x16e txg_wait_synced+0xa5 dmu_tx_assign+0x48 
zfs_rmnode+0x122 zfs_freebsd_reclaim+0x4e VOP_RECLAIM_APV+0x80 vgonel+0x213 
vrecycle+0x46 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x80 vinactive+0xf0 
vputx+0x2c3 zfs_unlinked_drain+0x1b8 zfsvfs_setup+0x5e zfs_mount+0x5f5 
vfs_domount+0x573

It looks like a deadlock to me.

What can I do to resolve this? FreeBSD 11.3-STABLE/amd64 r354667.


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-18 Thread Henri Hennebert



On 11/18/2016 13:30, Andriy Gapon wrote:

On 14/11/2016 14:00, Henri Hennebert wrote:

On 11/14/2016 12:45, Andriy Gapon wrote:

Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to that frame and print '*m' and '*(struct faultstate *)m'?


(kgdb) fr 4
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753        msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev =
0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
  pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v =
18446735281353604048}}, listq = {tqe_next = 0x0,
tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11,
phys_addr = 3389358080, md = {pv_list = {
  tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode =
6}, wire_count = 0, busy_lock = 6, hold_count = 0,
  flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
segind = 3 '\003', order = 13 '\r',
  pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}


If I interpret this correctly the page is in the 'exclusive busy' state.
Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.


(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, first_m =
0xf800dc5d85c0,
  first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, entry
= 0x0, lookup_still_valid = -546779784,
  vp = 0x601aa}
(kgdb)


I was wrong on this one as 'm' is actually a pointer, so the above is not
correct.  Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.


(kgdb) fr 5
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40, 
msg=0x809c51bc "vmpfw")

at /usr/src/sys/vm/vm_page.c:1086
1086        vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax            0x0      0
rbx            0xf800b62e9c78   -8793036514184
rcx            0x0      0
rdx            0x0      0
rsi            0x0      0
rdi            0x0      0
rbp            0xfe0101836810   0xfe0101836810
rsp            0xfe01018367e0   0xfe01018367e0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x0      0
r12            0xf800b642aa00   -879303520
r13            0xf800df68cd40   -8792344834752
r14            0xf800b62e9c60   -8793036514208
r15            0x809c51bc       -2137239108
rip            0x8089dd4d       0x8089dd4d
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

I don't know what to do from here.


I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock.  But examining the code I can not spot it.
Another possibility is a leak of a busy page, but that's hard to debug.

How hard is it to reproduce the problem?


After 7 days all seems normal, with only one copy of innd:

[root@avoriaz ~]# ps xa|grep inn
 1193  -  Is   0:01.40 /usr/local/news/bin/innd -r
13498  -  IN   0:00.01 /usr/local/news/bin/innfeed
 1194 v0- IW   0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60

I will try to stop and restart innd.

Everything continues to look good:

[root@avoriaz ~]# ps xa|grep inn
31673  -  Ss   0:00.02 /usr/local/news/bin/innd
31694  -  SN   0:00.01 /usr/local/news/bin/innfeed
31674  0  S0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60


I think the only way to reproduce it is to wait until it occurs by itself...

One thing here: the deadlock has occurred at least 5 times since 10.0R, and
always with the directory /usr/local/news/bin




Maybe Konstantin would have some ideas or suggestions.



Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-18 Thread Andriy Gapon
On 14/11/2016 14:00, Henri Hennebert wrote:
> On 11/14/2016 12:45, Andriy Gapon wrote:
>> Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
>> happens to be the first field of 'struct faultstate'.  So, could you please go
>> to that frame and print '*m' and '*(struct faultstate *)m'?
>>
> (kgdb) fr 4
> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40,
> wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
> 753        msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
> (kgdb) print *m
> $1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev =
> 0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
>   pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v =
> 18446735281353604048}}, listq = {tqe_next = 0x0,
> tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex = 11,
> phys_addr = 3389358080, md = {pv_list = {
>   tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, pat_mode 
> =
> 6}, wire_count = 0, busy_lock = 6, hold_count = 0,
>   flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 
> '\0',
> segind = 3 '\003', order = 13 '\r',
>   pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}

If I interpret this correctly the page is in the 'exclusive busy' state.
Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.
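
For reference, here is a minimal sketch of that reading of busy_lock = 6; the
VPB_* bit values are my assumption from sys/vm/vm_page.h as of 11.0, so the
header itself is authoritative:

#include <stdio.h>

/* Assumed page-busy bits (sys/vm/vm_page.h, FreeBSD 11.0); verify in-tree. */
#define VPB_BIT_SHARED          0x01    /* busied in shared mode */
#define VPB_BIT_EXCLUSIVE       0x02    /* busied exclusively */
#define VPB_BIT_WAITERS         0x04    /* somebody sleeps on the busy state */

int
main(void)
{
        unsigned int busy_lock = 6;     /* value printed for the page above */

        /* 6 = VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS: exclusively busied, with sleepers. */
        printf("shared=%d exclusive=%d waiters=%d\n",
            !!(busy_lock & VPB_BIT_SHARED),
            !!(busy_lock & VPB_BIT_EXCLUSIVE),
            !!(busy_lock & VPB_BIT_WAITERS));
        return (0);
}

The waiters bit being set is consistent with thread 668 sleeping on this page
in vm_page_busy_sleep().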

> (kgdb) print *(struct faultstate *)m
> $2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, 
> first_m =
> 0xf800dc5d85c0,
>   first_object = 0xf800b62e9c60, first_pindex = 11, map = 0xca058000, 
> entry
> = 0x0, lookup_still_valid = -546779784,
>   vp = 0x601aa}
> (kgdb)

I was wrong on this one as 'm' is actually a pointer, so the above is not
correct.  Maybe 'info reg' in frame 5 would give a clue about the value of 'fs'.

I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock.  But examining the code I can not spot it.
Another possibility is a leak of a busy page, but that's hard to debug.
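
To make the suspected shape concrete, a minimal userland sketch of a lock
order reversal follows; it is an illustration only (plain pthread mutexes
standing in for the vnode lock and the page busy state), not the actual
kernel code paths involved here:

#include <pthread.h>
#include <unistd.h>

/* Two locks taken in opposite orders by two threads: each ends up holding
 * one lock and sleeping forever on the other, which is the deadlock shape
 * suspected above. */
static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_busy  = PTHREAD_MUTEX_INITIALIZER;

static void *
thread_a(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&vnode_lock);
        sleep(1);                        /* widen the race window */
        pthread_mutex_lock(&page_busy);  /* blocks forever */
        pthread_mutex_unlock(&page_busy);
        pthread_mutex_unlock(&vnode_lock);
        return (NULL);
}

static void *
thread_b(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&page_busy);
        sleep(1);
        pthread_mutex_lock(&vnode_lock); /* blocks forever */
        pthread_mutex_unlock(&vnode_lock);
        pthread_mutex_unlock(&page_busy);
        return (NULL);
}

int
main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);           /* never returns once both block */
        pthread_join(b, NULL);
        return (0);
}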

How hard is it to reproduce the problem?

Maybe Konstantin would have some ideas or suggestions.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Henri Hennebert



On 11/14/2016 12:45, Andriy Gapon wrote:

On 14/11/2016 11:35, Henri Hennebert wrote:



On 11/14/2016 10:07, Andriy Gapon wrote:

Hmm, I've just noticed another interesting thread:
Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=0x809c51bc
"vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
/usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
/usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
/usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at
/usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at
/usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:236


This thread is another program from the news system:
668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xf800b642aa00,
newtd=0xf8000285f000, flags=<value optimized out>) at
/usr/src/sys/kern/sched_ule.c:1973



I strongly suspect that this is the thread we were looking for.
I think that it has the vnode lock in the shared mode while trying to fault in a
page.



--clip--



Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to that frame and print '*m' and '*(struct faultstate *)m'?


(kgdb) fr 4
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40,
wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753

753 msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xf800dc5d85b0, tqe_prev = 
0xf800debf3bd0}, s = {ss = {sle_next = 0xf800dc5d85b0},
  pv = 0xf800debf3bd0}, memguard = {p = 18446735281313646000, v 
= 18446735281353604048}}, listq = {tqe_next = 0x0,
tqe_prev = 0xf800dc5d85c0}, object = 0xf800b62e9c60, pindex 
= 11, phys_addr = 3389358080, md = {pv_list = {
  tqh_first = 0x0, tqh_last = 0xf800df68cd78}, pv_gen = 426, 
pat_mode = 6}, wire_count = 0, busy_lock = 6, hold_count = 0,
  flags = 0, aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind 
= 0 '\0', segind = 3 '\003', order = 13 '\r',

  pool = 0 '\0', act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xf800dc5d85b0, object = 0xf800debf3bd0, pindex = 0, 
first_m = 0xf800dc5d85c0,
  first_object = 0xf800b62e9c60, first_pindex = 11, map = 
0xca058000, entry = 0x0, lookup_still_valid = -546779784,

  vp = 0x601aa}
(kgdb)


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Andriy Gapon
On 14/11/2016 11:35, Henri Hennebert wrote:
> 
> 
> On 11/14/2016 10:07, Andriy Gapon wrote:
>> Hmm, I've just noticed another interesting thread:
>> Thread 668 (Thread 101245):
>> #0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x805614b1 in _sleep (ident=, lock=> optimized out>, priority=, wmesg=0x809c51bc
>> "vmpfw", sbt=0, pr=, flags=) at
>> /usr/src/sys/kern/kern_synch.c:229
>> #4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, 
>> wmesg=> optimized out>) at /usr/src/sys/vm/vm_page.c:753
>> #5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
>> msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
>> #6  0x80886be9 in vm_fault_hold (map=, 
>> vaddr=> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
>> /usr/src/sys/vm/vm_fault.c:495
>> #7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=> optimized out>, fault_type=4 '\004', fault_flags=) at
>> /usr/src/sys/vm/vm_fault.c:273
>> #8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) 
>> at
>> /usr/src/sys/amd64/amd64/trap.c:741
>> #9  0x808d3386 in trap (frame=0xfe0101836c00) at
>> /usr/src/sys/amd64/amd64/trap.c:333
>> #10 0x808b7af1 in calltrap () at 
>> /usr/src/sys/amd64/amd64/exception.S:236
> 
> This thread is another program from the news system:
> 668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xf800b642aa00,
> newtd=0xf8000285f000, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
> 
>>
>> I strongly suspect that this is the thread we were looking for.
>> I think that it has the vnode lock in the shared mode while trying to fault 
>> in a
>> page.
>>
>> Could you please check that by going to frame 6 and printing 'fs' and 
>> '*fs.vp'?
>> It'd be interesting to understand why this thread is waiting here.
>> So, please also print '*fs.m' and '*fs.object'.
> 
> No luck :-(
> (kgdb) fr 6
> #6  0x80886be9 in vm_fault_hold (map=<value optimized out>,
> vaddr=<value optimized out>, fault_type=4 '\004',
> fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
> 495        vm_page_sleep_if_busy(fs.m, "vmpfw");
> (kgdb) print fs
> Cannot access memory at address 0x1fa0
> (kgdb)

Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
happens to be the first field of 'struct faultstate'.  So, could you please go
to that frame and print '*m' and '*(struct faultstate *)m'?

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Henri Hennebert



On 11/14/2016 10:07, Andriy Gapon wrote:

On 13/11/2016 15:28, Henri Hennebert wrote:

On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:



[snip]

Could you please show 'info local' in frame 14?
I expected that 'nd' variable would be defined there and it may contain some
useful information.


No luck there:

(kgdb) fr 14
#14 0x80636838 in kern_statat (td=0xf80009ba0500,
flag=<value optimized out>, fd=-100, path=0x0,
pathseg=<value optimized out>, sbp=<value optimized out>,
hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160

2160        if ((error = namei()) != 0)
(kgdb) info local
rights = <value optimized out>
nd = <value optimized out>
error = <value optimized out>
sb = <value optimized out>
(kgdb)



I also tried to get information from the execve of the other threads:

for tid 101250:
(kgdb) fr 10
#10 0x80508ccc in sys_execve (td=0xf800b6429000,
uap=0xfe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
218error = kern_execve(td, , NULL);
(kgdb) print *uap
$4 = {fname_l_ = 0xfe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
,
  fname_r_ = 0xfe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010184fb88
"`¶ÿÿÿ\177", argv = 0x7fffb660,
  argv_r_ = 0xfe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010184fb90
"\bÜÿÿÿ\177", envv = 0x7fffdc08,
  envv_r_ = 0xfe010184fb98 ""}
(kgdb)

for tid 101243:

(kgdb) f 15
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
218error = kern_execve(td, , NULL);
(kgdb) print *uap
$5 = {fname_l_ = 0xfe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0 ,
  fname_r_ = 0xfe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfe010182cb88
"`¶ÿÿÿ\177", argv = 0x7fffb660,
  argv_r_ = 0xfe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfe010182cb90
"\bÜÿÿÿ\177", envv = 0x7fffdc08,
  envv_r_ = 0xfe010182cb98 ""}
(kgdb)


I think that you see garbage in those structures because they contain pointers
to userland data.

Hmm, I've just noticed another interesting thread:
Thread 668 (Thread 101245):
#0  sched_switch (td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x805614b1 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=0x809c51bc
"vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
/usr/src/sys/kern/kern_synch.c:229
#4  0x8089d1c1 in vm_page_busy_sleep (m=0xf800df68cd40, wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
#5  0x8089dd4d in vm_page_sleep_if_busy (m=0xf800df68cd40,
msg=0x809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
/usr/src/sys/vm/vm_fault.c:495
#7  0x80885448 in vm_fault (map=0xf80011d66000, vaddr=<value optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
/usr/src/sys/vm/vm_fault.c:273
#8  0x808d3c49 in trap_pfault (frame=0xfe0101836c00, usermode=1) at
/usr/src/sys/amd64/amd64/trap.c:741
#9  0x808d3386 in trap (frame=0xfe0101836c00) at
/usr/src/sys/amd64/amd64/trap.c:333
#10 0x808b7af1 in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:236


This thread is another program from the news system:
668 Thread 101245 (PID=49124: innfeed)  sched_switch
(td=0xf800b642aa00, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973




I strongly suspect that this is the thread we were looking for.
I think that it has the vnode lock in the shared mode while trying to fault in a
page.

Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
It'd be interesting to understand why this thread is waiting here.
So, please also print '*fs.m' and '*fs.object'.


No luck :-(
(kgdb) fr 6
#6  0x80886be9 in vm_fault_hold (map=<value optimized out>,
vaddr=<value optimized out>, fault_type=4 '\004',
fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
495        vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0x1fa0
(kgdb)

Henri

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-14 Thread Andriy Gapon
On 13/11/2016 15:28, Henri Hennebert wrote:
> On 11/13/2016 11:06, Andriy Gapon wrote:
>> On 12/11/2016 14:40, Henri Hennebert wrote:
>>> I attach it
>>
>> Thank you!
>> So, these two threads are trying to get the lock in the exclusive mode:
>> Thread 687 (Thread 101243):
>> #0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs",
>> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=, 
>> flags=> optimized out>, ilk=, wmesg=,
>> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=) at 
>> lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
>> file=, line=) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
>> td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=,
>> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf800b642b500,
>> uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=, traced=0) 
>> at
>> subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
>>
>> Thread 681 (Thread 101147):
>> #0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, 
>> flags=> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0x80561ae2 in mi_switch (flags=, newtd=0x0) 
>> at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0x8052f854 in sleeplk (lk=, flags=> optimized out>, ilk=, wmesg=0x813be535 "zfs",
>> pri=, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0x8052f39d in __lockmgr_args (lk=, 
>> flags=> optimized out>, ilk=, wmesg=,
>> pri=, timo=, file=> out>, line=) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0x80616a8c in vop_stdlock (ap=) at 
>> lockmgr.h:98
>> #6  0x8093784d in VOP_LOCK1_APV (vop=, a=> optimized out>) at vnode_if.c:2087
>> #7  0x8063c5b3 in _vn_lock (vp=, flags=548864,
>> file=, line=) at vnode_if.h:859
>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
>> td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0x806118b9 in cache_lookup (dvp=, vpp=> optimized out>, cnp=, tsp=,
>> ticksp=) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0x806133dc in vfs_cache_lookup (ap=) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0x80935777 in VOP_LOOKUP_APV (vop=, a=> optimized out>) at vnode_if.c:127
>> #12 0x8061cdf1 in lookup (ndp=) at vnode_if.h:54
>> #13 0x8061c492 in namei (ndp=) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0x80509395 in kern_execve (td=, args=> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0x80508ccc in sys_execve (td=0xf80065f4e500,
>> uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0x808d449e in amd64_syscall (td=, traced=0) 
>> at
>> subr_syscall.c:135
>> #17 0x808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
> 
> These 2 threads are innd processes. In core.txt.4:
> 
>8 14789 29165   0   24  4   40040   6612 zfs  DN- 0:00.00 [innd]
>8 29165 1   0   20  0   42496   6888 select   Ds- 0:01.33 [innd]
>8 49778 29165   0   24  4   40040   6900 zfs  DN- 0:00.00 [innd]
>8 82034 29165   0   24  4 132  0 zfs  DN- 0:00.00 [innd]
> 
> the corresponding info threads are:
> 
>   687 Thread 101243 (PID=49778: innd)  sched_switch (td=0xf800b642b500,
> newtd=0xf8000285ea00, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
>   681 Thread 101147 (PID=14789: innd)  sched_switch (td=0xf80065f4e500,
> newtd=0xf8000285f000, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
>   669 Thread 101250 (PID=82034: innd)  sched_switch (td=0xf800b6429000,
> 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Henri Hennebert

On 11/13/2016 14:28, Henri Hennebert wrote:

These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0   24  4   40040   6612 zfs  DN- 0:00.00 [innd]
   8 29165 1   0   20  0   42496   6888 select   Ds- 0:01.33 [innd]
   8 49778 29165   0   24  4   40040   6900 zfs  DN- 0:00.00 [innd]
   8 82034 29165   0   24  4 132  0 zfs  DN- 0:00.00 [innd]

the corresponding info threads are:

  687 Thread 101243 (PID=49778: innd)  sched_switch
(td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  681 Thread 101147 (PID=14789: innd)  sched_switch
(td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  669 Thread 101250 (PID=82034: innd)  sched_switch
(td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  665 Thread 101262 (PID=29165: innd)  sched_switch
(td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973


In case it may help, I had a look at innd. This process uses 2 execv calls:

one to execute /bin/sh and the other to re-execute itself:

/*
**  Re-exec ourselves.
*/
static const char *
CCxexec(char *av[])
{
    char        *innd;
    char        *p;
    int         i;

    if (CCargv == NULL)
        return "1 no argv!";

    innd = concatpath(innconf->pathbin, "innd");
    /* Get the pathname. */
    p = av[0];
    if (*p == '\0' || strcmp(p, "innd") == 0)
        CCargv[0] = innd;
    else
        return "1 Bad value";

#ifdef DO_PERL
    PLmode(Mode, OMshutdown, av[0]);
#endif
#ifdef DO_PYTHON
    PYmode(Mode, OMshutdown, av[0]);
#endif
    JustCleanup();
    syslog(L_NOTICE, "%s execv %s", LogName, CCargv[0]);

    /* Close all fds to protect possible fd leaking across successive
       innds. */
    for (i = 3; i < 30; i++)
        close(i);

    execv(CCargv[0], CCargv);
    syslog(L_FATAL, "%s cant execv %s %m", LogName, CCargv[0]);
    _exit(1);
    /* NOTREACHED */
    return "1 Exit failed";
}

The culprit may be /usr/local/news/bin/innd;

remember that find is locked in /usr/local/news/bin.

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Henri Hennebert

On 11/13/2016 11:06, Andriy Gapon wrote:

On 12/11/2016 14:40, Henri Hennebert wrote:

I attach it


Thank you!
So, these two threads are trying to get the lock in the exclusive mode:
Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500,
uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396


These 2 threads are innd processes. In core.txt.4:

   8 14789 29165   0   24  4   40040   6612 zfs  DN- 
0:00.00 [innd]
   8 29165 1   0   20  0   42496   6888 select   Ds- 
0:01.33 [innd]
   8 49778 29165   0   24  4   40040   6900 zfs  DN- 
0:00.00 [innd]
   8 82034 29165   0   24  4 132  0 zfs  DN- 
0:00.00 [innd]


the corresponding info threads are:

  687 Thread 101243 (PID=49778: innd)  sched_switch
(td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  681 Thread 101147 (PID=14789: innd)  sched_switch
(td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  669 Thread 101250 (PID=82034: innd)  sched_switch
(td=0xf800b6429000, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
  665 Thread 101262 (PID=29165: innd)  sched_switch
(td=0xf800b6b54a00, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973


So your missing thread must be 101250:

(kgdb) tid 101250
[Switching to thread 669 (Thread 101250)]#0  sched_switch
(td=0xf800b6429000, newtd=0xf8000285ea00,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
1973        cpuid = PCPU_GET(cpuid);
Current language:  auto; currently minimal
(kgdb) bt

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-13 Thread Andriy Gapon
On 12/11/2016 14:40, Henri Hennebert wrote:
> I attach it

Thank you!
So, these two threads are trying to get the lock in the exclusive mode:
Thread 687 (Thread 101243):
#0  sched_switch (td=0xf800b642b500, newtd=0xf8000285ea00, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf800b642b500,
uap=0xfe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

Thread 681 (Thread 101147):
#0  sched_switch (td=0xf80065f4e500, newtd=0xf8000285f000, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0x805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0x8052f854 in sleeplk (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=0x813be535 "zfs",
pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
#4  0x8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
pri=<value optimized out>, timo=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
#5  0x80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
#6  0x8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2087
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=548864,
td=0xf80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
#9  0x806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
#10 0x806133dc in vfs_cache_lookup (ap=<value optimized out>) at
/usr/src/sys/kern/vfs_cache.c:1081
#11 0x80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:127
#12 0x8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
#13 0x8061c492 in namei (ndp=<value optimized out>) at
/usr/src/sys/kern/vfs_lookup.c:306
#14 0x80509395 in kern_execve (td=<value optimized out>, args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
#15 0x80508ccc in sys_execve (td=0xf80065f4e500,
uap=0xfe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
#16 0x808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
subr_syscall.c:135
#17 0x808b7ddb in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:396

And the original stuck thread wants to get the lock in the shared mode.
And there should be another thread that already holds the lock in the shared
mode.  But I am not able to identify it.  I wonder if the original thread could
be trying to get the lock recursively...

It would be interesting to get more details from thread 101112.
You can switch to it using tid command, you can use 'fr' to select frames, 'info
local' and 'info args' to see what variables are available (not optimized out)
and then you can print any that look interesting.  It would be nice to get a file
path and a directory vnode where the lookup is called.

Thank you.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-12 Thread Andriy Gapon
On 11/11/2016 16:50, Henri Hennebert wrote:
> 
> 
> On 11/11/2016 12:24, Andriy Gapon wrote:
>>
>> At this stage I would try to get a system crash dump for post-mortem 
>> analysis.
>> There are a few ways to do that.  You can enter ddb and then run 'dump' and
>> 'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
>> In either case, please double-check that your system has a dump device
>> configured.
>>
> It takes some time to upload the dump...
> 
> You can find it at
> 
> http://tignes.restart.be/Xfer/

Could you please open the dump in kgdb and execute the following commands?

set logging on
set logging redirect on
set pagination off
thread apply all bt
quit

After that you should get gdb.txt file in the current directory.
I would like to see it.

Thank you.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-11 Thread Henri Hennebert



On 11/11/2016 12:24, Andriy Gapon wrote:


At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that.  You can enter ddb and then run 'dump' and
'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
In either case, please double-check that your system has a dump device 
configured.


It takes some time to upload the dump...

You can find it at

http://tignes.restart.be/Xfer/

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-11 Thread Andriy Gapon
On 10/11/2016 21:41, Henri Hennebert wrote:
> On 11/10/2016 19:40, Andriy Gapon wrote:
>> On 10/11/2016 19:55, Henri Hennebert wrote:
>>>
>>>
>>> On 11/10/2016 18:33, Andriy Gapon wrote:
 On 10/11/2016 18:12, Henri Hennebert wrote:
> On 11/10/2016 16:54, Andriy Gapon wrote:
>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>> On 11/10/2016 15:00, Andriy Gapon wrote:
 Interesting.  I can not spot any suspicious thread that would hold the
 vnode
 lock.  Could you please run kgdb (just like that, no arguments), then
 execute
 'bt' command and then select a frame when _vn_lock is called with 'fr 
 N'
 command.  Then please 'print *vp' and share the result.

>>> I Think I miss something in your request:
>>
>> Oh, sorry!  The very first step should be 'tid 101112' to switch to the
>> correct
>> context.
>>
>
> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>,
> flags=2121728,

 "value optimized out" - not good

> file=<value optimized out>,
> line=<value optimized out>) at vnode_if.h:859
> 859        vnode_if.h: No such file or directory.
> in vnode_if.h
> (kgdb) print *vp

 I am not sure if this output is valid, because of the message above.
 Could you please try to navigate to nearby frames and see if vp itself has 
 a
 valid value there.  If you can find such a frame please do *vp  there.

>>>
>>> Does this seems better?
>>
>> Yes!
>>
>>> (kgdb) fr 8
>>> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
>>> td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
>>> 2523if ((error = vn_lock(vp, flags)) != 0) {
>>> (kgdb) print *vp
>>> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
>>> 0xf80049c1f420, v_mount = 0xf800093aa660,
>>>   v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
>>> 0xf80049c2bb30},
>>> v_un = {vu_mount = 0x0, vu_socket = 0x0,
>>> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev 
>>> =
>>> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
>>> tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
>>> v_cache_dd =
>>> 0x0, v_lock = {lock_object = {
>>>   lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
>>> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
>>> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
>>> 0x8099e9e0 "vnode interlock", lo_flags = 16973824,
>>>   lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
>>> 0xf80049c2c068, v_actfreelist = {
>>> tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj 
>>> =
>>> {bo_lock = {lock_object = {
>>> lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 
>>> 86179840,
>>> lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
>>> bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, 
>>> bo_synclist =
>>> {le_next = 0x0, le_prev = 0x0},
>>> bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
>>> bo_clean =
>>> {bv_hd = {tqh_first = 0x0,
>>> tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 
>>> 0},
>>> bo_dirty = {bv_hd = {tqh_first = 0x0,
>>> tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 
>>> 0},
>>> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>>>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
>>> {tqh_first = 0x0, tqh_last = 0xf80049c2c188},
>>> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
>>> v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
>>>   v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
>>> (kgdb)
>>
>> flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
>> lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | 
>> LK_SHARED_WAITERS |
>> LK_SHARE
>>
>> So, here's what we have here: this thread tries to get a shared lock on the
>> vnode, the vnode is already locked in shared mode, but there is an exclusive
>> waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
>> because of the exclusive waiter.  And I do not see an easy way to identify 
>> that
>> waiter.
>>
>> In the procstat output that you provided earlier there was no other thread in
>> vn_lock.  Hmm, I see this:
>> procstat: sysctl: kern.proc.kstack: 14789: Device busy
>> procstat: sysctl: kern.proc.kstack: 82034: Device busy
>>
>> Could you please check what those two processes are (if they are still 
>> running)?
>> Perhaps try procstat for each of the pids several times.
>>

At this stage I would try to get a system crash dump for post-mortem analysis.
There are a few ways to do that.  You can enter ddb and then run 'dump' and
'reset' commands.  Or you can just do `sysctl debug.kdb.panic=1`.
In either case, please double-check that your system has a dump device 
configured.

> These 2 processes are the 2

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 19:40, Andriy Gapon wrote:

On 10/11/2016 19:55, Henri Hennebert wrote:



On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I missed something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,


"value optimized out" - not good


file=<value optimized out>,
line=<value optimized out>) at vnode_if.h:859
859        vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp


I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.



Does this seem better?


Yes!


(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
2523        if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
0xf80049c1f420, v_mount = 0xf800093aa660,
  v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049c2bb30},
v_un = {vu_mount = 0x0, vu_socket = 0x0,
vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev =
0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, v_cache_dd =
0x0, v_lock = {lock_object = {
  lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
0x8099e9e0 "vnode interlock", lo_flags = 16973824,
  lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
0xf80049c2c068, v_actfreelist = {
tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj =
{bo_lock = {lock_object = {
lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840,
lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist =
{le_next = 0x0, le_prev = 0x0},
bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, bo_clean =
{bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0},
bo_dirty = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0},
bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
  v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
{tqh_first = 0x0, tqh_last = 0xf80049c2c188},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
  v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)


flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode, the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
because of the exclusive waiter.  And I do not see an easy way to identify that
waiter.

In the procstat output that you provided earlier there was no other thread in
vn_lock.  Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.



These 2 processes are the 2 instances of the innd daemon (news server),
which is consistent with the directory /usr/local/news/bin.


[root@avoriaz ~]# procstat 14789
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN EMUL          COMM
14789 29165 29165 29165     0   1 root     zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat 82034
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN EMUL          COMM
82034 29165 29165 29165     0   1 root     zfs   FreeBSD ELF64 innd
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM    FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM    FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM    FD T V FLAGS

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 19:55, Henri Hennebert wrote:
> 
> 
> On 11/10/2016 18:33, Andriy Gapon wrote:
>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>> On 11/10/2016 16:54, Andriy Gapon wrote:
 On 10/11/2016 17:20, Henri Hennebert wrote:
> On 11/10/2016 15:00, Andriy Gapon wrote:
>> Interesting.  I can not spot any suspicious thread that would hold the 
>> vnode
>> lock.  Could you please run kgdb (just like that, no arguments), then 
>> execute
>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>> command.  Then please 'print *vp' and share the result.
>>
> I think I missed something in your request:

 Oh, sorry!  The very first step should be 'tid 101112' to switch to the 
 correct
 context.

>>>
>>> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,
>>
>> "value optimized out" - not good
>>
> file=<value optimized out>,
> line=<value optimized out>) at vnode_if.h:859
> 859        vnode_if.h: No such file or directory.
>>> in vnode_if.h
>>> (kgdb) print *vp
>>
>> I am not sure if this output is valid, because of the message above.
>> Could you please try to navigate to nearby frames and see if vp itself has a
>> valid value there.  If you can find such a frame please do *vp  there.
>>
> 
> Does this seem better?

Yes!

> (kgdb) fr 8
> #8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728,
> td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
> 2523if ((error = vn_lock(vp, flags)) != 0) {
> (kgdb) print *vp
> $1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, v_data =
> 0xf80049c1f420, v_mount = 0xf800093aa660,
>   v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
> 0xf80049c2bb30},
> v_un = {vu_mount = 0x0, vu_socket = 0x0,
> vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev =
> 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
> tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
> v_cache_dd =
> 0x0, v_lock = {lock_object = {
>   lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data = 0,
> lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
> lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name =
> 0x8099e9e0 "vnode interlock", lo_flags = 16973824,
>   lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock =
> 0xf80049c2c068, v_actfreelist = {
> tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, v_bufobj =
> {bo_lock = {lock_object = {
> lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 86179840,
> lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
> bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, bo_synclist =
> {le_next = 0x0, le_prev = 0x0},
> bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
> bo_clean =
> {bv_hd = {tqh_first = 0x0,
> tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_dirty = {bv_hd = {tqh_first = 0x0,
> tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt = 0},
> bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters =
> {tqh_first = 0x0, tqh_last = 0xf80049c2c188},
> rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
> v_holdcnt = 9, v_usecount = 6, v_iflag = 512,
>   v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
> (kgdb)

flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS |
LK_SHARE

So, here's what we have here: this thread tries to get a shared lock on the
vnode, the vnode is already locked in shared mode, but there is an exclusive
waiter (or, perhaps, multiple waiters).  So, this thread can not get the lock
because of the exclusive waiter.  And I do not see an easy way to identify that
waiter.
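
For the record, a minimal sketch of that decoding; the bit values are my
assumption from sys/lockmgr.h as of 11.0 (LK_VNHELD and LK_NODDLKTREAT
together account for the remaining 0x6000 of the request), so the header is
authoritative:

#include <stdio.h>

/* Assumed lockmgr bits (sys/lockmgr.h, FreeBSD 11.0); verify in-tree. */
#define LK_SHARE                0x01            /* lk_lock: held in shared mode */
#define LK_SHARED_WAITERS       0x02            /* lk_lock: shared waiters queued */
#define LK_EXCLUSIVE_WAITERS    0x04            /* lk_lock: exclusive waiter queued */
#define LK_ONE_SHARER           0x10            /* lk_lock: exactly one shared holder */
#define LK_SHARED               0x200000        /* vn_lock() request: shared lock */

int
main(void)
{
        unsigned int flags = 2121728;   /* vn_lock() request seen in frames 7/8 */
        unsigned int lk_lock = 23;      /* lk_lock from 'print *vp' above */

        printf("flags = %#x, shared request: %d\n", flags, !!(flags & LK_SHARED));
        /* 23 = 0x17: one shared holder, plus both shared and exclusive waiters. */
        printf("lk_lock = %#x: share=%d one_sharer=%d shared_waiters=%d excl_waiters=%d\n",
            lk_lock, !!(lk_lock & LK_SHARE), !!(lk_lock & LK_ONE_SHARER),
            !!(lk_lock & LK_SHARED_WAITERS), !!(lk_lock & LK_EXCLUSIVE_WAITERS));
        return (0);
}

Because lockmgr does not grant new shared requests while an exclusive waiter
is queued, the shared request above sleeps even though the lock is only
share-held, which is exactly the situation described.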

In the procstat output that you provided earlier there was no other thread in
vn_lock.  Hmm, I see this:
procstat: sysctl: kern.proc.kstack: 14789: Device busy
procstat: sysctl: kern.proc.kstack: 82034: Device busy

Could you please check what those two processes are (if they are still running)?
Perhaps try procstat for each of the pids several times.

-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert



On 11/10/2016 18:33, Andriy Gapon wrote:

On 10/11/2016 18:12, Henri Hennebert wrote:

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I missed something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,


"value optimized out" - not good


file=<value optimized out>,
line=<value optimized out>) at vnode_if.h:859
859        vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp


I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.



Does this seem better?

(kgdb) fr 8
#8  0x8062a5f7 in vget (vp=0xf80049c2c000, flags=2121728, 
td=0xf80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523

2523        if ((error = vn_lock(vp, flags)) != 0) {
(kgdb) print *vp
$1 = {v_tag = 0x813be535 "zfs", v_op = 0x813d0f70, 
v_data = 0xf80049c1f420, v_mount = 0xf800093aa660,
  v_nmntvnodes = {tqe_next = 0xf80049c2c938, tqe_prev = 
0xf80049c2bb30}, v_un = {vu_mount = 0x0, vu_socket = 0x0,
vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, 
le_prev = 0x0}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {
tqh_first = 0xf800bfc8e3f0, tqh_last = 0xf800bfc8e410}, 
v_cache_dd = 0x0, v_lock = {lock_object = {
  lo_name = 0x813be535 "zfs", lo_flags = 117112832, lo_data 
= 0, lo_witness = 0x0}, lk_lock = 23, lk_exslpfail = 0,
lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 
0x8099e9e0 "vnode interlock", lo_flags = 16973824,
  lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 
0xf80049c2c068, v_actfreelist = {
tqe_next = 0xf80049c2c938, tqe_prev = 0xf80049ae9bd0}, 
v_bufobj = {bo_lock = {lock_object = {
lo_name = 0x8099e9f0 "bufobj interlock", lo_flags = 
86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
bo_ops = 0x80c4bf70, bo_object = 0xf800b62e9c60, 
bo_synclist = {le_next = 0x0, le_prev = 0x0},
bo_private = 0xf80049c2c000, __bo_vnode = 0xf80049c2c000, 
bo_clean = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c120}, bv_root = {pt_root = 0}, bv_cnt 
= 0}, bo_dirty = {bv_hd = {tqh_first = 0x0,
tqh_last = 0xf80049c2c140}, bv_root = {pt_root = 0}, bv_cnt 
= 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
  v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = 
{tqh_first = 0x0, tqh_last = 0xf80049c2c188},
rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 
0, v_holdcnt = 9, v_usecount = 6, v_iflag = 512,

  v_vflag = 32, v_writecount = 0, v_hash = 4833984, v_type = VREG}
(kgdb)

Henri


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 18:12, Henri Hennebert wrote:
> On 11/10/2016 16:54, Andriy Gapon wrote:
>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>> On 11/10/2016 15:00, Andriy Gapon wrote:
 Interesting.  I can not spot any suspicious thread that would hold the 
 vnode
 lock.  Could you please run kgdb (just like that, no arguments), then 
 execute
 'bt' command and then select a frame when _vn_lock is called with 'fr N'
 command.  Then please 'print *vp' and share the result.

>>> I Think I miss something in your request:
>>
>> Oh, sorry!  The very first step should be 'tid 101112' to switch to the 
>> correct
>> context.
>>
> 
> (kgdb) fr 7
> #7  0x8063c5b3 in _vn_lock (vp=<value optimized out>, flags=2121728,

"value optimized out" - not good

> file=<value optimized out>,
> line=<value optimized out>) at vnode_if.h:859
> 859        vnode_if.h: No such file or directory.
> in vnode_if.h
> (kgdb) print *vp

I am not sure if this output is valid, because of the message above.
Could you please try to navigate to nearby frames and see if vp itself has a
valid value there.  If you can find such a frame please do *vp  there.

> $1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 0xf80009a41000,
> v_data = 0x0, v_mount = 0xf80009a41010,
>   v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un =
> {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0,
> vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, le_prev =
> 0x0}, v_cache_src = {lh_first = 0xfe010186d768},
>   v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, v_cache_dd =
> 0xf8000284f000, v_lock = {lock_object = {
>   lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, lo_witness =
> 0xf800068bd480},
> lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = -2048,
> lk_pri = 0}, v_interlock = {lock_object = {
>   lo_name = 0x18af8  address>, lo_flags = 0, lo_data = 0,
>   lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist =
> {tqe_next = 0x0, tqe_prev = 0xf80009ba05c0},
>   v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 "",
> lo_flags = 1, lo_data = 0, lo_witness = 0x400ff},
>   rw_lock = 2}, bo_ops = 0x1, bo_object = 0xf80049c2c068,
> bo_synclist = {le_next = 0x813be535,
>   le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, 
> bo_clean =
> {bv_hd = {tqh_first = 0x0, tqh_last = 0x0},
>   bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first =
> 0xf80088ac8d00, tqh_last = 0xf8003cc5b600},
>   bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, bo_numoutput =
> 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0,
>   v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0xf88,
> tqh_last = 0x19cc}, rl_currdep = 0x3f8},
>   v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 0,
> v_usecount = 2369, v_iflag = 0, v_vflag = 0,
>   v_writecount = 0, v_hash = 0, v_type = VNON}
> (kgdb)
> 
> Thanks for your time
> 
> Henri


-- 
Andriy Gapon


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 16:54, Andriy Gapon wrote:

On 10/11/2016 17:20, Henri Hennebert wrote:

On 11/10/2016 15:00, Andriy Gapon wrote:

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I think I missed something in your request:


Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.



(kgdb) fr 7
#7  0x8063c5b3 in _vn_lock (vp=<value optimized out>,
flags=2121728, file=<value optimized out>,
line=<value optimized out>) at vnode_if.h:859
859        vnode_if.h: No such file or directory.
in vnode_if.h
(kgdb) print *vp
$1 = {v_tag = 0x80faeb78 "â~\231\200", v_op = 
0xf80009a41000, v_data = 0x0, v_mount = 0xf80009a41010,
  v_nmntvnodes = {tqe_next = 0x0, tqe_prev = 0x80edc088}, v_un 
= {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0,
vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xf80009466e90, 
le_prev = 0x0}, v_cache_src = {lh_first = 0xfe010186d768},
  v_cache_dst = {tqh_first = 0x0, tqh_last = 0xfeb8a7c0}, 
v_cache_dd = 0xf8000284f000, v_lock = {lock_object = {
  lo_name = 0xf8002c00ee80 "", lo_flags = 0, lo_data = 0, 
lo_witness = 0xf800068bd480},
lk_lock = 1844673520268056, lk_exslpfail = 153715840, lk_timo = 
-2048, lk_pri = 0}, v_interlock = {lock_object = {
  lo_name = 0x18af8 Bad address>, lo_flags = 0, lo_data = 0,
  lo_witness = 0x0}, mtx_lock = 0}, v_vnlock = 0x0, v_actfreelist = 
{tqe_next = 0x0, tqe_prev = 0xf80009ba05c0},
  v_bufobj = {bo_lock = {lock_object = {lo_name = 0xf80009a41000 
"", lo_flags = 1, lo_data = 0, lo_witness = 0x400ff},
  rw_lock = 2}, bo_ops = 0x1, bo_object = 
0xf80049c2c068, bo_synclist = {le_next = 0x813be535,
  le_prev = 0x1}, bo_private = 0x0, __bo_vnode = 0x0, 
bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0x0},
  bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = 
{tqh_first = 0xf80088ac8d00, tqh_last = 0xf8003cc5b600},
  bv_root = {pt_root = 2553161591}, bv_cnt = -1741805705}, 
bo_numoutput = 31, bo_flag = 0, bo_bsize = 0}, v_pollinfo = 0x0,
  v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 
0xf88, tqh_last = 0x19cc}, rl_currdep = 0x3f8},
  v_cstart = 16256, v_lasta = 679, v_lastw = 0, v_clen = 0, v_holdcnt = 
0, v_usecount = 2369, v_iflag = 0, v_vflag = 0,

  v_writecount = 0, v_hash = 0, v_type = VNON}
(kgdb)

Thanks for your time

Henri
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 17:20, Henri Hennebert wrote:
> On 11/10/2016 15:00, Andriy Gapon wrote:
>> Interesting.  I can not spot any suspicious thread that would hold the vnode
>> lock.  Could you please run kgdb (just like that, no arguments), then execute
>> 'bt' command and then select a frame when _vn_lock is called with 'fr N'
>> command.  Then please 'print *vp' and share the result.
>>
> I Think I miss something in your request:

Oh, sorry!  The very first step should be 'tid 101112' to switch to the correct
context.

> [root@avoriaz ~]# kgdb
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
> 
> --- clip ---
> 
> Loaded symbols for /boot/kernel/accf_data.ko
> Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/daemon_saver.ko
> #0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> 1973cpuid = PCPU_GET(cpuid);
> (kgdb) bt
> #0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
> flags=<value optimized out>)
> at /usr/src/sys/kern/sched_ule.c:1973
> #1  0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at
> /usr/src/sys/kern/kern_tc.c:2121
> #2  0x80555227 in timekeep_push_vdso () at
> /usr/src/sys/kern/kern_sharedpage.c:174
> #3  0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
> #4  0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>)
> at /usr/src/sys/kern/kern_clock.c:589
> #5  0x808fac74 in handleevents (now=<value optimized out>, fake=0) at
> /usr/src/sys/kern/kern_clocksource.c:223
> #6  0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352
> #7  0xf800b6429a00 in ?? ()
> #8  0x81051080 in vm_page_array ()
> #9  0x81051098 in vm_page_queue_free_mtx ()
> #10 0xfe0101818920 in ?? ()
> #11 0x805399c0 in __mtx_lock_sleep (c=, tid=Error
> accessing memory address 0xffac: Bad add\
> ress.
> ) at /usr/src/sys/kern/kern_mutex.c:590
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) q
> [root@avoriaz ~]#
> 
> I don't find the requested frame
> 
> Henri


-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 15:00, Andriy Gapon wrote:

On 10/11/2016 12:30, Henri Hennebert wrote:

On 11/10/2016 11:21, Andriy Gapon wrote:

On 09/11/2016 15:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily`is locked on the same directory :-o

Any idea?


I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.


In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.



I attach the result,


Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.


I Think I miss something in your request:

[root@avoriaz ~]# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.

done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.

done.

--- clip ---

Loaded symbols for /boot/kernel/accf_data.ko
Reading symbols from /boot/kernel/daemon_saver.ko...Reading symbols from 
/usr/lib/debug//boot/kernel/daemon_saver.ko.debug...done.

done.
Loaded symbols for /boot/kernel/daemon_saver.ko
#0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
flags=<value optimized out>)

at /usr/src/sys/kern/sched_ule.c:1973
1973cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xf8001131da00, newtd=0xf800762a8500, 
flags=<value optimized out>)

at /usr/src/sys/kern/sched_ule.c:1973
#1  0x80566b15 in tc_fill_vdso_timehands32 (vdso_th32=0x0) at 
/usr/src/sys/kern/kern_tc.c:2121
#2  0x80555227 in timekeep_push_vdso () at 
/usr/src/sys/kern/kern_sharedpage.c:174

#3  0x80566226 in tc_windup () at /usr/src/sys/kern/kern_tc.c:1426
#4  0x804eaa41 in hardclock_cnt (cnt=1, usermode=<value optimized out>) at /usr/src/sys/kern/kern_clock.c:589
#5  0x808fac74 in handleevents (now=<value optimized out>, 
fake=0) at /usr/src/sys/kern/kern_clocksource.c:223
#6  0x808fb1d7 in timercb (et=0x8100cf20, arg=<value optimized out>) at /usr/src/sys/kern/kern_clocksource.c:352

#7  0xf800b6429a00 in ?? ()
#8  0x81051080 in vm_page_array ()
#9  0x81051098 in vm_page_queue_free_mtx ()
#10 0xfe0101818920 in ?? ()
#11 0x805399c0 in __mtx_lock_sleep (c=, 
tid=Error accessing memory address 0xffac: Bad add\

ress.
) at /usr/src/sys/kern/kern_mutex.c:590
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) q
[root@avoriaz ~]#

I don't find the requested frame

Henri
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 10/11/2016 12:30, Henri Hennebert wrote:
> On 11/10/2016 11:21, Andriy Gapon wrote:
>> On 09/11/2016 15:58, Eric van Gyzen wrote:
>>> On 11/09/2016 07:48, Henri Hennebert wrote:
 I encounter a strange deadlock on

 FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
 r308260:
 Fri Nov  4 02:51:33 CET 2016
 r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

 This system is exclusively running on zfs.

 After 3 or 4 days, `periodic daily` is locked in the directory
 /usr/local/news/bin

 [root@avoriaz ~]# ps xa|grep find
 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
 -prune
 -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
   462  1  S+   0:00.00 grep find
 [root@avoriaz ~]# procstat -f 85656
   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
 85656 find  text v r r---   -   - - /usr/bin/find
 85656 find   cwd v d r---   -   - - /usr/local/news/bin
 85656 find  root v d r---   -   - - /
 85656 find 0 v c r---   3   0 - /dev/null
 85656 find 1 p - rw--   1   0 - -
 85656 find 2 v r -w--   7  17 - -
 85656 find 3 v d r---   1   0 - /home/root
 85656 find 4 v d r---   1   0 - /home/root
 85656 find 5 v d rn--   1 533545184 - 
 /usr/local/news/bin
 [root@avoriaz ~]#

 If I try `ls /usr/local/news/bin` it is also locked.

 After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 
 0'

 After a reset and reboot  I can access /usr/local/news/bin.

 I delete this directory and reinstall the package `portupgrade -fu 
 news/inn`

 5 days later `periodic daily`is locked on the same directory :-o

 Any idea?
>>>
>>> I can't help with the deadlock, but someone who _can_ help will probably 
>>> ask for
>>> the output of "procstat -kk PID" with the PID of the "find" process.
>>
>> In fact, it's procstat -kk -a.  With just one thread we would see that a 
>> thread
>> is blocked on something, but we won't see why that something can not be 
>> acquired.
>>
>>
> I attach the result,

Interesting.  I can not spot any suspicious thread that would hold the vnode
lock.  Could you please run kgdb (just like that, no arguments), then execute
'bt' command and then select a frame when _vn_lock is called with 'fr N'
command.  Then please 'print *vp' and share the result.



-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/10/2016 11:21, Andriy Gapon wrote:

On 09/11/2016 15:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily`is locked on the same directory :-o

Any idea?


I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.


In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.



I attach the result,

Henri
[root@avoriaz ~]# procstat -kk -a
  PIDTID COMM TDNAME   KSTACK   
0 10 kernel   swapper  mi_switch+0xd2 
sleepq_timedwait+0x3a _sleep+0x281 swapper+0x464 btext+0x2c 
0 19 kernel   kqueue_ctx taskq mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100012 kernel   aiod_kick taskq  mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100013 kernel   thread taskq mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100018 kernel   firmware taskq   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100022 kernel   acpi_task_0  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100023 kernel   acpi_task_1  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100024 kernel   acpi_task_2  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100025 kernel   em0 que  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100026 kernel   em0 txq  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100027 kernel   em1 taskqmi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100060 kernel   mca taskqmi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd taskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100061 kernel   system_taskq_0   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100062 kernel   system_taskq_1   mi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100063 kernel   dbu_evictmi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100072 kernel   CAM taskqmi_switch+0xd2 sleepq_wait+0x3a 
_sleep+0x2a1 taskqueue_thread_loop+0x141 fork_exit+0x85 fork_trampoline+0xe 
0 100086 kernel   if_config_tqg_0  mi_switch+0xd2 sleepq_wait+0x3a 
msleep_spin_sbt+0x1bd gtaskqueue_thread_loop+0x113 fork_exit+0x85 
fork_trampoline+0xe 
0 100087 kernel   if_io_tqg_0  mi_switch+0xd2 sleepq_wait+0x3a 

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Andriy Gapon
On 09/11/2016 15:58, Eric van Gyzen wrote:
> On 11/09/2016 07:48, Henri Hennebert wrote:
>> I encounter a strange deadlock on
>>
>> FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
>> r308260:
>> Fri Nov  4 02:51:33 CET 2016
>> r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64
>>
>> This system is exclusively running on zfs.
>>
>> After 3 or 4 days, `periodic daily` is locked in the directory 
>> /usr/local/news/bin
>>
>> [root@avoriaz ~]# ps xa|grep find
>> 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
>> -prune
>> -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
>>   462  1  S+   0:00.00 grep find
>> [root@avoriaz ~]# procstat -f 85656
>>   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
>> 85656 find  text v r r---   -   - - /usr/bin/find
>> 85656 find   cwd v d r---   -   - - /usr/local/news/bin
>> 85656 find  root v d r---   -   - - /
>> 85656 find 0 v c r---   3   0 - /dev/null
>> 85656 find 1 p - rw--   1   0 - -
>> 85656 find 2 v r -w--   7  17 - -
>> 85656 find 3 v d r---   1   0 - /home/root
>> 85656 find 4 v d r---   1   0 - /home/root
>> 85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
>> [root@avoriaz ~]#
>>
>> If I try `ls /usr/local/news/bin` it is also locked.
>>
>> After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0'
>>
>> After a reset and reboot  I can access /usr/local/news/bin.
>>
>> I delete this directory and reinstall the package `portupgrade -fu news/inn`
>>
>> 5 days later `periodic daily`is locked on the same directory :-o
>>
>> Any idea?
> 
> I can't help with the deadlock, but someone who _can_ help will probably ask 
> for
> the output of "procstat -kk PID" with the PID of the "find" process.

In fact, it's procstat -kk -a.  With just one thread we would see that a thread
is blocked on something, but we won't see why that something can not be 
acquired.
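
In other words (85656 being the PID of the stuck find from the earlier mail):

procstat -kk 85656     # shows only the one stuck thread
procstat -kk -a        # shows every thread, so whatever holds the lock shows up too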


-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-10 Thread Henri Hennebert

On 11/09/2016 19:23, Thierry Thomas wrote:

On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:


[root@avoriaz ~]# procstat -kk 85656
   PIDTID COMM TDNAME KSTACK
85656 101112 find -mi_switch+0xd2
sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c
VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679
vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572
kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb


It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163

It may be caused by too small values for some vfs.zfs.arc*.
Could you please list sysctl for vfs.zfs.arc_max and others?

Regards,


[root@avoriaz ~]# sysctl vfs.zfs
vfs.zfs.trim.max_interval: 1
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.enabled: 1
vfs.zfs.vol.unmap_enabled: 1
vfs.zfs.vol.recursive: 0
vfs.zfs.vol.mode: 1
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 5000
vfs.zfs.version.acl: 1
vfs.zfs.version.ioctl: 6
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.min_auto_ashift: 9
vfs.zfs.max_auto_ashift: 13
vfs.zfs.vdev.trim_max_pending: 1
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.async_write_max_active: 10
vfs.zfs.vdev.async_write_min_active: 1
vfs.zfs.vdev.async_read_max_active: 3
vfs.zfs.vdev.async_read_min_active: 1
vfs.zfs.vdev.sync_write_max_active: 10
vfs.zfs.vdev.sync_write_min_active: 10
vfs.zfs.vdev.sync_read_max_active: 10
vfs.zfs.vdev.sync_read_min_active: 10
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.trim_on_init: 1
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.metaslabs_per_vdev: 200
vfs.zfs.txg.timeout: 5
vfs.zfs.space_map_blksz: 4096
vfs.zfs.spa_slop_shift: 5
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 100
vfs.zfs.debug_flags: 0
vfs.zfs.recover: 0
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_maxinflight: 1
vfs.zfs.ccw_retry_interval: 300
vfs.zfs.check_hostid: 1
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.condense_pct: 200
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.min_alloc_size: 33554432
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.gang_bang: 16777217
vfs.zfs.free_bpobj_enabled: 1
vfs.zfs.free_max_blocks: 18446744073709551615
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.no_scrub_io: 0
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.scan_idle: 50
vfs.zfs.scrub_delay: 4
vfs.zfs.resilver_delay: 2
vfs.zfs.top_maxinflight: 32
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.max_distance: 8388608
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.delay_scale: 50
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 373664153
vfs.zfs.max_recordsize: 1048576
vfs.zfs.mdcomp_disable: 0
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.dedup.prefetch: 1
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 24202240
vfs.zfs.mfu_ghost_metadata_lsize: 136404992
vfs.zfs.mfu_ghost_size: 160607232
vfs.zfs.mfu_data_lsize: 449569280
vfs.zfs.mfu_metadata_lsize: 102724608
vfs.zfs.mfu_size: 714202624
vfs.zfs.mru_ghost_data_lsize: 874834432
vfs.zfs.mru_ghost_metadata_lsize: 387692032
vfs.zfs.mru_ghost_size: 1262526464
vfs.zfs.mru_data_lsize: 151275008
vfs.zfs.mru_metadata_lsize: 13547008
vfs.zfs.mru_size: 322614272
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 2916352
vfs.zfs.l2arc_norw: 1

Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Thierry Thomas
On Wed, 9 Nov 2016 at 15:03:49 +0100, Henri Hennebert wrote:

> [root@avoriaz ~]# procstat -kk 85656
>PIDTID COMM TDNAME KSTACK
> 85656 101112 find -mi_switch+0xd2 
> sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c 
> VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 
> vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 
> kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb

It looks similar to the problem reported in PR 205163
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205163

It may be caused by too small values for some vfs.zfs.arc*.
Could you please list sysctl for vfs.zfs.arc_max and others?

Regards,
-- 
Th. Thomas.




Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Henri Hennebert

On 11/09/2016 14:58, Eric van Gyzen wrote:

On 11/09/2016 07:48, Henri Hennebert wrote:

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 r308260:
Fri Nov  4 02:51:33 CET 2016
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64

This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin

[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
-o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
   462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0'

After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily`is locked on the same directory :-o

Any idea?

I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.

Eric

[root@avoriaz ~]# procstat -kk 85656
  PIDTID COMM TDNAME KSTACK
85656 101112 find -mi_switch+0xd2 
sleepq_wait+0x3a sleeplk+0x1b4 __lockmgr_args+0x356 vop_stdlock+0x3c 
VOP_LOCK1_APV+0x8d _vn_lock+0x43 vget+0x47 cache_lookup+0x679 
vfs_cache_lookup+0xac VOP_LOOKUP_APV+0x87 lookup+0x591 namei+0x572 
kern_statat+0xa8 sys_fstatat+0x2c amd64_syscall+0x4ce Xfast_syscall+0xfb




Henri
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Eric van Gyzen
On 11/09/2016 07:48, Henri Hennebert wrote:
> I encounter a strange deadlock on
> 
> FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
> r308260:
> Fri Nov  4 02:51:33 CET 2016
> r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64
> 
> This system is exclusively running on zfs.
> 
> After 3 or 4 days, `periodic daily` is locked in the directory 
> /usr/local/news/bin
> 
> [root@avoriaz ~]# ps xa|grep find
> 85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) -prune
> -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam
>   462  1  S+   0:00.00 grep find
> [root@avoriaz ~]# procstat -f 85656
>   PID COMMFD T V FLAGSREF  OFFSET PRO NAME
> 85656 find  text v r r---   -   - - /usr/bin/find
> 85656 find   cwd v d r---   -   - - /usr/local/news/bin
> 85656 find  root v d r---   -   - - /
> 85656 find 0 v c r---   3   0 - /dev/null
> 85656 find 1 p - rw--   1   0 - -
> 85656 find 2 v r -w--   7  17 - -
> 85656 find 3 v d r---   1   0 - /home/root
> 85656 find 4 v d r---   1   0 - /home/root
> 85656 find 5 v d rn--   1 533545184 - /usr/local/news/bin
> [root@avoriaz ~]#
> 
> If I try `ls /usr/local/news/bin` it is also locked.
> 
> After `shutdown -r now` the system remain locked after the line '0 0 0 0 0 0'
> 
> After a reset and reboot  I can access /usr/local/news/bin.
> 
> I delete this directory and reinstall the package `portupgrade -fu news/inn`
> 
> 5 days later `periodic daily`is locked on the same directory :-o
> 
> Any idea?

I can't help with the deadlock, but someone who _can_ help will probably ask for
the output of "procstat -kk PID" with the PID of the "find" process.

Eric
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Freebsd 11.0 RELEASE - ZFS deadlock

2016-11-09 Thread Henri Hennebert

I encounter a strange deadlock on

FreeBSD avoriaz.restart.bel 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #0 
r308260: Fri Nov  4 02:51:33 CET 2016 
r...@avoriaz.restart.bel:/usr/obj/usr/src/sys/AVORIAZ  amd64


This system is exclusively running on zfs.

After 3 or 4 days, `periodic daily` is locked in the directory 
/usr/local/news/bin


[root@avoriaz ~]# ps xa|grep find
85656  -  D0:01.13 find / ( ! -fstype local -o -fstype rdonly ) 
-prune -o ( -name [#,]* -o -name .#* -o -name a.out -o -nam

  462  1  S+   0:00.00 grep find
[root@avoriaz ~]# procstat -f 85656
  PID COMMFD T V FLAGSREF  OFFSET PRO NAME
85656 find  text v r r---   -   - - /usr/bin/find
85656 find   cwd v d r---   -   - - /usr/local/news/bin
85656 find  root v d r---   -   - - /
85656 find 0 v c r---   3   0 - /dev/null
85656 find 1 p - rw--   1   0 - -
85656 find 2 v r -w--   7  17 - -
85656 find 3 v d r---   1   0 - /home/root
85656 find 4 v d r---   1   0 - /home/root
85656 find 5 v d rn--   1 533545184 - 
/usr/local/news/bin

[root@avoriaz ~]#

If I try `ls /usr/local/news/bin` it is also locked.

After `shutdown -r now` the system remain locked after the line '0 0 0 0 
0 0'


After a reset and reboot  I can access /usr/local/news/bin.

I delete this directory and reinstall the package `portupgrade -fu news/inn`

5 days later `periodic daily`is locked on the same directory :-o

Any idea?

Henri
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS deadlock on rrl->rr_ -- look familiar to anyone?

2013-01-29 Thread Andriy Gapon
on 29/01/2013 05:21 Garrett Wollman said the following:
 When
 I restarted mountd, it hung waiting on rrl->rr_, but the system may
 already have been deadlocked at that point.  procstat reported:
 
 87678 104365 mountd   -mi_switch sleepq_wait _cv_wait 
 rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall 
 Xfast_syscall 
...
 If it happens again

procstat -kk -a

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS deadlock on rrl->rr_ -- look familiar to anyone?

2013-01-28 Thread Garrett Wollman
I just had a big fileserver deadlock in an odd way.  I was
investigating a user's problem, and decided for various reasons to
restart mountd.  It had been complaining like this:

Jan 28 21:06:43 nfs-prod-1 mountd[1108]: can't delete exports for 
/usr/local/.zfs/snapshot/monthly-2013-01: Invalid argument 

for a while, which is odd because /usr/local was never exported.  When
I restarted mountd, it hung waiting on rrl->rr_, but the system may
already have been deadlocked at that point.  procstat reported:

87678 104365 mountd   -mi_switch sleepq_wait _cv_wait 
rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall 
Xfast_syscall 

I was able to run shutdown, and the rc scripts eventually hung in
sync(1) and timed out.  The kernel then hung trying to do the same
thing, but I was able to break into the debugger.  The debugger
interrupted an idle thread, which was not particularly helpful, but I
was able to quickly gather the following information before I had to
reset the machine to restore normal service.
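
For reference, the output below came out of DDB commands roughly like
these (from memory, so treat them as a sketch):

db> show lockedvnods
db> ps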

Locked vnodes


0xfe00536383c0: tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
flags (VI(0x200))
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)

db> ps
  pid  ppid  pgrp   uid   state   wmesg wchancmd
87996 1 87994 65534  D   rrl->rr_ 0xfe0048ff8108 df
87976 1 87726 0  D+  rrl->rr_ 0xfe0048ff8108 sync
87707 1 87705 65534  D   rrl->rr_ 0xfe0048ff8108 df
87700 1 87698 65534  D   rrl->rr_ 0xfe0048ff8108 df
87678 1 87657 0  D+  rrl->rr_ 0xfe0048ff8108 mountd
87531 1 87529 65534  D   rrl->rr_ 0xfe0048ff8108 df
87387 1 87385 65534  D   rrl->rr_ 0xfe0048ff8108 df
87380 1 87378 65534  D   rrl->rr_ 0xfe0048ff8108 df
87103 1 87101 65534  D   rrl->rr_ 0xfe0048ff8108 df
87096 1 87094 65534  D   rrl->rr_ 0xfe0048ff8108 df
85193 1 85192 0  D   zio->io_ 0xfe10d3e75320 zfs
   24 0 0 0  DL  sdflush  0x80e50878 [softdepflush]
   23 0 0 0  DL  vlruwt   0xfe0048c0a940 [vnlru]
   22 0 0 0  DL  rrl->rr_ 0xfe0048ff8108 [syncer]
   21 0 0 0  DL  psleep   0x80e3c048 [bufdaemon]
   20 0 0 0  DL  pgzero   0x80e5a81c [pagezero]
   19 0 0 0  DL  psleep   0x80e599e8 [vmdaemon]
   18 0 0 0  DL  psleep   0x80e599ac [pagedaemon]
   17 0 0 0  DL  gkt:wait 0x80de6c0c [g_mp_kt]
   16 0 0 0  DL  ipmireq  0xfe00347400b8 [ipmi0: kcs]
9 0 0 0  DL  ccb_scan 0x80dc1360 [xpt_thrd]
8 0 0 0  DL  waiting_ 0x80e41e80 [sctp_iterator]
7 0 0 0  DL  (threaded)  [zfskern]
101355   D   tx->tx_s 0xfe0050342e10 [txg_thread_enter]
101354   D   tx->tx_q 0xfe0050342e30 [txg_thread_enter]
100989   D   tx->tx_s 0xfe004fd27a10 [txg_thread_enter]
100988   D   tx->tx_q 0xfe004fd27a30 [txg_thread_enter]
100593   D   tx->tx_s 0xfe004a8c0a10 [txg_thread_enter]
100592   D   tx->tx_q 0xfe004a8c0a30 [txg_thread_enter]
100216   D   l2arc_fe 0x81228bc0 [l2arc_feed_thread]
100215   D   arc_recl 0x81218d20 
[arc_reclaim_thread]
   15 0 0 0  DL  (threaded)  [usb]
[32 uninteresting and identical threads deleted]
6 0 0 0  DL  mps_scan 0xfe00276816a8 [mps_scan2]
5 0 0 0  DL  mps_scan 0xfe0027612ca8 [mps_scan1]
4 0 0 0  DL  mps_scan 0xfe00274ef4a8 [mps_scan0]
   14 0 0 0  DL  -0x80ded764 [yarrow]
3 0 0 0  DL  crypto_r 0x80e4e0a0 [crypto returns]
2 0 0 0  DL  crypto_w 0x80e4e060 [crypto]
   13 0 0 0  DL  (threaded)  [geom]
100055   D   -0x80de6b90 [g_down]
100054   D   -0x80de6b88 [g_up]
100053   D   -0x80de6b78 [g_event]
   12 0 0 0  WL  (threaded)  [intr]
100189   I   [irq1: atkbd0]
100188   I   [swi0: uart uart]
100187   I   [irq19: atapci1]
100186   I   [irq18: atapci0+]
100169   I   

8.0RC1, ZFS: deadlock

2009-09-29 Thread Borja Marcos


Hello,

I have observed a deadlock condition when using ZFS. We are making a  
heavy usage of zfs send/zfs receive to keep a replica of a dataset on  
a remote machine. It can be done at one minute intervals. Maybe we're  
doing a somehow atypical usage of ZFS, but, well, seems to be a great  
solution to keep filesystem replicas once this is sorted out.



How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated  
from the first to the second one. I've used a dataset containing /usr/ 
obj while I did a make buildworld.


Replicate the dataset from the first machine to the second one using  
an incremental send


zfs send -i pool/data...@nminus1 pool/data...@n | ssh destination zfs  
receive -d pool


When there is read activity on the second system, reading the  
replicated system, I mean, having read access while zfs receive is  
updating it, there can be a deadlock. We have discovered this doing a  
test on a hopefully soon in production server, with 8 GB RAM. A Bacula  
backup agent was running and ZFS deadlocked.
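
To make the procedure concrete, the sending side runs something along
these lines every minute or so (pool, dataset and snapshot names here
are only illustrative; the very first replication is a full zfs send
without -i):

	PREV=$CURR; CURR=$(date +%Y%m%d%H%M%S)
	zfs snapshot pool/dataset@$CURR
	zfs send -i pool/dataset@$PREV pool/dataset@$CURR | \
	    ssh destination zfs receive -d pool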


I have set up a couple of VMWare Fussion virtual machines in order to  
test this, and it has deadlocked as well. The virtual machines have  
little memory, 512 MB, but I don't believe this is the actual problem.  
There is no complaint about lack of memory.


A running top shows processes stuck on zfsvfs

last pid:  2051;  load averages:  0.00,  0.07,  0.55   up 0+01:18:25  12:05:48

37 processes:  1 running, 36 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 18M Active, 20M Inact, 114M Wired, 40K Cache, 59M Buf, 327M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZERES STATE   C   TIME   WCPU  COMMAND
 1914 root1  620 11932K  2564K zfsvfs  0   0:51  0.00%  bsdtar

 1093 borjam  1  440  8304K  2464K CPU11   0:32  0.00% top
 1913 root1  540 11932K  2600K rrl->r  0   0:19  0.00%  bsdtar

 1019 root1  440 25108K  4812K select  0   0:05  0.00% sshd
 2008 root1  760 13600K  1904K tx->tx  0   0:04  0.00% zfs
 1089 borjam  1  440 37040K  5216K select  1   0:04  0.00% sshd
  995 root1  760  8252K  2652K pause   0   0:02  0.00% csh
  840 root1  440 11044K  3828K select  1   0:02  0.00%  sendmail

 1086 root1  760 37040K  5156K sbwait  1   0:01  0.00% sshd
  850 root1  440  6920K  1612K nanslp  0   0:01  0.00% cron
  607 root1  440  5992K  1540K select  1   0:01  0.00%  syslogd

 1090 borjam  1  760  8252K  2636K pause   1   0:01  0.00% csh
  990 borjam  1  440 37040K  5220K select  0   0:00  0.00% sshd
  985 root1  480 37040K  5160K sbwait  1   0:00  0.00% sshd
  911 root1  440  8252K  2608K ttyin   0   0:00  0.00% csh
  991 borjam  1  560  8252K  2636K pause   0   0:00  0.00% csh
  844 smmsp   1  460 11044K  3852K pause   0   0:00  0.00%  sendmail


Interestingly, this has blocked access to all the filesystems. I
cannot, for instance, ssh into the machine anymore, even though all
the system-important filesystems are on UFS; I was just using ZFS for
a test.


Any ideas on what information might be useful to collect? I have the
VMWare machine right now. I've made a couple of VMWare snapshots of
it: the first just after the deadlock started, before breaking into
DDB, and the second while in DDB (I broke into DDB via sysctl).


Also, a copy of the VMWare virtual machine with snapshots is available
on request. Your choice ;)







Borja.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.0RC1, ZFS: deadlock

2009-09-29 Thread Borja Marcos


On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:



Hello,

I have observed a deadlock condition when using ZFS. We are making a  
heavy usage of zfs send/zfs receive to keep a replica of a dataset  
on a remote machine. It can be done at one minute intervals. Maybe  
we're doing a somehow atypical usage of ZFS, but, well, seems to be  
a great solution to keep filesystem replicas once this is sorted out.



How to reproduce:

Set up two systems. A dataset with heavy I/O activity is replicated  
from the first to the second one. I've used a dataset containing / 
usr/obj while I did a make buildworld.


Replicate the dataset from the first machine to the second one using  
an incremental send


zfs send -i pool/data...@nminus1 pool/data...@n | ssh destination  
zfs receive -d pool


When there is read activity on the second system, reading the  
replicated system, I mean, having read access while zfs receive is  
updating it, there can be a deadlock. We have discovered this doing  
a test on a hopefully soon in production server, with 8 GB RAM. A  
Bacula backup agent was running and ZFS deadlocked.


Sorry, forgot to explain what was happening on the second system (the  
one receiving the incremental snapshots) for the deadlock to happen.


It was just running an endless loop, copying the contents of /usr/obj  
to another dataset, in order to keep the reading activity going on.


That's how it has deadlocked. On the original test system an rsync did  
the same trick.
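
In case it helps to reproduce it, the reading side was essentially just
something like this (the destination path is only an example):

	while true; do
	    cp -Rp /usr/obj/ /pool/copy/
	done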






Borja

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.0RC1, ZFS: deadlock

2009-09-29 Thread Borja Marcos


On Sep 29, 2009, at 10:29 AM, Borja Marcos wrote:

I have observed a deadlock condition when using ZFS. We are making a  
heavy usage of zfs send/zfs receive to keep a replica of a dataset  
on a remote machine. It can be done at one minute intervals. Maybe  
we're doing a somehow atypical usage of ZFS, but, well, seems to be  
a great solution to keep filesystem replicas once this is sorted out.


Not sure the backtrace screenshots will get through...

First one is the backtrace for the zfs command.

Second one, a tar process doing a cf - . on the dataset being  
replicated, sending to a pipe.


Third one, the receiving tar process, doing an xf - on a second  
dataset.











___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

ZFS deadlock

2008-04-08 Thread Johan Ström

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6  
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:


load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock]  
0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock]  
0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock]  
0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock]  
0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock]  
0.02u 0.04s 0% 3404k


Worked for a while then that stopped working too (was over ssh). When  
trying a local login i only got


load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to  
have replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure  
though, since I've edited my file yesterday for next reboot), with 2G  
of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M.  
currently it is at default), but since I just got back to 2G total mem  
after some hardware problems I've been runnig at those lows (1G total  
is kindof tight with zfs..)


Well, just wanted to report... The box is not totally dead yet, ie I  
can still do Ctrl-T on console, but thats it.. I don't really know  
what more I can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it  
unlocks or if anyone have any suggestions.


Thanks

--
Johan Ström
Stromnet
[EMAIL PROTECTED]
http://www.stromnet.se/


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Jeremy Chadwick
On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:
 Hello

 A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 disks, 3 
 mirrors) seems to have gotten stuck. From Ctrl-T:

 load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 
 0.04s 0% 3404k
 load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 
 0.04s 0% 3404k
 load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 
 0.04s 0% 3404k
 load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 
 0.04s 0% 3404k
 load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 
 0.04s 0% 3404k

 Worked for a while then that stopped working too (was over ssh). When 
 trying a local login i only got

 load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

 I found one post like this earlier (by Xin LI), but nobody seemed to have 
 replied...
 in my current conf, I think my kmem/kmem_max is at 512Mb (not sure though, 
 since I've edited my file yesterday for next reboot), with 2G of system 
 RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. currently it is 
 at default), but since I just got back to 2G total mem after some hardware 
 problems I've been runnig at those lows (1G total is kindof tight with 
 zfs..)

 Well, just wanted to report... The box is not totally dead yet, ie I can 
 still do Ctrl-T on console, but thats it.. I don't really know what more I 
 can do so.. I don't have KDB/DDB.
 I'll wait another hour or so before I hard reboot it, unless it unlocks 
 or if anyone have any suggestions.

I don't think there are any suggestions left to give.  Many people,
including myself, have experienced this kind of problem.  It's well-
documented both on my Common Issues page, and the official FreeBSD ZFS
Wiki.

ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread LI Xin

Johan Ström wrote:

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6 
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:


load: 0.50  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 
0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 
0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 
0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 
0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188 [zfs:buf_hash_table.ht_locks[i].ht_lock] 
0.02u 0.04s 0% 3404k


Worked for a while then that stopped working too (was over ssh). When 
trying a local login i only got


load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed to 
have replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure 
though, since I've edited my file yesterday for next reboot), with 2G of 
system RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M. 
currently it is at default), but since I just got back to 2G total mem 
after some hardware problems I've been runnig at those lows (1G total is 
kindof tight with zfs..)


Well, just wanted to report... The box is not totally dead yet, ie I can 
still do Ctrl-T on console, but thats it.. I don't really know what more 
I can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it 
unlocks or if anyone have any suggestions.


The key is to increase your kmem and prevent it from being exhausted.  I 
think more recent OpenSolaris's ZFS code has some improvements but I do 
not have spare devices at hand to test and debug :(


Maybe pjd@ would get a new import at some point?  I have cc'ed him.

Cheers,
--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:32 AM, Jeremy Chadwick wrote:


On Tue, Apr 08, 2008 at 08:17:38AM +0200, Johan Ström wrote:

Hello

A box of mine running RELENG_7_0 and ZFS over a couple of disks (6  
disks, 3

mirrors) seems to have gotten stuck. From Ctrl-T:

load: 0.50  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u

0.04s 0% 3404k
load: 0.43  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u

0.04s 0% 3404k
load: 0.10  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u

0.04s 0% 3404k
load: 0.10  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u

0.04s 0% 3404k
load: 0.11  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u

0.04s 0% 3404k

Worked for a while then that stopped working too (was over ssh). When
trying a local login i only got

load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k

I found one post like this earlier (by Xin LI), but nobody seemed  
to have

replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure  
though,
since I've edited my file yesterday for next reboot), with 2G of  
system
RAM.. Normally I'd run kmem(max) 1G (with arcsize of 512M.  
currently it is
at default), but since I just got back to 2G total mem after some  
hardware
problems I've been runnig at those lows (1G total is kindof tight  
with

zfs..)

Well, just wanted to report... The box is not totally dead yet, ie  
I can
still do Ctrl-T on console, but thats it.. I don't really know what  
more I

can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it  
unlocks

or if anyone have any suggestions.


I don't think there are any suggestions left to give.  Many people,
including myself, have experienced this kind of problem.  It's well-
documented both on my Common Issues page, and the official FreeBSD ZFS
Wiki.


Ah.. I guess I was just too restrictive with the googling on
zfs:buf_hash_table.ht_locks[i].ht_lock.



ZFS is still considered highly experimental, so if your data is at all
important to you, perform backups or switch to another filesystem
provider.


That I am aware of.

Thanks.___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:37 AM, LI Xin wrote:


Johan Ström wrote:

Hello
A box of mine running RELENG_7_0 and ZFS over a couple of disks (6  
disks, 3 mirrors) seems to have gotten stuck. From Ctrl-T:
load: 0.50  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.43  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.10  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
load: 0.11  cmd: zsh 40188  
[zfs:buf_hash_table.ht_locks[i].ht_lock] 0.02u 0.04s 0% 3404k
Worked for a while then that stopped working too (was over ssh).  
When trying a local login i only got

load: 0.09  cmd: login 1611 [zfs] 0.00u 0.00s 0% 208k
I found one post like this earlier (by Xin LI), but nobody seemed  
to have replied...
in my current conf, I think my kmem/kmem_max is at 512Mb (not sure  
though, since I've edited my file yesterday for next reboot), with  
2G of system RAM.. Normally I'd run kmem(max) 1G (with arcsize of  
512M. currently it is at default), but since I just got back to 2G  
total mem after some hardware problems I've been runnig at those  
lows (1G total is kindof tight with zfs..)
Well, just wanted to report... The box is not totally dead yet, ie  
I can still do Ctrl-T on console, but thats it.. I don't really  
know what more I can do so.. I don't have KDB/DDB.
I'll wait another hour or so before I hard reboot it, unless it  
unlocks or if anyone have any suggestions.


The key is to increase your kmem and prevent it from being  
exhausted.  I think more recent OpenSolaris's ZFS code has some  
improvements but I do not have spare devices at hand to test and  
debug :(


Yep, never had the problem when I was running with 2G total mem, but  
then one stick (damn consumer crap) failed and I was left with 1G, and  
I started to have random problems. Going to tune kmem back up now when  
I got more mem again, thinking about putting in 4G too..





Maybe pjd@ would get a new import at some point?  I have cc'ed him.

Cheers,
--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Johan Ström

On Apr 8, 2008, at 9:40 AM, LI Xin wrote:

For your question: just reboot would be fine, you may want to tune  
your arc size (to be smaller) and kmem space (to be larger), which  
would reduce the chance that this would happen, or eliminate it,  
depending on your workload.


Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are  
those reasonable on a 2G machine? I think I've read that from  
somewhere, but cannot find that (arc at least) in the TuningGuide now.
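
For reference, that translates to roughly these lines in
/boot/loader.conf (assuming vfs.zfs.arc_max is the right loader
tunable for the ARC size):

vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"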




This situation is not recoverable and you can trust ZFS that you  
will not lose data if they are already sync'ed.




Actually, I've had a lot of hard crashes lately on this machine (bad
hw), but not once have I lost data (to my knowledge at least...). In
that regard, compared to UFS, ZFS is waaay better! :)



--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread LI Xin
For your question: just reboot would be fine, you may want to tune your 
arc size (to be smaller) and kmem space (to be larger), which would 
reduce the chance that this would happen, or eliminate it, depending on 
your workload.


This situation is not recoverable and you can trust ZFS that you will 
not lose data if they are already sync'ed.


--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: ZFS deadlock

2008-04-08 Thread Ender

Johan Ström wrote:

On Apr 8, 2008, at 9:40 AM, LI Xin wrote:

For your question: just reboot would be fine, you may want to tune 
your arc size (to be smaller) and kmem space (to be larger), which 
would reduce the chance that this would happen, or eliminate it, 
depending on your workload.


Back online now, with kmem/kmem_max to 1G and arcsize to 512M. Are 
those reasonable on a 2G machine? I think I've read that from 
somewhere, but cannot find that (arc at least) in the TuningGuide now.




Depending on  your work load you are just buying more time, so 
reasonable is a matter of perspective. :(  I didn't see if you said 
you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64 
regardless of how much memory you have. If 512M arcsize crashes too soon 
for your tastes you can always lower it down to 256M, or 128M, etc.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Ender

Spike Ilacqua wrote:
Depending on  your work load you are just buying more time, so 
reasonable is a matter of perspective. :(  I didn't see if you said 
you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on 
amd64 regardless of how much memory you have. If 512M arcsize crashes 
too soon for your tastes you can always lower it down to 256M, or 
128M, etc.


I tried for several weeks to get ZFS stable on a 64bit system with a 
1.5G kernel.  The best uptime I ever got was 72 hours, the worst was 
2, the average about 24.  Interestingly, most of the hangs were at off 
hours, when the system was lightly loaded, had lots of free memory, 
etc.  That suggests to me a slow leak of some sort.


Anyway, ZFS is not ready for production.  Some people may get lucky, 
but  you can't count on it.


Spike
Very interesting. With 1.5G of kmem and a 64M arc_max the best uptime I 
had was 5 days, the worst 1 day. Also, most of my crashes are off hours as 
well. Another tidbit of information: running things out of /tank instead 
of /tank/foo/bar/foo seems to lead to longer uptime; you might want to 
try that as well.



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Spike Ilacqua
Depending on  your work load you are just buying more time, so 
reasonable is a matter of perspective. :(  I didn't see if you said 
you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64 
regardless of how much memory you have. If 512M arcsize crashes too soon 
for your tastes you can always lower it down to 256M, or 128M, etc.


I tried for several weeks to get ZFS stable on a 64bit system with a 
1.5G kernel.  The best uptime I ever got was 72 hours, the worst was 2, 
the average about 24.  Interestingly, most of the hangs were at off 
hours, when the system was lightly loaded, had lots of free memory, etc. 
 That suggests to me a slow leak of some sort.


Anyway, ZFS is not ready for production.  Some people may get lucky, but 
 you can't count on it.


Spike
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Jisakiel
So no chance of ZFS being stable on FBSD7? I was actually considering Debian over 
FreeBSD on a dual AMD64, but if there are no settings that will make it 
stable... Nevertheless I'd be willing to help debug ZFS on that machine 
(Dell T105) as soon as I receive it in a couple of weeks, as I'm in no rush to 
get it into production (just tell me what to do ;) ).

- Original message 
From: Spike Ilacqua [EMAIL PROTECTED]
To: Ender [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]; freebsd-stable@freebsd.org; Johan Ström [EMAIL PROTECTED]
Sent: Tuesday, April 8, 2008 18:13:32
Subject: Re: ZFS deadlock

 Depending on  your work load you are just buying more time, so 
 reasonable is a matter of perspective. :(  I didn't see if you said 
 you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64 
 regardless of how much memory you have. If 512M arcsize crashes too soon 
 for your tastes you can always lower it down to 256M, or 128M, etc.

I tried for several weeks to get ZFS stable on a 64bit system with a 
1.5G kernel.  The best uptime I ever got was 72 hours, the worst was 2, 
the average about 24.  Interestingly, most of the hangs were at off 
hours, when the system was lightly loaded, had lots of free memory, etc. 
  That suggests to me a slow leak of some sort.

Anyway, ZFS is not ready for production.  Some people may get lucky, but 
  you can't count on it.

Spike
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock

2008-04-08 Thread Vince

It depends a lot on your workload, I'd say.
For me it's pretty stable on an amd64 7-STABLE box that just does a little 
light mail and web and package building.

for others not so much.

Info on my system below if anyone's interested.

Vince

(20:12:28 /usr/home/jhary) 0 $ more /boot/loader.conf
geom_mirror_load=YES
vm.kmem_size=768M
vm.kmem_size_max=768M
snd_emu10k1_load=YES
[EMAIL PROTECTED]
(20:12:39 /usr/home/jhary) 0 $ uptime
 8:12PM  up 13 days, 19:16, 5 users, load averages: 1.21, 0.86, 0.44
[EMAIL PROTECTED]
(20:12:50 /usr/home/jhary) 0 $ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data   164G  64.8G18K  /data
data/usr   163G  64.8G   163G  /usr
data/var   306M  64.8G   306M  /var
[EMAIL PROTECTED]
(20:13:00 /usr/home/jhary) 0 $ zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
dataONLINE   0 0 0
  mirrorONLINE   0 0 0
ad6s2   ONLINE   0 0 0
ad4s2   ONLINE   0 0 0

errors: No known data errors

relevent bits from dmesg:

CPU: AMD Opteron(tm) Processor 242 (1594.18-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0xf5a  Stepping = 10

Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
usable memory = 3210489856 (3061 MB)
avail memory  = 3103461376 (2959 MB)
ACPI APIC Table: <A M I  OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1









Jisakiel wrote:
¿So no chances of ZFS stable on FBSD7? I was actually considering debian over freebsd on a dual AMD64, but if there are no settings that will make it stable... Nevertheless I'd be willing to help debugging ZFS on that machine (Dell T105) as soon as I receive it in a couple of weeks, as I'm in no rush to getting it into production (just tell me what to do ;) ).  


- Original message 
From: Spike Ilacqua [EMAIL PROTECTED]
To: Ender [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]; freebsd-stable@freebsd.org; Johan Ström [EMAIL PROTECTED]
Sent: Tuesday, April 8, 2008 18:13:32
Subject: Re: ZFS deadlock

Depending on  your work load you are just buying more time, so 
reasonable is a matter of perspective. :(  I didn't see if you said 
you are on 32bit or 64bit? Keep in mind the kmem max is 1.5-2G on amd64 
regardless of how much memory you have. If 512M arcsize crashes too soon 
for your tastes you can always lower it down to 256M, or 128M, etc.


I tried for several weeks to get ZFS stable on a 64bit system with a 
1.5G kernel.  The best uptime I ever got was 72 hours, the worst was 2, 
the average about 24.  Interestingly, most of the hangs were at off 
hours, when the system was lightly loaded, had lots of free memory, etc. 
  That suggests to me a slow leak of some sort.


Anyway, ZFS is not ready for production.  Some people may get lucky, but 
  you can't count on it.


Spike
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]







___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-17 Thread Henri Hennebert

Henri Hennebert wrote:

Pawel Jakub Dawidek wrote:

On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:

Pawel Jakub Dawidek wrote:

On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

hello

To push zfs, I launch 2 scrub at the same time, after ~20 seconds 
the system freeze:

[...]

I found a deadlock too. If it's reproducable for you, can you try this
patch:

I reproduce it after 30 minutes, so I will try your patch.


http://people.freebsd.org/~pjd/patches/zgd_done.patch

when I try to load zfs.ko I get:

# kldload zfs
link_elf: symbol kproc_create undefined
kldload: can't load zfs: No such file or directory

What must I add to my config to resolve this symbol / problem


Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().


Today, after more than 10 scrubs, no deadlock. This patch is effective.

Maybe this is not related, but when I copy a 600MB file from ZFS to a 
UFS filesystem under gjournal, my system freezes completely. A break on 
the serial console doesn't drop into the debugger!
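
If a serial BREAK is being ignored, one thing worth checking (a sketch,
assuming a custom kernel; option names as I recall them from the 7.x
NOTES file) is whether debugger support and break handling are compiled in:

# kernel config fragment -- debugger entry on serial BREAK
options KDB                   # kernel debugger framework
options DDB                   # interactive kernel debugger
options BREAK_TO_DEBUGGER     # a BREAK on the console drops into KDB
# at runtime the equivalent knob, if present, is:
#   sysctl debug.kdb.break_to_debugger=1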


Henri
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-11 Thread Henri Hennebert

Pawel Jakub Dawidek wrote:

On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:

Pawel Jakub Dawidek wrote:

On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

hello

To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
system freeze:

[...]

I found a deadlock too. If it's reproducable for you, can you try this
patch:

I reproduce it after 30 minutes, so I will try your patch.


http://people.freebsd.org/~pjd/patches/zgd_done.patch

when I try to load zfs.ko I get:

# kldload zfs
link_elf: symbol kproc_create undefined
kldload: can't load zfs: No such file or directory

What must I add to my config to resolve this symbol / problem


Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().


Today, after more than 10 scrubs, no deadlock. This patch is effective.

Thanks

Henri

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-10 Thread Pawel Jakub Dawidek
On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:
 hello
 
 To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
 system freeze:
[...]

I found a deadlock too. If it's reproducible for you, can you try this
patch:

http://people.freebsd.org/~pjd/patches/zgd_done.patch

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS deadlock ?

2007-11-10 Thread Henri Hennebert

Pawel Jakub Dawidek wrote:

On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

hello

To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
system freeze:

[...]

I found a deadlock too. If it's reproducable for you, can you try this
patch:


I reproduce it after 30 minutes, so I will try your patch.



http://people.freebsd.org/~pjd/patches/zgd_done.patch


when I try to load zfs.ko I get:

# kldload zfs
link_elf: symbol kproc_create undefined
kldload: can't load zfs: No such file or directory

What must I add to my config to resolve this symbol / problem?

Thanks

Henri





___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-10 Thread Pawel Jakub Dawidek
On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:
 Pawel Jakub Dawidek wrote:
 On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:
 hello
 
 To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
 system freeze:
 [...]
 
 I found a deadlock too. If it's reproducable for you, can you try this
 patch:
 
 I reproduce it after 30 minutes, so I will try your patch.
 
 
  http://people.freebsd.org/~pjd/patches/zgd_done.patch
 
 when I try to load zfs.ko I get:
 
 # kldload zfs
 link_elf: symbol kproc_create undefined
 kldload: can't load zfs: No such file or directory
 
 What must I add to my config to resolve this symbol / problem

Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().
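
A quick way to apply that rename across a patch's files might be something
like the following (a sketch only; the directory is hypothetical and the
substitution assumes the create/exit pair are the only kproc_* symbols
touched by the patch):

# hypothetical example: map the HEAD-only kproc_* names back to kthread_*
cd /usr/src/sys/cddl/contrib/opensolaris
grep -rl 'kproc_' . | xargs sed -i '' \
    -e 's/kproc_create/kthread_create/g' \
    -e 's/kproc_exit/kthread_exit/g'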

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS deadlock ?

2007-11-10 Thread Henri Hennebert

Pawel Jakub Dawidek wrote:

On Sat, Nov 10, 2007 at 12:39:27PM +0100, Henri Hennebert wrote:

Pawel Jakub Dawidek wrote:

On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

hello

To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
system freeze:

[...]

I found a deadlock too. If it's reproducable for you, can you try this
patch:

I reproduce it after 30 minutes, si I try you patch.


http://people.freebsd.org/~pjd/patches/zgd_done.patch

when I try to load zfs.ko I get:

# kldload zfs
link_elf: symbol kproc_create undefined
kldload: can't load zfs: No such file or directory

What must I add to my config to resolve this symbol / problem


Ouch, you don't use HEAD. Try changing kproc_*() to kthread_*().


It loads correctly now...

Moreover, no deadlock after multiple scrubs in parallel and some 
buildworlds to make sure...

Looks fine for my config :-)

Just to give credit to ZFS, scrub encountered 2 I/O errors without impact 
on my data :)


/var/log/messages:
Nov 10 15:33:00 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 
retry left) LBA=299882429
Nov 10 15:33:06 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (0 
retries left) LBA=299882429
Nov 10 15:33:12 morzine kernel: ad6: FAILURE - READ_DMA48 timed out 
LBA=299882429
Nov 10 16:55:53 morzine kernel: ad6: WARNING - SETFEATURES SET TRANSFER 
MODE taskqueue timeout - completing request directly
Nov 10 16:55:53 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 
retry left) LBA=299883325
Nov 10 16:56:06 morzine kernel: ad6: TIMEOUT - READ_DMA48 retrying (0 
retries left) LBA=299883325
Nov 10 16:56:13 morzine kernel: ad6: FAILURE - READ_DMA48 timed out 
LBA=299883325
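
As an aside, a small sketch of how one might inspect and later reset those
error counters once the scrub completes (both are standard zpool
subcommands; the pool name is taken from this thread):

zpool status -v pool2    # per-vdev READ/WRITE/CKSUM counters and any affected files
zpool clear pool2        # reset the counters once the disk is trusted again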


ZFS is really great!

I will run more tests tomorrow... and keep you posted.

Thanks

Henri
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


ZFS deadlock ?

2007-11-09 Thread Henri Hennebert

hello

To push ZFS, I launch 2 scrubs at the same time; after ~20 seconds the 
system freezes:


zpool scrub pool0 && zpool scrub pool2


My pools:

zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
pool0   ONLINE   0 0 0
  mirrorONLINE   0 0 0
da0s2   ONLINE   0 0 0
da1s2   ONLINE   0 0 0

errors: No known data errors

  pool: pool1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
pool1   ONLINE   0 0 0
  da0s3 ONLINE   0 0 0
  da1s3 ONLINE   0 0 0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
pool2   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
ad4s3   ONLINE   0 0 0
ad6s3   ONLINE   0 0 0

errors: No known data errors

I'm running 7.0-BETA2 with patch 
http://people.freebsd.org/~pjd/patches/vm_kern.c.2.patch


Root is on pool0

[EMAIL PROTECTED] ~]# df -h
FilesystemSizeUsed   Avail Capacity  Mounted on
pool0  34G 16M 34G 0%/
devfs 1.0K1.0K  0B   100%/dev
/dev/mirror/gm0s1a496M220M236M48%/bootfs
procfs4.0K4.0K  0B   100%/proc
pool0/home 35G1.5G 34G 4%/home
pool1  16G128K 16G 0%/pool1
pool1/qemu 24G 16G8.1G66%/pool1/qemu
pool1/squid12G 39M 12G 0%/pool1/squid
pool2  72G  0B 72G 0%/pool2
pool2/WorkBench64G 21G 43G33%/pool2/WorkBench
pool2/backup   32G6.8G 25G21%/pool2/backup
pool2/download 72G  0B 72G 0%/pool2/download
pool2/morzine  85G 13G 72G16%/pool2/morzine
pool2/qemu 16G 16G112M99%/pool2/qemu
pool2/sys  73G1.2G 72G 2%/pool2/sys
pool0/tmp  34G384K 34G 0%/tmp
pool0/usr  40G5.7G 34G14%/usr
pool0/var  34G116M 34G 0%/var
pool0/var/spool39G5.2G 34G13%/var/spool
devfs 1.0K1.0K  0B   100%/var/named/dev


I can break on my serial console and here is some information:

db ps
  pid  ppid  pgrp   uid   state   wmesg    wchan    cmd
 3425  3424  3425 0  RVs cron
 3424  1161  1161 0  S   ppwait   0xc5f3 cron
 3423   589  3423 0  S+  zfs:vq- 0xc5c0b334 zpool
 3419 0 0 0  SL  vgeom:io 0xc8ba1308 [vdev:worker ad6s3]
 3418 0 0 0  SL  vgeom:io 0xda70f748 [vdev:worker ad4s3]
 3417 0 0 0  SL  zfs:(sp 0xc56da318 [spa_scrub_thread]
 3415 0 0 0  SL  vgeom:io 0xd90229c8 [vdev:worker da1s2]
 3414 0 0 0  SL  vgeom:io 0xc5892208 [vdev:worker da0s2]
 3413 0 0 0  SL  zfs:(sp 0xc56db318 [spa_scrub_thread]
 3309   998   979 8  S   nanslp   0xc0890924 sleep
 3136   995   995 8  S   select   0xc089b778 initial thread
 2610  1490  2610 0  S+  select   0xc089b778 ssh
76040 1 76016  2001  S   select   0xc089b778 initial thread
76038 76034 76016  2001  S   (threaded)  firefox-bin
100333   S   ucond0xcb3fc080 firefox-bin
100327   S   ucond0xc8ba1e80 firefox-bin
100326   S   ucond0xc722abc0 firefox-bin
100323   S   ucond0xc5b56680 firefox-bin
1002850xcb466580 firefox-bin
100156   S   select   0xc089b778 firefox-bin
100441   S   select   0xc089b778 initial thread
76034 76030 76016  2001  S   wait 0xcca1e2a8 sh
76030 1 76016  2001  S   wait 0xcca1f000 sh
29979 29976 2997970  Rs  postgres
29978 29976 2997870  Rs  postgres
29976 1 2997670  Ss  select   0xc089b778 postgres
25774 1 2577425  Ss  pause0xcf074858 sendmail
25770 1 25770 0  Ss  select   0xc089b778 sendmail
  589   587   589 0  S+  wait 0xcbe18d48 bash
  587  1311   587  2001  Ss+ wait 0xca7e52a8 su
 7234  1486  7234 0  S+  select   0xc089b778 ssh
58922 1 58922 0  Ss  kqread   0xdb632200 cupsd
54279 1 5427953  Ss  (threaded)  named
100207   S   select   0xc089b778 named
100206   S   ucond0xc76cadc0 named
100205   S   ucond0xd1c3c440 named
100204   S   ucond0xd1c3dac0 named
 00291 

Re: ZFS deadlock ?

2007-11-09 Thread Richard Arends
On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

Henri,

 To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
 system freeze:
 
 zpool scrub pool0 && zpool scrub pool2

This won't start the scrubs at the same time, but one after the other. And
the second will only start if the first one does not fail (exit code == 0).

-- 
Regards,

Richard.

/* Homo Sapiens non urinat in ventum */
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-09 Thread Henri Hennebert

Richard Arends wrote:

On Fri, Nov 09, 2007 at 05:37:00PM +0100, Henri Hennebert wrote:

Henri,

To push zfs, I launch 2 scrub at the same time, after ~20 seconds the 
system freeze:


zpool scrub pool0 && zpool scrub pool2


This won't start the scrubs at the same time, but after each other. And
the second will only start if the first one not fails (exitcode == 0)


Not at all, the scrub is asynchronous, I'm sure of it.

Henri
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-09 Thread Richard Arends
On Fri, Nov 09, 2007 at 09:35:59PM +0100, Henri Hennebert wrote:

 This won't start the scrubs at the same time, but after each other. And
 the second will only start if the first one not fails (exitcode == 0)
 
 Not at all, the scrub is asynchronious, I'm sure of it

Running 2 commands separated by && will not run them at the same time. Scrub
could be asynchronous, I don't know, but that has nothing to do with the
way you are running it.

See: echo sleep 1 && time sleep 2 && echo sleep 2 && time sleep 2
and: ls -l /notfound && echo yes

-- 
Regards,

Richard.

/* Homo Sapiens non urinat in ventum */
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-09 Thread Henri Hennebert

Richard Arends wrote:

On Fri, Nov 09, 2007 at 09:35:59PM +0100, Henri Hennebert wrote:


This won't start the scrubs at the same time, but after each other. And
the second will only start if the first one not fails (exitcode == 0)


Not at all, the scrub is asynchronious, I'm sure of it


Running 2 commands separated by && will not run them at the same time. Scrub
could be asynchronous, I don't know, but that has nothing to do with the
way you are running it.

See: echo sleep 1 && time sleep 2 && echo sleep 2 && time sleep 2
and: ls -l /notfound && echo yes


Per the man page, zpool scrub *begins* a scrub which goes on in the background, 
so two scrubs are running simultaneously on 2 different pools.
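
In other words (a small sketch using the pool names from this thread), both
commands return almost immediately and the scrubs then proceed concurrently
in the background:

# both scrubs are merely *started* here; the shell prompt comes back at once
zpool scrub pool0 && zpool scrub pool2
# a moment later, both pools report a scrub in progress
zpool status pool0 pool2 | grep scrub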


Henri



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-09 Thread Erik Osterholm
On Fri, Nov 09, 2007 at 11:28:27PM +0100, Henri Hennebert wrote:
 Richard Arends wrote:
 On Fri, Nov 09, 2007 at 09:35:59PM +0100, Henri Hennebert wrote:
 
 This won't start the scrubs at the same time, but after each other. And
 the second will only start if the first one not fails (exitcode == 0)
 
 Not at all, the scrub is asynchronious, I'm sure of it
 
 Running 2 commands seperated by  will not run at the same time. Scrub
 could be asynchronious, i don't know, but that has nothing to do with the
 way you are running it.
 
 See: echo sleep 1  time sleep 2  echo sleep 2  time sleep 2
 and: ls -l /notfound  echo yes
 
 Per the man page, zpool scrub *begin* a scrub witch go on in background, 
 so two scrubs are running simustaneously on 2 different pools.
 
 Henri

Henri is 100% correct.  zpool scrub kicks off a scrub which occurs in
the background.  I'm not sure I like this behavior that much, but it's
not like it's my call :)

lothos# zpool list
NAME                   SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank                  1.81T    368G   1.45T    19%  ONLINE     -

lothos# time sh -c "zpool scrub tank && echo Done\?"
Done?
0.000u 0.010s 0:04.35 0.2%  116+152k 14+0io 8pf+0w

lothos# zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub in progress, 3.97% done, 0h40m to go
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
ad4 ONLINE   0 0 0
ad5 ONLINE   0 0 0
ad6 ONLINE   0 0 0
ad7 ONLINE   0 0 0

errors: No known data errors

Erik
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: ZFS deadlock ?

2007-11-09 Thread Richard Arends
On Fri, Nov 09, 2007 at 11:28:27PM +0100, Henri Hennebert wrote:

Henri,

 See: echo sleep 1  time sleep 2  echo sleep 2  time sleep 2
 and: ls -l /notfound  echo yes
 
 Per the man page, zpool scrub *begin* a scrub witch go on in background, 
 so two scrubs are running simustaneously on 2 different pools.

Okay, I see. I did not know scrub ran in the background. I stand corrected! :)

-- 
Regards,

Richard.

/* Homo Sapiens non urinat in ventum */
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]