Re: zfs i/o hangs on 9-PRERELEASE

2011-12-08 Thread Mark Felder

Hi all,

Just wanted to report back that I found time to do more diagnostics.
ZFS/FreeBSD/etc are not to blame. ZFS and FreeBSD never reported any I/O
errors, and scrubs always came up clean, because the failing disk had
trouble reading but would eventually return accurate data. It was like
having a RAIDZ with one disk that sometimes had 30s latency! SeaTools
confirmed this disk was failing, and after replacing it all issues went
away.
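
A disk that stalls but still returns correct data produces no I/O errors, so
it only shows up as a latency outlier. The sketch below illustrates that idea
on made-up numbers. The device names, the sample values, and the factor of
100 are all assumptions for illustration, not measurements from this system.

```python
from statistics import median


def flag_slow_disks(samples_ms, factor=100.0):
    """Return disks whose worst read latency is far above the pool-wide median.

    samples_ms maps a disk name to a list of observed read latencies in ms.
    factor is an arbitrary illustrative threshold multiplier.
    """
    all_samples = [s for values in samples_ms.values() for s in values]
    if not all_samples:
        return []
    baseline = median(all_samples)
    return sorted(
        disk
        for disk, values in samples_ms.items()
        if values and max(values) > factor * baseline
    )


# Hypothetical measurements: ada2 occasionally stalls for about 30 s but
# still returns correct data, so no I/O error is ever logged.
samples = {
    "ada0": [8.0, 9.5, 7.2, 10.1],
    "ada1": [7.9, 8.8, 9.1, 8.4],
    "ada2": [8.3, 30000.0, 9.0, 28500.0],
}
print(flag_slow_disks(samples))
```

Comparing the worst sample to a pool-wide median, rather than comparing
averages, keeps a handful of very long stalls from being hidden by many
normal reads.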

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-28 Thread Rick Macklem
Pawel Jakub Dawidek wrote:
> On Fri, Nov 25, 2011 at 01:20:01PM -0600, Mark Felder wrote:
> > 13:14:32 nas:~ > uname -a
> > FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M:
> > Fri Nov 25 10:07:48 CST 2011
> > r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC amd64
> >
> > This seemed to start happening sometime after RC1. I tried 8-STABLE and
> > it's happening there too right now. I think whatever caused this was
> > MFC'd. I've also reproduced this on completely different hardware
> > running a single disk ZFS pool.
> >
> >
> > I'm getting this output in dmesg after these hangs I keep seeing.
> 
> Mark, those backtraces are not related to ZFS, but to PF. Not sure if
> they are at all related to your hangs. Most cases where ZFS I/O seems to
> hang are hardware problems, where I/O requests are not completed.
> 
He recently posted that his hangs went away when he stopped using NFS.
NFS does use uma_zalloc() and there are several places in pfioctl()
where uma_zalloc(...M_WAITOK...) is called (hidden under pool_get())
when a mutex (the PF_LOCK() one) is held.

I've emailed bz@ related to this.

I'm also not sure if they could be related to his hangs, but it seems
that if uma_zalloc() decides to sleep with the mutex held, something
may break and a broken uma_zalloc() would impact NFS.
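
The pattern behind the warning can be shown with a toy userland model. This
is only a sketch of the rule (a request that may sleep, made while a
non-sleepable lock is held, gets flagged); it is not how WITNESS or
uma_zalloc() are implemented, and the class and method names are made up for
illustration.

```python
class ToyWitness:
    """Tracks held locks and flags may-sleep requests made under them."""

    def __init__(self):
        self.held = []      # names of non-sleepable locks currently held
        self.warnings = []  # messages recorded for flagged requests

    def lock(self, name):
        self.held.append(name)

    def unlock(self, name):
        self.held.remove(name)

    def alloc(self, zone, may_sleep):
        """Model an allocation; may_sleep=True stands in for M_WAITOK."""
        if may_sleep and self.held:
            self.warnings.append(
                f'zone "{zone}" requested with non-sleepable locks held: '
                + ", ".join(self.held)
            )
        return object()  # placeholder for the allocated item


w = ToyWitness()
w.lock("pf task mtx")
w.alloc("pfrktable", may_sleep=True)   # flagged: mutex held across the call
w.unlock("pf task mtx")
w.alloc("pfrktable", may_sleep=True)   # not flagged: nothing is held
print(w.warnings)
```

In this model the usual ways out are to allocate before taking the lock or
to ask for a non-sleeping allocation and handle failure. Which of those, if
either, suits pf is not something this thread settles.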

rick

> 'procstat -kk -a' output might be useful once the hang happens.
> 
> > uma_zalloc_arg: zone "pfrktable" with the following non-sleepable locks
> > held:
> > exclusive sleep mutex pf task mtx (pf task mtx) r = 0 (0x8199af20)
> > locked @ /tank/svn/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1589
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > kdb_backtrace() at kdb_backtrace+0x37
> > _witness_debugger() at _witness_debugger+0x2e
> > witness_warn() at witness_warn+0x2c4
> > uma_zalloc_arg() at uma_zalloc_arg+0x335
> > pfr_create_ktable() at pfr_create_ktable+0xd8
> > pfr_ina_define() at pfr_ina_define+0x12b
> > pfioctl() at pfioctl+0x1c5a
> > devfs_ioctl_f() at devfs_ioctl_f+0x7a
> > kern_ioctl() at kern_ioctl+0xcd
> > sys_ioctl() at sys_ioctl+0xfd
> > amd64_syscall() at amd64_syscall+0x3ac
> > Xfast_syscall() at Xfast_syscall+0xf7
> > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800da711c, rsp =
> > 0x7fff9d28, rbp = 0x7fffa1f0 ---
> 
> --
> Pawel Jakub Dawidek http://www.wheelsystems.com
> FreeBSD committer http://www.FreeBSD.org
> Am I Evil? Yes, I Am! http://yomoli.com


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-28 Thread Pawel Jakub Dawidek
On Fri, Nov 25, 2011 at 01:20:01PM -0600, Mark Felder wrote:
> 13:14:32 nas:~ > uname -a
> FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M: 
> Fri Nov 25 10:07:48 CST 2011 
> r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC  amd64
> 
> This seemed to start happening sometime after RC1. I tried 8-STABLE and 
> it's happening there too right now. I think whatever caused this was 
> MFC'd. I've also reproduced this on completely different hardware 
> running a single disk ZFS pool.
> 
> 
> I'm getting this output in dmesg after these hangs I keep seeing.

Mark, those backtraces are not related to ZFS, but to PF. Not sure if
they are at all related to your hangs. Most cases where ZFS I/O seems to
hang are hardware problems, where I/O requests are not completed.

'procstat -kk -a' output might be useful once the hang happens.

> uma_zalloc_arg: zone "pfrktable" with the following non-sleepable locks
> held:
> exclusive sleep mutex pf task mtx (pf task mtx) r = 0 (0x8199af20)
> locked @ /tank/svn/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1589
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> _witness_debugger() at _witness_debugger+0x2e
> witness_warn() at witness_warn+0x2c4
> uma_zalloc_arg() at uma_zalloc_arg+0x335
> pfr_create_ktable() at pfr_create_ktable+0xd8
> pfr_ina_define() at pfr_ina_define+0x12b
> pfioctl() at pfioctl+0x1c5a
> devfs_ioctl_f() at devfs_ioctl_f+0x7a
> kern_ioctl() at kern_ioctl+0xcd
> sys_ioctl() at sys_ioctl+0xfd
> amd64_syscall() at amd64_syscall+0x3ac
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800da711c, rsp = 
> 0x7fff9d28, rbp = 0x7fffa1f0 ---

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com




Re: zfs i/o hangs on 9-PRERELEASE

2011-11-27 Thread Martin Matuska
On 28.11.2011 3:21, Mark Felder wrote:
> After many hours of testing, reproducing, and testing again I've
> finally been able to narrow down what the real issue is and it's not
> ZFS as I suspected. After completely turning off all NFS functionality
> and serving my files over Samba I haven't had a single issue. It seems
> there is something going on with the new NFS code (I serve out over
> v4, but reproduced it last week with v3) and my media player box, a
> Popcorn Hour A-200 which is running Linux. If I can cobble some
> hardware together and place it between so I can do some tcpdumps I
> will provide that data so perhaps someone can understand what's going
> on. If this is due to a badly behaving client this is potentially a
> DoS on the server.
>
>
> Regards,
>
>
>
> Mark
Hi Mark,

As to the output you have posted, this seems to be a pf problem. Could
you try the same situation with pf(4) disabled? If you are not able
to reproduce this hang with pf(4) disabled, it would be very nice to
have a PR submitted.

-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk



Re: zfs i/o hangs on 9-PRERELEASE

2011-11-27 Thread Mark Felder
After many hours of testing, reproducing, and testing again I've 
finally been able to narrow down what the real issue is and it's not ZFS 
as I suspected. After completely turning off all NFS functionality and 
serving my files over Samba I haven't had a single issue. It seems there 
is something going on with the new NFS code (I serve out over v4, but 
reproduced it last week with v3) and my media player box, a Popcorn Hour 
A-200 which is running Linux. If I can cobble some hardware together and 
place it between so I can do some tcpdumps I will provide that data so 
perhaps someone can understand what's going on. If this is due to a 
badly behaving client this is potentially a DoS on the server.



Regards,



Mark


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-26 Thread Jeremy Chadwick
On Sat, Nov 26, 2011 at 04:47:35PM -0600, Mark Felder wrote:
> It appears that I'm mistaken about those messages then. However, this
> happens on both my AMD x6 and Intel Atom machines with different hard
> drives, controllers, etc. I feel it would be unlikely to be hardware.
> 
> Unfortunately the procstat command is probably of no use because I can't 
> interact with the console or ssh for the periods of time when it is hanging 
> (sometimes in excess of a minute). Zpool scrubs come up clean and I never see 
> any errors reported. I've been running this hardware for 2 years and v28 for 
> quite some time. It doesn't seem like it started happening until I upgraded 
> to a build past RC1. I don't know where to find RC1 media and I don't know 
> the svn revision of RC1 so I haven't tried.

The kernel backtrace you provided indicates a problem in pf(4), not ZFS.
What piece am I missing?

-- 
| Jeremy Chadwick                    jdc at parodius.com |
| Parodius Networking           http://www.parodius.com/ |
| UNIX Systems Administrator       Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: zfs i/o hangs on 9-PRERELEASE

2011-11-26 Thread Mark Felder
It appears that I'm mistaken about those messages then. However, this happens
on both my AMD x6 and Intel Atom machines with different hard drives,
controllers, etc. I feel it would be unlikely to be hardware.

Unfortunately the procstat command is probably of no use because I can't 
interact with the console or ssh for the periods of time when it is hanging 
(sometimes in excess of a minute). Zpool scrubs come up clean and I never see 
any errors reported. I've been running this hardware for 2 years and v28 for 
quite some time. It doesn't seem like it started happening until I upgraded to 
a build past RC1. I don't know where to find RC1 media and I don't know the svn 
revision of RC1 so I haven't tried.
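
Because the console is unusable during a hang, one option is to record
'procstat -kk -a' on a timer beforehand and read the log afterwards. The
sketch below assumes a hypothetical log path and a 5 second interval.
Whether the recorder itself keeps running through a stall is not something
this thread establishes, so the log is best kept off the affected pool.

```python
import subprocess
import time

PROCSTAT_CMD = ["procstat", "-kk", "-a"]


def snapshot(run=subprocess.run):
    """Run procstat once and return a timestamped text record."""
    result = run(PROCSTAT_CMD, capture_output=True, text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"=== {stamp} ===\n{result.stdout}"


def record(path, interval_s=5.0, count=None,
           run=subprocess.run, sleep=time.sleep):
    """Append snapshots to path every interval_s seconds.

    Runs forever when count is None; otherwise takes count snapshots.
    """
    taken = 0
    while count is None or taken < count:
        # Reopen and close the file each time so every record is written
        # out before the next sleep.
        with open(path, "a") as log:
            log.write(snapshot(run))
        taken += 1
        sleep(interval_s)


# Example call (hypothetical path, not run here):
# record("/var/tmp/procstat.log")
```

The run and sleep parameters exist only so the loop can be exercised without
a FreeBSD host.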



Regards,


Mark


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-26 Thread Andriy Gapon
on 25/11/2011 21:20 Mark Felder said the following:
> 13:14:32 nas:~ > uname -a
> FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M: Fri Nov
> 25 10:07:48 CST 2011 r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC  amd64
> 
> This seemed to start happening sometime after RC1. I tried 8-STABLE and it's
> happening there too right now. I think whatever caused this was MFC'd. I've
> also reproduced this on completely different hardware running a single disk
> ZFS pool.
> 
> 
> I'm getting this output in dmesg after these hangs I keep seeing.
> 
> 
> uma_zalloc_arg: zone "pfrktable" with the following non-sleepable locks held:
> exclusive sleep mutex pf task mtx (pf task mtx) r = 0 (0x8199af20)
> locked @ /tank/svn/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1589
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> _witness_debugger() at _witness_debugger+0x2e
> witness_warn() at witness_warn+0x2c4
> uma_zalloc_arg() at uma_zalloc_arg+0x335
> pfr_create_ktable() at pfr_create_ktable+0xd8
> pfr_ina_define() at pfr_ina_define+0x12b
> pfioctl() at pfioctl+0x1c5a
> devfs_ioctl_f() at devfs_ioctl_f+0x7a
> kern_ioctl() at kern_ioctl+0xcd
> sys_ioctl() at sys_ioctl+0xfd
> amd64_syscall() at amd64_syscall+0x3ac
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800da711c, rsp =
> 0x7fff9d28, rbp = 0x7fffa1f0 ---

Please note that all these messages are about pf.
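
One quick way to check this is to pull the function names out of the
backtrace and see which subsystem they belong to. The sketch below does that
with a regular expression. Treating a name prefix as a subsystem marker is a
rough heuristic, and the ZFS prefix list in particular is an assumption for
illustration.

```python
import re

# Matches frames such as "pfioctl() at pfioctl+0x1c5a", with or without
# leading "> " quote markers.
FRAME_RE = re.compile(r"^(?:>\s*)*(\w+)\(\) at \w+\+0x[0-9a-f]+")


def frame_names(trace_text):
    """Return the function names of all stack frames found in trace_text."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def frames_with_prefix(names, prefixes):
    """Return the names that start with any of the given prefixes."""
    return [n for n in names if n.startswith(tuple(prefixes))]


TRACE = """\
> witness_warn() at witness_warn+0x2c4
> uma_zalloc_arg() at uma_zalloc_arg+0x335
> pfr_create_ktable() at pfr_create_ktable+0xd8
> pfr_ina_define() at pfr_ina_define+0x12b
> pfioctl() at pfioctl+0x1c5a
> devfs_ioctl_f() at devfs_ioctl_f+0x7a
> kern_ioctl() at kern_ioctl+0xcd
"""

names = frame_names(TRACE)
print(frames_with_prefix(names, ["pf"]))
print(frames_with_prefix(names, ["zfs", "zio", "spa_", "arc_", "dmu_"]))
```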

-- 
Andriy Gapon


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-25 Thread Mark Felder

On 25.11.2011 13:39, Freddie Cash wrote:
> There's a lot of uma_* stuff in there.  Just curious, what's the following
> sysctl set to:
>
> vfs.zfs.zio.use_uma
>
> Back in the 8.x days, it was recommended to set it to 0 due to bugs:
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html
>
> No idea if this is still the case or not, but you may want to try toggling
> that sysctl and see if it makes a difference.

Confirmed mine is set to 0.


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-25 Thread Freddie Cash
On Fri, Nov 25, 2011 at 11:20 AM, Mark Felder wrote:

> 13:14:32 nas:~ > uname -a
> FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M:
> Fri Nov 25 10:07:48 CST 2011
> r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC amd64
>
> This seemed to start happening sometime after RC1. I tried 8-STABLE and
> it's happening there too right now. I think whatever caused this was MFC'd.
> I've also reproduced this on completely different hardware running a single
> disk ZFS pool.
>
> I'm getting this output in dmesg after these hangs I keep seeing.
>
> uma_zalloc_arg: zone "pfrktable" with the following non-sleepable locks
> held:
>

There's a lot of uma_* stuff in there.  Just curious, what's the following
sysctl set to:

vfs.zfs.zio.use_uma

Back in the 8.x days, it was recommended to set it to 0 due to bugs:
http://lists.freebsd.org/pipermail/freebsd-stable/2010-June/057162.html

No idea if this is still the case or not, but you may want to try toggling
that sysctl and see if it makes a difference.
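
For anyone scripting the check, a minimal sketch of reading the value is
below. It shells out to `sysctl -n`, which prints only the value. The helper
names are made up for illustration, and the run parameter is there only so
the functions can be tried without a FreeBSD host.

```python
import subprocess


def read_sysctl(name, run=subprocess.run):
    """Return the value of a sysctl as a string, using `sysctl -n`."""
    result = run(["sysctl", "-n", name],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def zio_uses_uma(run=subprocess.run):
    """Return True unless vfs.zfs.zio.use_uma reads as 0."""
    return read_sysctl("vfs.zfs.zio.use_uma", run) != "0"


# Example call on a FreeBSD host (not run here):
# print(zio_uses_uma())
```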

-- 
Freddie Cash
fjwc...@gmail.com