Re: ZFS panic with concurrent recv and read-heavy workload

2011-06-08 Thread Marius Strobl
On Fri, Jun 03, 2011 at 03:03:56AM -0400, Nathaniel W Filardo wrote:
 I just got this on another machine, no heavy workload needed, just booting
 and starting some jails.  Of interest, perhaps, both this and the machine
 triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
 confess that the machine in the original report may have had bad RAM).  I
 have run a UP 1.2GHz V240 for months and never seen this panic.
 
 This time the kernel is
  FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
 csup'd immediately before building.  The full panic this time is
  panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
  /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
 
  cpuid = 1
  KDB: stack backtrace:
  panic() at panic+0x1c8
  _sx_assert() at _sx_assert+0xc4
  _sx_xunlock() at _sx_xunlock+0x98
  l2arc_feed_thread() at l2arc_feed_thread+0xeac
  fork_exit() at fork_exit+0x9c
  fork_trampoline() at fork_trampoline+0x8
 
  SC Alert: SC Request to send Break to host.
  KDB: enter: Line break on console
  [ thread pid 27 tid 100121 ]
  Stopped at  kdb_enter+0x80: ta  %xcc, 1
  db> reset
  ttiimmeeoouutt  sshhuuiinngg  ddoowwnn  CCPPUUss..
 
 Half of the memory in this machine is new (well, came with the machine) and
 half is from the aforementioned UP V240 which seemed to work fine (I was
 attempting an upgrade when this happened); none of it (or indeed any of the
 hardware save the disk controller and disks) is common between this and the
 machine reporting below.
 
 Thoughts?  Any help would be greatly appreciated.
 Thanks.
 --nwf;
 
 On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
 [...]
  panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ 
  /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
 
  cpuid = 1
  KDB: stack backtrace:
  panic() at panic+0x1c8
  _sx_assert() at _sx_assert+0xc4
  _sx_xunlock() at _sx_xunlock+0x98
  arc_evict() at arc_evict+0x614
  arc_get_data_buf() at arc_get_data_buf+0x360
  arc_buf_alloc() at arc_buf_alloc+0x94
  dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
  dmu_write() at dmu_write+0xec
  dmu_recv_stream() at dmu_recv_stream+0x8a8
  zfs_ioc_recv() at zfs_ioc_recv+0x354
  zfsdev_ioctl() at zfsdev_ioctl+0xe0
  devfs_ioctl_f() at devfs_ioctl_f+0xe8
  kern_ioctl() at kern_ioctl+0x294
  ioctl() at ioctl+0x198
  syscallenter() at syscallenter+0x270
  syscall() at syscall+0x74
  -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
  userland() at 0x40e72cc8
  user trace: trap %o7=0x40c13e24
  pc 0x40e72cc8, sp 0x7fd4641
  pc 0x40c158f4, sp 0x7fd4721
  pc 0x40c1e878, sp 0x7fd47f1
  pc 0x40c1ce54, sp 0x7fd8b01
  pc 0x40c1dbe0, sp 0x7fd9431
  pc 0x40c1f718, sp 0x7fdd741
  pc 0x10731c, sp 0x7fdd831
  pc 0x10c90c, sp 0x7fdd8f1
  pc 0x103ef0, sp 0x7fde1d1
  pc 0x4021aff4, sp 0x7fde291
  done
 [...]

Apparently this is a locking issue in the ARC code; the ZFS people should
be able to help you.

Marius

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ZFS panic with concurrent recv and read-heavy workload

2011-06-03 Thread Nathaniel W Filardo
I just got this on another machine, no heavy workload needed, just booting
and starting some jails.  Of interest, perhaps, both this and the machine
triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
confess that the machine in the original report may have had bad RAM).  I
have run a UP 1.2GHz V240 for months and never seen this panic.

This time the kernel is
 FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
csup'd immediately before building.  The full panic this time is
 panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659

 cpuid = 1
 KDB: stack backtrace:
 panic() at panic+0x1c8
 _sx_assert() at _sx_assert+0xc4
 _sx_xunlock() at _sx_xunlock+0x98
 l2arc_feed_thread() at l2arc_feed_thread+0xeac
 fork_exit() at fork_exit+0x9c
 fork_trampoline() at fork_trampoline+0x8

 SC Alert: SC Request to send Break to host.
 KDB: enter: Line break on console
 [ thread pid 27 tid 100121 ]
 Stopped at  kdb_enter+0x80: ta  %xcc, 1
 db> reset
 ttiimmeeoouutt  sshhuuiinngg  ddoowwnn  CCPPUUss..

Half of the memory in this machine is new (well, came with the machine) and
half is from the aforementioned UP V240 which seemed to work fine (I was
attempting an upgrade when this happened); none of it (or indeed any of the
hardware save the disk controller and disks) is common between this and the
machine reporting below.

Thoughts?  Any help would be greatly appreciated.
Thanks.
--nwf;

On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
[...]
 panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ 
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869

 cpuid = 1
 KDB: stack backtrace:
 panic() at panic+0x1c8
 _sx_assert() at _sx_assert+0xc4
 _sx_xunlock() at _sx_xunlock+0x98
 arc_evict() at arc_evict+0x614
 arc_get_data_buf() at arc_get_data_buf+0x360
 arc_buf_alloc() at arc_buf_alloc+0x94
 dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
 dmu_write() at dmu_write+0xec
 dmu_recv_stream() at dmu_recv_stream+0x8a8
 zfs_ioc_recv() at zfs_ioc_recv+0x354
 zfsdev_ioctl() at zfsdev_ioctl+0xe0
 devfs_ioctl_f() at devfs_ioctl_f+0xe8
 kern_ioctl() at kern_ioctl+0x294
 ioctl() at ioctl+0x198
 syscallenter() at syscallenter+0x270
 syscall() at syscall+0x74
 -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
 userland() at 0x40e72cc8
 user trace: trap %o7=0x40c13e24
 pc 0x40e72cc8, sp 0x7fd4641
 pc 0x40c158f4, sp 0x7fd4721
 pc 0x40c1e878, sp 0x7fd47f1
 pc 0x40c1ce54, sp 0x7fd8b01
 pc 0x40c1dbe0, sp 0x7fd9431
 pc 0x40c1f718, sp 0x7fdd741
 pc 0x10731c, sp 0x7fdd831
 pc 0x10c90c, sp 0x7fdd8f1
 pc 0x103ef0, sp 0x7fde1d1
 pc 0x4021aff4, sp 0x7fde291
 done
[...]




ZFS panic with concurrent recv and read-heavy workload

2011-04-06 Thread Nathaniel W Filardo
When racing two workloads, one doing
  zfs recv -v -d testpool
and the other
  find /testpool -type f -print0 | xargs -0 sha1
I can (seemingly reliably) trigger this panic:

panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869

cpuid = 1
KDB: stack backtrace:
panic() at panic+0x1c8
_sx_assert() at _sx_assert+0xc4
_sx_xunlock() at _sx_xunlock+0x98
arc_evict() at arc_evict+0x614
arc_get_data_buf() at arc_get_data_buf+0x360
arc_buf_alloc() at arc_buf_alloc+0x94
dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
dmu_write() at dmu_write+0xec
dmu_recv_stream() at dmu_recv_stream+0x8a8
zfs_ioc_recv() at zfs_ioc_recv+0x354
zfsdev_ioctl() at zfsdev_ioctl+0xe0
devfs_ioctl_f() at devfs_ioctl_f+0xe8
kern_ioctl() at kern_ioctl+0x294
ioctl() at ioctl+0x198
syscallenter() at syscallenter+0x270
syscall() at syscall+0x74
-- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
userland() at 0x40e72cc8
user trace: trap %o7=0x40c13e24
pc 0x40e72cc8, sp 0x7fd4641
pc 0x40c158f4, sp 0x7fd4721
pc 0x40c1e878, sp 0x7fd47f1
pc 0x40c1ce54, sp 0x7fd8b01
pc 0x40c1dbe0, sp 0x7fd9431
pc 0x40c1f718, sp 0x7fdd741
pc 0x10731c, sp 0x7fdd831
pc 0x10c90c, sp 0x7fdd8f1
pc 0x103ef0, sp 0x7fde1d1
pc 0x4021aff4, sp 0x7fde291
done

The machine is a freshly installed and built sparc64 2-way SMP, running
today's -CURRENT with
http://people.freebsd.org/~mm/patches/zfs/zfs_ioctl_compat_bugfix.patch
applied.  Of note, it has only 1G of RAM in it, so kmem_max = 512M.

Thoughts?  More information?  Thanks in advance.
--nwf;
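
For anyone trying to reproduce the race described in this message, a minimal
sketch of running the two workloads concurrently might look like the script
below.  It is not the reporter's actual harness.  The message does not say
where the stream fed to "zfs recv" came from, so the STREAM file is an
assumption, as are the race() helper and the RUN_ZFS_RACE guard variable
(both hypothetical names introduced here).  The two command lines themselves
are the ones quoted in the report.

```shell
#!/bin/sh
# Sketch only: run a receiver and a reader at the same time.
# POOL matches the report; STREAM is an assumed file produced by "zfs send".
POOL="${POOL:-testpool}"
STREAM="${STREAM:-/tmp/stream.zfs}"

# race CMD1 CMD2: start both commands in the background, wait for each,
# and return non-zero if either one failed.
race() {
    sh -c "$1" &
    pid1=$!
    sh -c "$2" &
    pid2=$!
    rc=0
    wait "$pid1" || rc=1
    wait "$pid2" || rc=1
    return "$rc"
}

# Only touch the pool when explicitly asked to, since on an affected
# kernel this may panic the machine.
if [ "${RUN_ZFS_RACE:-0}" = "1" ]; then
    race "zfs recv -v -d $POOL < $STREAM" \
         "find /$POOL -type f -print0 | xargs -0 sha1"
fi
```

Run it as "RUN_ZFS_RACE=1 sh race.sh" on a scratch pool only.  Without the
guard variable set, the script defines race() and exits without touching ZFS.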

