I have a test system with 132 (small) ZFS pools[*], as part of our
work to validate a new ZFS-based fileserver environment. In testing,
it appears that we can produce situations that will run the kernel out
of memory, or at least out of some resource such that things start
complaining 'bash: fork: Resource temporarily unavailable'. Sometimes
the system locks up solid.

 I've found at least two situations that reliably do this:
- trying to 'zpool scrub' each pool in sequence (waiting for each scrub
  to complete before starting the next one); a rough sketch of this loop
  is below.
- starting simultaneous sequential read IO from all pools from an NFS
  client. (Trying to do the same IO on the server itself basically kills
  the server entirely.)

 If I aggregate the same disk space into 12 pools instead of 132, the
same IO load does not kill the system.

 The ZFS machine is an X2100 M2 with 2GB of physical memory and 1GB
of swap, running 64-bit Solaris 10 U4 with an almost current set of
patches; it gets its storage from another machine via iSCSI. The pools
are non-redundant, with each vdev being a whole iSCSI LUN.
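
 To give a concrete idea, each pool is created along these lines (the
pool and device names here are made up; the real iSCSI device names are
longer):

    # One whole iSCSI LUN per pool, no redundancy.
    zpool create fspool001 c4t1d0
    zpool create fspool002 c4t2d0
    # ... and so on, up to 132 pools.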

 Is this a known issue (or issues)? If it isn't, does anyone have
pointers to good tools for tracking down what is happening and where
the memory is going? Does the system simply need more memory for this
number of pools, and if so, does anyone know how much?

 Thanks in advance.

(I was pointed to mdb -k's '::kmastat' by some people on the OpenSolaris
IRC channel but I haven't spotted anything particularly enlightening in
its output, and I can't run it once the system has gone over the edge.)
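
 For what it's worth, one thing I may try is logging ::kmastat
periodically so that I at least have data from just before the machine
goes over the edge; a rough sketch (log path and interval arbitrary):

    #!/bin/sh
    # Append a timestamped ::kmastat snapshot every minute so there is
    # kernel allocator data from shortly before the system falls over.
    while :; do
        date >> /var/tmp/kmastat.log
        echo ::kmastat | mdb -k >> /var/tmp/kmastat.log
        sleep 60
    done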

        - cks
[*: we have an outstanding uncertainty over how many ZFS pools a
    single system can sensibly support, so testing something larger
    than we'd use in production seemed sensible.]