Hello Nate,
I have a few questions about ZFS and virtualization:
[b]Virtualization and performance[/b]
When filesystem traffic occurs on a zpool containing only spindles dedicated to this zpool, I/O can be distributed evenly. When the zpool is located on a LUN sliced from a raid group shared by multiple systems, the I/O capability of this zpool will be limited. Avoiding or limiting I/O to this LUN until the load from the other systems decreases would improve overall performance for the local zpool.
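From the host side the contention is at least observable, e.g. with iostat:
[code]
# Extended per-device statistics: -x extended stats, -n descriptive
# device names, -z suppress idle devices; sample every 5 seconds.
iostat -xnz 5

# A high service time (asvc_t) on the shared LUN while the local
# workload is light points to load from the other systems sharing
# the raid group.
[/code]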
I heard some rumors recently about using SMI-S to de-virtualize the traffic and allow Solaris to peek through the virtualization layers, thus optimizing I/O target selection. Maybe someone has some rumors to add ;-)
Virtualization with the 6920 has been briefly discussed at http://www.opensolaris.org/jive/thread.jspa?messageID=14984#14984 but without conclusions or recommendations.
I don't know the answer, but: wouldn't the overhead of using SMI-S, or some other method, to determine the load on the raid group from the storage array negate any potential I/O benefits you could gain? Avoiding or limiting I/O to a heavily used LUN in your zpool would reduce the number of spindles in your zpool, thus reducing aggregate throughput anyway(?).
Yes, you may be right on this. The current implementation, which limits the number of outstanding I/O operations per LUN, now seems more appropriate to me too.
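If I remember the current bits correctly, that limit is exposed as the zfs_vdev_max_pending tunable; a sketch (the value 10 is just an example, the default is higher):
[code]
# Persistent: add to /etc/system and reboot.
#   set zfs:zfs_vdev_max_pending = 10

# Or change it on a live kernel with mdb (the 0t prefix means decimal):
echo zfs_vdev_max_pending/W0t10 | mdb -kw
[/code]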
Storage array layout best practices suggest limiting, if at all possible, the number of LUNs you create from a raid group, exactly because of the I/O limitations that you mention.
This is basically true; however, in a virtualized environment you cannot always ensure this because of the complexity. You have spindles distributed across a raid group, LUNs sliced from that raid group, the LUNs virtualized, e.g. with a 6920, and the virtualized LUNs possibly distributed to different hosts or zpools. Knowing which LUNs lead to which spindles might help to optimize vdev selection.
I can understand building the smarts into ZFS to handle multipath LUNs (LUNs presented out of more than one controller on the array, active-active configurations, not simply dual-fabric multipathing) and load balance that way. Does ZFS simply take advantage of MPxIO in Solaris for multipathing/load balancing, or are there plans to build support for it into the file system itself?
This has already been discussed at
http://www.opensolaris.org/jive/thread.jspa?messageID=44278#44159
and
http://www.opensolaris.org/jive/thread.jspa?messageID=19248#19248
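The short version: the multipathing happens below ZFS. You enable MPxIO at the driver level and ZFS just sees one device node per LUN. A minimal sketch on Solaris 10 (the device name is made up):
[code]
# Enable MPxIO on the supported HBA ports; stmsboot updates the
# device paths in /etc/vfstab and asks for a reboot:
stmsboot -e

# After the reboot each LUN appears once under a single scsi_vhci
# device name, and the pool is simply built on top of it:
zpool create tank c4t600A0B800012345600001234ABCD5678d0
[/code]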
[b]Volume mobility[/b]
One of the major advantages of ZFS is the sharing of zpool capacity between filesystems. I often run applications in small application containers located on separate LUNs which are zoned to several hosts, so the applications can be run on different hosts. The idea behind this is failover, testing and load adjustment. Because only complete zpools can be migrated, capacity sharing between movable containers is currently impossible.
Are there any plans to allow zpools to be
concurrently shareable between hosts?
Clarification: you're not asking for shared file system behaviour, are you?
No. A shared filesystem is concurrently mountable on multiple servers at the same time. I was thinking of mounting [i]different[/i] filesystems from the same pool on different servers, so that each filesystem is mounted on at most one server at any given time.
Multiple systems zoned to
see the same LUNs and simultaneously reading/writing
to them?
Yes, the LUNs must be visible to each host, and simultaneous writing will occur.
But I assume that if you coordinated which server had ownership of a zpool, there would be nothing stopping you from creating a zpool on server A with a set of LUNs, creating your ZFS file systems within the pool, zoning the same set of LUNs to one or more other servers, and then coordinating who has ownership of the zpool.
This works out of the box with 'zpool export' and 'zpool import'.
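A minimal sketch of the hand-over (host and pool names are made up):
[code]
# On host A: quiesce the applications, then release the pool.
# Export unmounts all filesystems and marks the pool inactive:
hostA# zpool export apppool

# On host B, which sees the same LUNs: take the pool over.
hostB# zpool import apppool

# While host A still has the pool imported, a plain import on
# host B is refused; never force it (-f) against a live pool,
# since two concurrent writers will corrupt it.
[/code]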
Example: you're testing an application and its data installed on a ZFS file system on a 32-bit x86 server, and then you want to test it on an Opteron. So you zone the LUNs to the Opteron, stop using the zpool on the 32-bit server, and use it on the Opteron. I may be completely incorrect about the above.
This, too, already works out of the box.
Other than that scenario, I think your questions fit
more closely to the shared file system topic that I
brought up originally.
Do you mean http://www.opensolaris.org/jive/click.jspa?searchID=98699&messageID=16480 ?
Still, if you had production data in a ZFS file system in your pool, as well as test data in a separate ZFS file system also using the same pool (your application container), the disks making up your common pool would still have to be visible to multiple servers, and you probably would want to limit exposure to the other ZFS file systems within that pool on the other servers.