Thanks to all who replied. I hope we may continue the discussion,
but I'm afraid the overall verdict so far is disapproval of the idea.
It is my understanding that those active in the discussion considered
it either too limited (in application: only for VMs, or only for a
specific hardware configuration) or too difficult to implement, so that
we should rather use some alternative solutions, or at least research
them better (thanks, Nico).
I guess I am happy not to have seen replies like "won't work
at all, period" or "useless, period". I get "difficult" and "limited",
and hope these can be worked around sometime. Hopefully
this discussion will spark some interest among other software
authors or customers to suggest more solutions and applications,
to make some shared ZFS a possibility ;)
Still, I would like to clear up some misunderstandings in the replies,
because at times we seemed to be speaking about
different architectures. Prompted by Richard, I stated exactly what
hardware I had in mind (and wanted to use most efficiently)
while thinking about this problem, and how it differs from
"general" extensible computers or server+NAS networks.
Namely, with the shared storage architecture built into the Intel
MFSYS25 blade chassis, and the servers' lack of expandability
beyond that, some suggested solutions are not applicable
(10GbE, FC, Infiniband), but some networking problems
are already solved in hardware (full and equal connectivity
between all servers and all shared storage LUNs).
So some combined replies follow below:
2011-10-15, Richard Elling, Edward Ned Harvey and Nico Williams wrote:
> #1 - You seem to be assuming storage is slower when it's on a remote storage
> server as opposed to a local disk. While this is typically true over
> ethernet, it's not necessarily true over infiniband or fibre channel.
> Many people today are deploying 10GbE and it is relatively easy to get wire
> speed for bandwidth and < 0.1 ms average access for storage.
Well, I am afraid I have to reiterate: for a number of reasons, including
price, our customers choose specific and relatively fixed
hardware solutions. So, time and again, I am afraid I'll have to remind you
of the sandbox I'm tucked into: I have to make do with these boxes, and I
want to do the best with them.
I understand that Richard comes from a background where HW is the
flexible part of the equation and software is designed to be used for
years. But for many people (especially those oriented toward fast-evolving
free software) the hardware is something they have to BUY, and it
stays in service unchanged for as long as possible. This covers not only
enthusiasts like the proverbial "red-eyed linuxoids", but also many
small businesses. I still maintain several decade-old computers
running infrastructure tasks (luckily, floorspace and electricity are
near-free there) which have not yet been virtualized because "if it ain't
broken, don't touch it!" ;)
In particular, the blade chassis in my example, which I hoped to
utilize to their best using shared ZFS pools, have no expansion
slots. There is no 10GbE on either the external RJ45 or the internal
ports (technically there is a 10GbE interlink between the two switch
modules), so each server blade is limited to either 2 or 4 1Gbps ports.
There is no FC. No Infiniband. There may be one external SAS link
on each storage controller module, and that's it.
> I think the biggest problem lies in requiring full
> connectivity from every server to every LUN.
This is exactly the sort of connectivity (and the only one) available
to server blades in this chassis.
I think this is equally applicable to networked storage where there
is a mesh of reliable connections between disk controllers
and disks (or at least LUNs), be it switched FC, dual-link
SAS or whatnot.
> Doing something like VMotion would be largely pointless if the VM storage
> still remains on the node that was previously the compute head.
True. However, in these Intel MFSYS25 boxes no server blade
has any local disks (unlike most other blades I know). All disk
space is fed to them, and is equally accessible over an HA link,
from the storage controller modules (which are in turn connected
to the built-in array of hard disks). These modules are part of the
chassis and are shared by all servers, just like the networking switches.
> If you do the same thing over ethernet, then the performance will be
> degraded to ethernet speeds. So take it for granted, no matter what you do,
> you either need a bus that performs just as well remotely versus locally...
> Or else performance will be degraded... Or else it's kind of pointless
> because the VM storage lives only on the system that you want to VMotion
Well, while this is no Infiniband, in terms of disk access this
paragraph does apply to the MFSYS chassis: disk access
via the storage controller modules can be considered a fast
common bus, if that helps readers understand
my idea better. And yes, I also think that channeling
disk access over Ethernet via one of the servers is a bad thing,
bound to degrade performance compared to what can
be had anyway with direct disk access.
> Ethernet has *always* been faster than a HDD. Even back when we had 3/180s
> on 10Mbps Ethernet it was faster than the 30ms average access time for the disks of
> the day. I tested a simple server the other day and round-trip for 4KB of data
> over a busy 1GbE switch was 0.2ms. Can you show a HDD as fast? Indeed many SSDs
> have trouble reaching that rate under load.
As noted by other posters, access times are not bandwidth.
So these are two different kinds of "faster" ;) Besides, (1Gbps)
Ethernet is faster than a single HDD stream, but it is not
quite faster than an array of 14 HDDs...
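To put rough numbers on "access time" versus "bandwidth", here is a
back-of-envelope sketch in Python. The 10Mbps, 1GbE, 30ms and 4KB
figures come from the quote above; the 100 MB/s per-disk sequential
rate is my own assumption for illustration, not a measurement of the
MFSYS25 disks, and protocol overhead is ignored throughout.

```python
# Rough comparison of "access time" vs "bandwidth", using figures quoted in
# this thread plus an ASSUMED per-disk sequential rate (not measured on MFSYS25).

BITS_PER_BYTE = 8

def wire_time_ms(payload_bytes: int, link_bps: float) -> float:
    """Serialization time of a payload on a link, ignoring protocol overhead."""
    return payload_bytes * BITS_PER_BYTE / link_bps * 1000.0

def link_mb_per_s(link_bps: float) -> float:
    """Raw link bandwidth in MB/s (10**6 bytes), ignoring protocol overhead."""
    return link_bps / BITS_PER_BYTE / 1e6

# Access time: putting 4 KB on the wire, to compare with a 30 ms disk access.
t_10m = wire_time_ms(4096, 10e6)   # 10 Mbps Ethernet
t_1g = wire_time_ms(4096, 1e9)     # 1 GbE

# Bandwidth: one 1 GbE port vs. one HDD stream vs. a 14-disk array.
ASSUMED_HDD_MB_S = 100.0           # hypothetical sequential rate per disk
gbe = link_mb_per_s(1e9)           # raw MB/s of one 1 GbE port
array = 14 * ASSUMED_HDD_MB_S      # ideal striping across 14 disks

print(f"4 KB wire time: {t_10m:.2f} ms @10 Mbps, {t_1g:.3f} ms @1 GbE")
print(f"1 GbE {gbe:.0f} MB/s vs 1 HDD {ASSUMED_HDD_MB_S:.0f} MB/s "
      f"vs 14 HDDs {array:.0f} MB/s")
```

Under these assumptions, even summing all 4 ports of a blade gives a
raw 500 MB/s, still well short of what the 14-disk array could stream.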
And if Ethernet is used for its primary tasks (whatever they
be, say video streaming off this server to 5000 viewers, or
whatever else is needed to saturate the network), disk access
over the same ethernet link would have to compete. And
whatever the QoS settings, viewers would lose - either the
real-time multimedia signal would lag, or the disk data to
Moreover, using an external NAS (a dedicated server
with an Ethernet connection to the blade chassis) would provide
an external box dedicated to, and perhaps optimized for, storage
tasks (e.g. with ZIL/L2ARC), and would free up a blade for
VM farming needs, but it would consume much of the LAN
bandwidth of the blades using its storage services.
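A similar sketch for the contention argument: how much raw capacity
a blade's 1Gbps ports have left for NAS traffic once its own service
traffic is counted. The 5000 viewers are from the example above; the
0.5 Mbps per-stream bitrate is a hypothetical figure, and QoS,
protocol overhead and link-aggregation inefficiencies are all ignored.

```python
# Sketch: raw capacity left on a blade's 1 Gbps ports for NAS traffic after
# the blade's own service traffic. The viewer count is from the thread; the
# per-viewer bitrate is a HYPOTHETICAL figure chosen only for illustration.

PORT_BPS = 1e9  # each blade port is 1 Gbps

def storage_headroom_mb_s(ports: int, service_bps: float) -> float:
    """Raw MB/s left for storage traffic; 0 if the ports are oversubscribed."""
    spare_bps = max(0.0, ports * PORT_BPS - service_bps)
    return spare_bps / 8 / 1e6

VIEWERS = 5000
ASSUMED_BITRATE_BPS = 0.5e6               # hypothetical 0.5 Mbps per stream
service = VIEWERS * ASSUMED_BITRATE_BPS   # total video traffic in bps

for ports in (2, 4):
    left = storage_headroom_mb_s(ports, service)
    print(f"{ports} x 1 GbE: {left:.1f} MB/s left for NAS I/O")
```

With these made-up numbers a 2-port blade has nothing left for remote
disk I/O, and a 4-port blade has 187.5 MB/s at best, before any QoS
decides who loses.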
> Today, HDDs aren't fast, and are not getting faster.
Well, typical consumer disks did get about 2-3 times faster in
linear read/write speed over the past decade, but for random access
they still lag a lot. So, "agreed" ;)
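For the random-access point, a common first-order model is that each
random I/O pays an average seek plus half a revolution before the data
transfer itself. The drive parameters below (7200 rpm, 8.5 ms average
seek, 100 MB/s sequential) are assumed typical values, not figures
from this thread, and queueing and caching are ignored.

```python
# Back-of-envelope model of why random access on a spinning disk lags:
# each random I/O pays an average seek plus half a rotation, then the transfer.
# The drive parameters are ASSUMED typical values, not figures from the thread.

def random_iops(rpm: float, avg_seek_ms: float, io_bytes: int,
                seq_mb_s: float) -> float:
    """Approximate random IOPS for one disk (no queueing, no caching)."""
    rotational_ms = 0.5 * 60_000.0 / rpm                 # half a revolution
    transfer_ms = io_bytes / (seq_mb_s * 1e6) * 1000.0   # reading the data
    return 1000.0 / (avg_seek_ms + rotational_ms + transfer_ms)

RPM, SEEK_MS, SEQ_MB_S = 7200, 8.5, 100.0   # hypothetical 7200 rpm drive
iops = random_iops(RPM, SEEK_MS, 4096, SEQ_MB_S)
random_mb_s = iops * 4096 / 1e6

print(f"~{iops:.0f} random 4 KB IOPS = {random_mb_s:.2f} MB/s "
      f"vs {SEQ_MB_S:.0f} MB/s sequential")
```

In this model the same disk delivers a few hundred times less data on
random 4 KB reads than when streaming, which is why faster linear
speeds did little for random workloads.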
zfs-discuss mailing list