Thanks to all who replied. I hope we may continue the discussion, but I'm afraid the overall verdict so far is disapproval of the idea. It is my understanding that those active in the discussion considered it either too limited (in application - to VMs, or to a particular hardware config) or too difficult to implement, so that we should rather use some alternative solutions - or at least research them better (thanks, Nico).
I guess I am happy not to have seen replies like "won't work at all, period" or "useless, period". I do get "difficult" and "limited", and hope these can be worked around in time; hopefully this discussion will spark enough interest among other software authors or customers to suggest more solutions and applications - to make a shared ZFS a possibility ;)

Still, I would like to clear up some misunderstandings in the replies, because at times we seemed to be talking about different architectures. Prompted by Richard, I spelled out the exact hardware I had in mind (and wanted to use most efficiently) while thinking about this problem, and how it differs from "general" extensible computers or server+NAS networks. Namely, with the shared storage architecture built into the Intel MFSYS25 blade chassis, and the lack of expansibility of the servers beyond that, some of the suggested solutions are not applicable (10GbE, FC, InfiniBand), while some networking problems are already solved in hardware (full and equal connectivity between all servers and all shared-storage LUNs).

Some combined replies follow below.

2011-10-15, Richard Elling, Edward Ned Harvey and Nico Williams wrote:
> #1 - You seem to be assuming storage is slower when it's on a remote storage
> server as opposed to a local disk. While this is typically true over
> ethernet, it's not necessarily true over infiniband or fibre channel.
>
> Many people today are deploying 10GbE and it is relatively easy to get wire
> speed for bandwidth and < 0.1 ms average access for storage.
Well, I am afraid I have to reiterate: for a number of reasons, price among them, our customers choose specific and relatively fixed hardware solutions. So, time and again, I have to remind you of the sandbox I'm tucked into - I have to make do with these boxes, and I want to do the best with them.

I understand that Richard comes from a background where hardware is the flexible part of the equation and software is designed to be used for years. But for many people (especially those oriented toward fast-evolving free software) the hardware is something they have to BUY once and then run unchanged for as long as possible. This covers not only enthusiasts like the proverbial "red-eyed linuxoids", but also many small businesses. I still maintain several decade-old computers running infrastructure tasks (luckily, floorspace and electricity are near-free there) which were never virtualized because "if it ain't broken - don't touch it!" ;)

In particular, the blade chassis in my example, which I hoped to utilize to the fullest using shared ZFS pools, has no extension slots. There is no 10GbE on either the external RJ45 ports or the internal ones (technically there is a 10GbE interlink between the two switch modules), so each server blade is limited to either 2 or 4 1Gbps ports. There is no FC. No InfiniBand. There may be one external SAS link on each storage controller module, and that's it.
> I think the biggest problem lies in requiring full connectivity from every
> server to every LUN.
This is exactly the sort of connectivity (and the only sort) available to the server blades in this chassis. I think the idea is equally applicable to any networked storage where there is a mesh of reliable connections between disk controllers and disks (or at least LUNs), be it switched FC or dual-link SAS or whatnot.
> Doing something like VMotion would be largely pointless if the VM storage
> still remains on the node that was previously the compute head.
True. However, in these Intel MFSYS25 boxes no server blade has any local disks (unlike most other blades I know). Any disk space is fed to the blades - and is equally accessible over an HA link - from the storage controller modules (which are in turn connected to the built-in array of hard disks) that are part of the chassis and shared by all servers, just like the networking switches are.
> If you do the same thing over ethernet, then the performance will be degraded
> to ethernet speeds. So take it for granted, no matter what you do, you either
> need a bus that performs just as well remotely versus locally... Or else
> performance will be degraded... Or else it's kind of pointless because the VM
> storage lives only on the system that you want to VMotion away from.
Well, while this is no InfiniBand, in terms of disk access this paragraph does apply to the MFSYS chassis: disk access via the storage controller modules can be considered a fast common bus - if that helps readers picture my idea better. And yes, I also think that channeling disk access over Ethernet through one of the servers is bound to degrade performance compared to the direct disk access every blade already has.
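To make the "fast common bus" point more concrete: because every blade sees the same LUNs, a pool can already be moved between compute heads today without copying any data, simply by exporting it on one blade and importing it on the other. Below is only a rough sketch of such a manual migration - the blade hostnames (blade1, blade2) and pool name (vmpool) are made-up examples, and it illustrates what the shared LUNs allow today, not the concurrent shared-pool access I was proposing.

```python
#!/usr/bin/env python
# Rough sketch: move a ZFS pool between two blades that both see the same
# shared chassis LUNs. No data is copied; only pool ownership changes hands.
# Hostnames and the pool name are hypothetical examples.
import subprocess
import sys

OLD_HEAD = "blade1"   # blade currently owning the pool (assumption)
NEW_HEAD = "blade2"   # blade that should take the pool over (assumption)
POOL = "vmpool"       # example pool living on the shared LUNs

def run(host, *cmd):
    """Run a command on a blade over ssh and fail loudly if it errors."""
    full = ["ssh", host] + list(cmd)
    print("+", " ".join(full))
    subprocess.run(full, check=True)

if __name__ == "__main__":
    try:
        # Cleanly release the pool on the old head...
        run(OLD_HEAD, "zpool", "export", POOL)
        # ...and take it over on the new head; the LUNs are the same physical
        # devices, so nothing moves except which host has the pool imported.
        run(NEW_HEAD, "zpool", "import", POOL)
    except subprocess.CalledProcessError as e:
        sys.exit("pool migration failed: %s" % e)
```

The point of a shared ZFS pool, of course, would be to avoid exactly this export/import round-trip and let several blades use the pool at once.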
> Ethernet has *always* been faster than a HDD. Even back when we had 3/180s
> 10Mbps Ethernet it was faster than the 30ms average access time for the disks
> of the day. I tested a simple server the other day and round-trip for 4KB of
> data on a busy 1GbE switch was 0.2ms. Can you show a HDD as fast? Indeed many
> SSDs have trouble reaching that rate under load.
As noted by other posters, access times are not bandwidth, so these are two different "faster"s ;) Besides, (1Gbps) Ethernet is faster than a single HDD stream, but it is not faster than an array of 14 HDDs. And if the Ethernet is busy with its primary tasks - whatever they may be, say streaming video off this server to 5000 viewers, or whatever else it takes to saturate the link - disk access over that same link would have to compete. Whatever the QoS settings, the viewers would lose: either the real-time multimedia stream lags, or the disk data feeding it does. Moreover, using an external NAS (a dedicated server with an Ethernet connection to the blade chassis) would give us an external box dedicated to, and perhaps optimized for, storage tasks (i.e. with ZIL/L2ARC devices), and would free up a blade for VM farming, but it would consume much of the LAN bandwidth of the blades using its storage services.
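Just to put rough numbers behind the two "faster"s - a back-of-the-envelope comparison, assuming roughly 100 MB/s of streaming throughput per modern HDD (an assumption; real figures vary):

```python
# Back-of-the-envelope bandwidth comparison; all figures are rough
# assumptions, only meant to show the orders of magnitude involved.
GBE_LINK_MBPS = 1000 / 8.0   # 1 Gbps Ethernet ~ 125 MB/s, before protocol overhead
HDD_STREAM_MBPS = 100.0      # assumed sequential throughput of one HDD
ARRAY_DISKS = 14             # the chassis' built-in drive bay

array_mbps = ARRAY_DISKS * HDD_STREAM_MBPS
print("1GbE link:     ~%d MB/s" % GBE_LINK_MBPS)    # ~125 MB/s
print("single HDD:    ~%d MB/s" % HDD_STREAM_MBPS)  # ~100 MB/s
print("14-disk array: ~%d MB/s" % array_mbps)       # ~1400 MB/s
```

So a single 1GbE port is roughly on par with one streaming disk, but an order of magnitude short of the whole array - and that is before the same link has to carry the "real" network traffic of the VMs.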
> Today, HDDs aren't fast, and are not getting faster. -- richard
Well, typical consumer disks did get about 2-3 times faster at linear read/write over the past decade; but for random access they still lag a lot. So, "agreed" ;)

//Jim