On Sat, Oct 15, 2011 at 6:57 PM, Jim Klimov <jimkli...@cos.ru> wrote:
> Thanks to all that replied. I hope we may continue the discussion,
> but I'm afraid the overall verdict so far is disapproval of the idea.
> It is my understanding that those active in discussion considered
> it either too limited (in application - for VMs, or for hardware cfg),
> or too difficult to implement, so that we should rather use some
> alternative solutions. Or at least research them better (thanks Nico).
> I guess I am happy to not have seen replies like "won't work
> at all, period" or "useless, period". I get "Difficult" and "Limited"
> and hope these can be worked around sometime, and hopefully
> this discussion would spark some interest in other software
> authors or customers to suggest more solutions and applications -
> to make some shared ZFS a possibility ;)
> Still, I would like to clear up some misunderstandings in replies -
> because at times we seemed to have been speaking about
> different architectures. Thanks to Richard, I stated what exact
> hardware I had in mind (and wanted to use most efficiently)
> while thinking about this problem, and how it is different from
> "general" extensible computers or server+NAS networks.
> Namely, with the shared storage architecture built into Intel
> MFSYS25 blade chassis and lack of expansibility of servers
> beyond that, some suggested solutions are not applicable
> (10GbE, FC, Infiniband) but some networking problems
> are already solved in hardware (full and equal connectivity
> between all servers and all shared storage LUNs).
> So some combined replies follow below:
> 2011-10-15, Richard Elling and Edward Ned Harvey and Nico Williams wrote:
>> > #1 - You seem to be assuming storage is slower when it's on a remote
>> > server as opposed to a local disk. While this is typically true over
>> > ethernet, it's not necessarily true over infiniband or fibre channel.
>> Many people today are deploying 10GbE and it is relatively easy to get
>> wire speed for bandwidth and < 0.1 ms average access for storage.
> Well, I am afraid I have to reiterate: for a number of reasons including
> price, our customers are choosing some specific and relatively fixed
> hardware solutions. So, time and again, I am afraid I'll have to remind
> everyone of the sandbox I'm tucked into - I have to make do with these
> boxes, and I want to do the best with them.
> I understand that Richard comes from a background where HW is the
> flexible part in equations and software is designed to be used for
> years. But for many people (especially those oriented at fast-evolving
> free software) the hardware is something they have to BUY and it
> works unchanged as long as possible. This does not only cover
> enthusiasts like the proverbial "red-eyed linuxoids", but also many
> small businesses. I do still maintain several decade-old computers
> running infrastructure tasks (luckily, floorspace and electricity are
> near-free there) which were not yet virtualized because "if it ain't
> broken - don't touch it!" ;)
> In particular, the blade chassis in my example, which I hoped to
> utilize to their best, using shared ZFS pools, have no extension
> slots. There is no 10GbE for either external RJ45 or internal
> ports (technically there is a 10GbE interlink of two switch modules),
> so each server blade is limited to either 2 or 4 1Gbps ports.
> There is no FC. No infiniband. There may be one extSAS link
> on each storage controller module, that's it.
>> I think the biggest problem lies in requiring full
>> connectivity from every server to every LUN.
> This is exactly (and the only) sort of connectivity available to
> server blades in this chassis.
> I think this is as applicable to networked storage where there
> is a mesh of reliable connections between disk controllers
> and disks (or at least LUNs), be it switched FC or dual-link
> SAS or whatnot.
>> Doing something like VMotion would be largely pointless if the VM storage
>> still remains on the node that was previously the compute head.
> True. However, in these Intel MFSYS25 boxes no server blade
> has any local disks (unlike most other blades I know). Any disk
> space is fed to them - and is equally accessible over a HA link -
> from the storage controller modules (which are in turn connected
> to the built-in array of hard-disks) that are a part of the chassis
> shared by all servers, like the networking switches are.
>> If you do the same thing over ethernet, then the performance will be
>> degraded to ethernet speeds. So take it for granted, no matter what you
>> do, you either need a bus that performs just as well remotely versus
>> locally... Or else performance will be degraded... Or else it's kind of pointless
>> because the VM storage lives only on the system that you want to VMotion
>> away from.
> Well, while this is no Infiniband, in terms of disk access this
> paragraph is applicable to MFSYS chassis: disk access
> via storage controller modules can be considered a fast
> common bus - if this comforts readers into understanding
> my idea better. And yes, I do also think that channeling
> disk over ethernet via one of the servers is a bad thing
> bound to degrade performance as opposed to what can
> be had anyway with direct disk access.
>> Ethernet has *always* been faster than a HDD. Even back when we had 3/180s
>> 10Mbps Ethernet it was faster than the 30ms average access time for the
>> disks of the day. I tested a simple server the other day and round-trip
>> for 4KB of data on a busy 1GbE switch was 0.2ms. Can you show a HDD as
>> fast? Indeed many SSDs have trouble reaching that rate under load.
> As noted by other posters, access times are not bandwidth.
> So these are two different "faster"'s ;) Besides, (1Gbps)
> Ethernet is faster than a single HDD stream. But it is not
> quite faster than an array of 14 HDDs...
> And if Ethernet is utilized by its direct tasks - whatever they
> be, say video streaming off this server to 5000 viewers or
> whatever is needed to saturate the network, disk access
> over the same ethernet link would have to compete. And
> whatever the QoS settings, viewers would lose - either the
> real-time multimedia signal would lag, or the disk data to
> feed it.
> Moreover, usage of an external NAS (a dedicated server
> with Ethernet connection to the blade chassis) would make
> an external box dedicated and perhaps optimized to storage
> tasks (i.e. with ZIL/L2ARC), and would free up a blade for
> VM farming needs, but it would consume much of the LAN
> bandwidth of the blades using its storage services.
>> Today, HDDs aren't fast, and are not getting faster.
>> -- richard
> Well, typical consumer disks did get about 2-3 times faster for
> linear RW speeds over the past decade; but for random access
> they do still lag a lot. So, "agreed" ;)
Quite frankly your choice in blade chassis was a horrible design decision.
From your description of its limitations it should never be the building
block for a VMware cluster in the first place. I would start by rethinking
that decision instead of trying to pound a round ZFS peg into a square hole.
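
For what it's worth, the two different kinds of "faster" argued over in the
quoted text (bandwidth versus access time) can be put into rough numbers.
This is only a back-of-the-envelope sketch: the 14-disk array size, the 0.2ms
round trip and the 30ms access time are taken from the thread, while the
per-disk sequential rate of 100 MB/s is purely an assumed, illustrative
figure, and the function names are made up for this example. It also ignores
protocol overhead and assumes perfect striping.

```python
# Rough comparison of the two meanings of "faster" discussed in the thread.
# Figures marked ASSUMED are illustrative and do not come from the thread.

GBE_LINK_BITS_PER_S = 1_000_000_000   # 1 Gbps link, raw line rate
HDD_COUNT = 14                        # array size mentioned in the thread
HDD_SEQ_MB_PER_S = 100.0              # ASSUMED per-disk sequential rate

NET_ROUND_TRIP_MS = 0.2               # 4KB round trip quoted in the thread
OLD_HDD_ACCESS_MS = 30.0              # average access time quoted in the thread


def link_mb_per_s(bits_per_s: float) -> float:
    """Raw link rate in MB/s (10**6 bytes), ignoring protocol overhead."""
    return bits_per_s / 8 / 1_000_000


def array_mb_per_s(disks: int, per_disk: float) -> float:
    """Ideal aggregate sequential rate, assuming perfect striping."""
    return disks * per_disk


if __name__ == "__main__":
    link = link_mb_per_s(GBE_LINK_BITS_PER_S)
    array = array_mb_per_s(HDD_COUNT, HDD_SEQ_MB_PER_S)
    print(f"1GbE raw bandwidth:        {link:.0f} MB/s")
    print(f"{HDD_COUNT}-disk array (ideal):     {array:.0f} MB/s")
    print(f"array / link ratio:        {array / link:.1f}x")
    print(f"access time ratio disk/net: {OLD_HDD_ACCESS_MS / NET_ROUND_TRIP_MS:.0f}x")
```

Under these assumptions the network wins on access time while the disk array
wins on aggregate bandwidth, so both sides of the argument can be right at
once; real numbers will depend on the actual disks and workload.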
zfs-discuss mailing list