On Fri, Oct 14, 2011 at 7:36 AM, Jim Klimov <jimkli...@cos.ru> wrote:
> 2011-10-14 15:53, Edward Ned Harvey wrote:
>>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>>> I guess Richard was correct about the use case description -
>>> I should detail what I'm thinking about, to give some illustration.
>> After reading all this, I'm still unclear on what you want to accomplish,
>> that isn't already done today. Yes I understand what it means when we say
>> ZFS is not a clustering filesystem, and yes I understand what benefits there
>> would be to gain if it were a clustering FS. But in all of what you're
>> saying below, I don't see that you need a clustering FS.
> In my example - probably not a completely clustered FS.
> A clustered ZFS pool with datasets individually owned by
> specific nodes at any given time would suffice for such
> VM farms. This would give users the benefits of ZFS
> (resilience, snapshots and clones, shared free space)
> merged with the speed of direct disk access instead of
> lagging through a storage server accessing these disks.
> This is why I think such a solution may be simpler
> than a fully-fledged POSIX-compliant shared FS, but it
> would still have some benefits for specific - and popular -
> use cases. And it might pave the way for a more complete
> solution - or perhaps illustrate what should not be done
> for those solutions ;)
> After all, I think that if the problem of safe multiple-node
> RW access to ZFS gets fundamentally solved, these
> usages I described before might just become a couple
> of new dataset types with specific predefined usage
> and limitations - like POSIX-compliant FS datasets
> and block-based volumes are now defined over ZFS.
> There is no reason not to call them "clustered FS and
> clustered volume datasets", for example ;)
> AFAIK, VMFS is not a generic filesystem, and cannot
> quite be used "directly" by software applications, but it
> has its target market for shared VM farming...
> I do not know how they solve the problems of consistency
> control - with master nodes or something else, and for
> the sake of avoiding patent infringement, I'm afraid I'd rather
> not know - so as not to copy someone's solution and
> get burnt for that ;)
>>> of these deployments become VMware ESX farms with shared
>>> VMFS. Due to my stronger love for things Solaris, I would love
>>> to see ZFS and any of the Solaris-based hypervisors (VBox, Xen
>>> or KVM ports) running there instead. But for things to be as
>>> efficient, ZFS would have to become shared - clustered...
>> I think the solution people currently use in this area is either NFS or
>> iSCSI. (Or InfiniBand, and other flavors.) You have a storage server
>> presenting the storage to the various VMware (or whatever) hypervisors.
> In fact, no. Based on the MFSYS model, there is no storage server.
> There is a built-in storage controller which can do RAID over HDDs
> and present SCSI LUNs to the blades over direct SAS access.
> These LUNs can be accessed individually by certain servers, or
> concurrently. In the latter case it is possible that servers take turns
> mounting the LUN as an HDD with some single-server FS, or use
> a clustered FS to use the LUN's disk space simultaneously.
> If we were to use in this system an OpenSolaris-based OS and
> VirtualBox/Xen/KVM as they are now, and hope for live migration
> of VMs without copying of data, we would have to make a separate
> LUN for each VM on the controller, and mount/import this LUN to
> its currently running host. I don't need to explain why that would be
> a clumsy and inflexible solution for a near-infinite number of
> reasons, do I? ;)
>> Everything works. What's missing? And why does this need to be a
>> clustering FS?
>>> To be clearer, I should say that modern VM hypervisors can
>>> migrate running virtual machines between two VM hosts.
>> This works on NFS/iSCSI/IB as well. Doesn't need a clustering FS.
> Except that the storage controller doesn't do NFS/iSCSI/IB,
> and doesn't do snapshots and clones. And if I were to
> dedicate one or two out of six blades to storage tasks,
> this might be considered an improper waste of resources.
> It would also repackage SAS access (available anyway to
> all blades at full bandwidth) into NFS/iSCSI access over a
> Gbit link...
>>> With clustered VMFS on shared storage, VMware can
>>> migrate VMs faster - it knows not to copy the HDD image
>>> file in vain - it will be equally available to the "new host"
>>> at the correct point in migration, just as it was accessible
>>> to the "old host".
>> Again. NFS/iSCSI/IB = ok.
> True, except that this is not an optimal solution in the described
> use case - a farm of server blades with relatively dumb, fast raw
> storage (but NOT an intelligent storage server).
The idea is that you would dedicate one of the servers in the chassis to be a
Solaris system, which then presents NFS out to the rest of the hosts. From
the chassis itself you would present every drive that isn't being used to
boot an existing server to this Solaris host as individual disks, and let
that server take care of RAID and of presenting the storage to the rest of
the VMware hosts.
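
As a rough sketch of what that Solaris host would run (not a tested recipe):
the pool name, dataset name, device names and the raidz2 layout below are all
placeholders made up for illustration. Only the `zpool create`, `zfs create`
and `zfs set sharenfs=on` commands themselves are standard. The Python snippet
just builds the command lines and prints them; it does not execute anything.

```python
import shlex


def storage_host_commands(pool, disks, dataset="vmstore", raid="raidz2"):
    """Build the commands for the dedicated storage host as argv lists.

    All names are hypothetical placeholders: `pool`, `dataset`, the device
    names in `disks`, and the `raid` layout are assumptions for illustration.
    Nothing is executed here.
    """
    if not disks:
        raise ValueError("at least one disk must be presented to the storage host")
    fs = f"{pool}/{dataset}"
    return [
        # The Solaris host does the RAID itself over the individual disks.
        ["zpool", "create", pool, raid, *disks],
        # One filesystem dataset to hold the VM images.
        ["zfs", "create", fs],
        # Export that dataset over NFS to the other hosts in the chassis.
        ["zfs", "set", "sharenfs=on", fs],
    ]


if __name__ == "__main__":
    # Hypothetical device names for drives not used to boot other blades.
    for argv in storage_host_commands("tank", ["c0t1d0", "c0t2d0", "c0t3d0"]):
        print(shlex.join(argv))
```

The hypervisor hosts would then mount the exported dataset over NFS as their
VM datastore, which is what keeps live migration from having to copy the
disk images.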
zfs-discuss mailing list