On Thu, Apr 26, 2012 at 12:37 PM, Richard Elling
<richard.ell...@gmail.com> wrote:
> [...]

NFSv4 had migration in the protocol (excluding protocols between
servers) from the get-go, but it was missing a lot (FedFS) and was not
implemented until recently.  I've no idea what clients and servers
support it adequately besides Solaris 11, though that's just my fault
(not being informed).  It's taken over a decade to get to where we
have any implementations of NFSv4 migration.

>> For me one of the exciting things about Lustre was/is the idea that
>> you could just have a single volume where all new data (and metadata)
>> is distributed evenly as you go.  Need more storage?  Plug it in,
>> either to an existing head or via a new head, then flip a switch and
>> there it is.  No need to manage allocation.  Migration may still be
>> needed, both within a cluster and between clusters, but that's much
>> more manageable when you have a protocol where data locations can be
>> all over the place in a completely transparent manner.
>
>
> Many distributed file systems do this, at the cost of being not quite
> POSIX-ish.

Well, Lustre does POSIX semantics just fine, including cache coherency
(as opposed to NFS' close-to-open coherency, which is decidedly
non-POSIX).

> In the brave new world of storage vmotion, nosql, and distributed object
> stores, it is not clear to me that coding to a POSIX file system is a
> strong requirement.

Well, I don't quite agree.  I'm very suspicious of
eventually-consistent.  I'm not saying that the enormous DBs that eBay
and such run should sport SQL and ACID semantics -- I'm saying that I
think we can do much better than eventually-consistent (and
no-language) while not paying the steep price that ACID requires.  I'm
not alone in this either.

The trick is to find the right compromise.  Close-to-open semantics
works out fine for NFS, but O_APPEND is too wonderful not to have
(ditto O_EXCL, which NFSv2 did not have; v4 has O_EXCL, but not
O_APPEND).
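
To make the O_EXCL / O_APPEND point concrete, here is a minimal sketch
of what those two flags buy you on a local POSIX file system, using
Python's os module as a thin wrapper over open(2).  The helper names
and the temporary file are made up for illustration, and nothing here
says anything about how a particular NFS client or server behaves.

```python
import os
import tempfile


def create_exclusive(path):
    """Create path only if it does not already exist.

    Returns True if this call created the file, False if it was
    already there (open(2) fails with EEXIST under O_CREAT|O_EXCL).
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def append_record(path, record):
    """Append bytes to path using O_APPEND.

    With O_APPEND the offset is moved to end-of-file as part of each
    write, so the caller never has to seek first.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        return os.write(fd, record)
    finally:
        os.close(fd)


def demo():
    """Exercise both helpers against a throwaway file (hypothetical name)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.log")
        first = create_exclusive(path)    # creates the file
        second = create_exclusive(path)   # loses: file already exists
        append_record(path, b"one\n")
        append_record(path, b"two\n")
        with open(path, "rb") as f:
            data = f.read()
    return first, second, data
```

Calling demo() creates the file once, refuses the second exclusive
create, and leaves the two records in the order they were appended.
The interesting question for a distributed file system is whether
those same guarantees hold when the callers are on different clients.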

Whoever first delivers the right compromise in distributed DB
semantics stands to make a fortune.

> Perhaps people are so tainted by experiences with v2 and v3 that we can
> explain the non-migration to v4 as being due to poor marketing? As a
> leader of NFS, Sun had unimpressive marketing.

Sun did not do much to improve NFS in the 90s, not compared to the
v4 work, which only started paying off quite recently.  And since Sun
had lost the client space by then, having the best server doesn't mean
all that much if the clients can't take advantage of the server's best
features for lack of client implementations.  Basically, Sun's ZFS,
DTrace, SMF, NFSv4, Zones, and
other amazing innovations came a few years too late to make up for the
awful management that Sun was saddled with.  But for all the decidedly
awful things Sun management did (or didn't do), the worst was
terminating Sun PS (yes, worse than all the non-marketing, poor
marketing, poor acquisitions, poor strategy, and all the rest
including truly epic mistakes like icing Solaris on x86 a decade ago).
One of the worst outcomes of the Sun debacle is that now there's a
bevy of senior execs who think the worst thing Sun did was to open
source Solaris and Java -- which isn't to say that Sun should have
open sourced as much as it did, or that open source is an end in
itself, but that open sourcing these things was a legitimate business
tool with very specific goals in mind in each case, and which had
nothing to do with the sinking of the company.  Or maybe that's one of
the best outcomes, because the good news about it is that those who
learn the right lessons (in that case: that open source is a
legitimate business tool that is sometimes, often even, a great
mind-share building tool) will be in the minority, and thus will have
a huge advantage over their competition.  That's another thing Sun did
not learn until it was too late: mind-share matters enormously to a
software company.

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
