Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 8:57 PM, Paul Kraus  wrote:
> On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:
>> Nothing's changed.  Automounter + data migration -> rebooting clients
>> (or close enough to rebooting).  I.e., outage.
>
>    Uhhh, not if you design your automounter architecture correctly
> and (as Richard said) have NFS clients that are not lame, to which I'll
> add automounters that actually work as advertised. I was designing
> automount architectures that permitted dynamic changes with minimal to
> no outages in the late 1990's. I only had a little over 100 clients
> (most of which were also servers) and NIS+ (NIS ver. 3) to distribute
> the indirect automount maps.

Further below you admit that you're talking about read-only data,
effectively.  But the world is not static.  Sure, *code* is by and
large static, and indeed, we segregated data by whether it was
read-only (code, historical data) or not (application data, home
directories).  We were able to migrate *read-only* data with no
outages.  But for the rest?  Yeah, there were always outages.  Of
course, we had a periodic maintenance window, with all systems
rebooting within a short period, and this meant that some data
migration outages were not noticeable, but they were real.

>    I also had to _redesign_ a number of automount strategies that
> were built by people who thought that using direct maps for everything
> was a good idea. That _was_ a pain in the a** due to the changes
> needed at the applications to point at a different hierarchy.

We used indirect maps almost exclusively.  Moreover, we used
hierarchical automount entries, and even -autofs mounts.  We also used
environment variables to control various things, such as which servers
to mount what from (this was particularly useful for spreading the
load on read-only static data).  We used practically every feature of
the automounter except for executable maps (and direct maps, when we
eventually stopped using those).
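
Roughly, the shape of the thing (map names, servers, and variables are
invented for illustration; a real site's maps would differ):

    # /etc/auto_master -- one line per indirect map, not per filesystem
    /tools    auto_tools    -ro,nosuid

    # auto_tools (indirect map, pushed out via NIS/NIS+/LDAP).
    # $TOOLSRV is a per-client variable, e.g. set with
    # "automountd -D TOOLSRV=toolsrv2", so read-only replicas
    # can be spread across clients.
    gcc        $TOOLSRV:/export/tools/gcc
    texlive    $TOOLSRV:/export/tools/texlive

    # A hierarchical entry: one key, several offsets, possibly
    # served by different machines.
    src        /          srcsrv1:/export/src \
               /archive   srcsrv2:/export/src-archive

Changing which server backs a key means editing one map entry, not
touching every client.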

>    It all depends on _what_ the application is doing. Something that
> opens and locks a file and never releases the lock or closes the file
> until the application exits will require a restart of the application
> with an automounter / NFS approach.

No kidding!  In the real world such applications exist and get used.

Nico
--


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Kraus
On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:
> On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
>  wrote:
>> On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:

>> > I disagree vehemently.  automount is a disaster because you need to
>> > synchronize changes with all those clients.  That's not realistic.
>>
>> Really?  I did it with NIS automount maps and 600+ clients back in 1991.
>> Other than the obvious problems with open files, has it gotten worse since
>> then?
>
> Nothing's changed.  Automounter + data migration -> rebooting clients
> (or close enough to rebooting).  I.e., outage.

Uhhh, not if you design your automounter architecture correctly
and (as Richard said) have NFS clients that are not lame, to which I'll
add automounters that actually work as advertised. I was designing
automount architectures that permitted dynamic changes with minimal to
no outages in the late 1990's. I only had a little over 100 clients
(most of which were also servers) and NIS+ (NIS ver. 3) to distribute
the indirect automount maps.

I also had to _redesign_ a number of automount strategies that
were built by people who thought that using direct maps for everything
was a good idea. That _was_ a pain in the a** due to the changes
needed at the applications to point at a different hierarchy.
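
To make the difference concrete (paths and servers invented for the
example): with a direct map, absolute paths are scattered all over the
namespace and typically baked into the applications; with an indirect map,
everything lives under one parent and only the key-to-server mapping
changes.

    # Direct map (auto_direct, attached as "/- auto_direct"):
    # absolute paths anywhere in the tree
    /oracle              dbserv:/export/oracle
    /usr/local/reports   appserv:/export/reports

    # Indirect map (auto_apps, attached to /apps in auto_master):
    # applications only ever see /apps/<key>
    oracle       dbserv:/export/oracle
    reports      appserv:/export/reports

Converting from the first form to the second means /oracle becomes
/apps/oracle, which is exactly the "point at a different hierarchy"
change that forced the application-side edits.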

It all depends on _what_ the application is doing. Something that
opens and locks a file and never releases the lock or closes the file
until the application exits will require a restart of the application
with an automounter / NFS approach.
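
A toy illustration of that pattern (hypothetical code, not from any
particular application): the process opens a file on the automounted
path, takes a lock, and holds both until it exits, so the mount
underneath it can never be moved without killing the process.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* open a data file on an automounted/NFS path and keep it open */
        int fd = open("/data/app/journal.dat", O_RDWR);
        if (fd < 0) {
            perror("open");
            exit(1);
        }

        /* take a whole-file write lock and never release it */
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;            /* 0 == lock to end of file */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
            perror("fcntl");
            exit(1);
        }

        /* ... work for the life of the process; /data/app stays busy
           the whole time, so re-pointing that mount means restarting us */
        for (;;)
            pause();
    }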

-- 
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
 wrote:
> On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:
> > I disagree vehemently.  automount is a disaster because you need to
> > synchronize changes with all those clients.  That's not realistic.
>
> Really?  I did it with NIS automount maps and 600+ clients back in 1991.
> Other than the obvious problems with open files, has it gotten worse since
> then?

Nothing's changed.  Automounter + data migration -> rebooting clients
(or close enough to rebooting).  I.e., outage.

> Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc.

But not with AFS.  And spec-wise not with NFSv4 (though I don't know
if/when all NFSv4 clients will properly support migration, just that
the protocol and some servers do).

> With server-side, referral-based namespace construction that problem
> goes away, and the whole thing can be transparent w.r.t. migrations.

Yes.

> Agree, but we didn't have NFSv4 back in 1991 :-)  Today, of course, this
> is how one would design a new DFS.

Indeed, that's why I built an automounter solution in 1996 (which is
still in use, I'm told).  Although, to be fair, AFS already existed back
then, was mature, and had a global namespace and data migration.  It's
taken NFS that long to catch up...

> >[...]
>
> Almost any of the popular NoSQL databases offer this and more.
> The movement away from POSIX-ish DFS and storing data in
> traditional "files" is inevitable. Even ZFS is an object store at its core.

I agree.  Except that there are applications where large octet streams
are needed; HPC and media come to mind.

Nico
--


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:

> On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling
>  wrote:
>> Unified namespace doesn't relieve you of 240 cross-mounts (or
>> equivalents).  FWIW, automounters were invented 20+ years ago to handle
>> this in a nearly seamless manner.  Today, we have DFS from Microsoft and
>> NFS referrals that almost eliminate the need for automounter-like
>> solutions.
> 
> I disagree vehemently.  automount is a disaster because you need to
> synchronize changes with all those clients.  That's not realistic.

Really?  I did it with NIS automount maps and 600+ clients back in 1991.
Other than the obvious problems with open files, has it gotten worse since then?

> I've built a large automount-based namespace, replete with a
> distributed configuration system for setting the environment variables
> available to the automounter.  I can tell you this: the automounter
> does not scale, and it certainly does not avoid the need for outages
> when storage migrates.

Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc. 

> With server-side, referral-based namespace construction that problem
> goes away, and the whole thing can be transparent w.r.t. migrations.

Agree, but we didn't have NFSv4 back in 1991 :-)  Today, of course, this
is how one would design a new DFS.

> 
> For my money the key features a DFS must have are:
> 
> - server-driven namespace construction
> - data migration without having to restart clients,
>   reconfigure them, or do anything at all to them
> - aggressive caching
> 
> - striping of file data for HPC and media environments
> 
> - semantics that ultimately allow multiple processes
>   on disparate clients to cooperate (i.e., byte range
>   locking), but I don't think full POSIX semantics are
>   needed

Almost any of the popular NoSQL databases offer this and more.
The movement away from POSIX-ish DFS and storing data in
traditional "files" is inevitable. Even ZFS is an object store at its core.

>   (that said, I think O_EXCL is necessary, and it'd be
>   very nice to have O_APPEND, though the latter is
>   particularly difficult to implement and painful when
>   there's contention if you stripe file data across
>   multiple servers)

+1
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling
 wrote:
> Unified namespace doesn't relieve you of 240 cross-mounts (or
> equivalents).  FWIW, automounters were invented 20+ years ago to handle
> this in a nearly seamless manner.  Today, we have DFS from Microsoft and
> NFS referrals that almost eliminate the need for automounter-like
> solutions.

I disagree vehemently.  automount is a disaster because you need to
synchronize changes with all those clients.  That's not realistic.
I've built a large automount-based namespace, replete with a
distributed configuration system for setting the environment variables
available to the automounter.  I can tell you this: the automounter
does not scale, and it certainly does not avoid the need for outages
when storage migrates.

With server-side, referral-based namespace construction that problem
goes away, and the whole thing can be transparent w.r.t. migrations.
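
To make "server-side" concrete: with NFSv4 referrals the namespace lives
in the servers' export configuration rather than in client maps.  A
hypothetical Linux knfsd example (hostnames and paths invented; the
refer= option is documented in exports(5) on reasonably recent nfs-utils):

    # /etc/exports on the namespace server.  /srv/nfs4 is the pseudo-root;
    # the subtrees are referrals to the servers that actually hold the
    # data.  Migrating a dataset means updating this file (and the data
    # servers), not thousands of client automounter maps.
    /srv/nfs4            *(ro,fsid=0,crossmnt)
    /srv/nfs4/projects   *(ro,refer=/projects@fileserver1)
    /srv/nfs4/scratch    *(rw,refer=/scratch@fileserver2)

A client then needs exactly one mount of the pseudo-root (something like
"mount -t nfs4 namesrv:/ /net") and follows the referrals transparently.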

For my money the key features a DFS must have are:

 - server-driven namespace construction
 - data migration without having to restart clients,
   reconfigure them, or do anything at all to them
 - aggressive caching

 - striping of file data for HPC and media environments

 - semantics that ultimately allow multiple processes
   on disparate clients to cooperate (i.e., byte range
   locking), but I don't think full POSIX semantics are
   needed

   (that said, I think O_EXCL is necessary, and it'd be
   very nice to have O_APPEND, though the latter is
   particularly difficult to implement and painful when
   there's contention if you stripe file data across
   multiple servers)
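
For anyone who hasn't bumped into them, the two flags in a trivial sketch
(hypothetical paths; the hard part isn't the client code, it's what the
server has to guarantee):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_EXCL: atomically "create only if it doesn't already exist",
           the classic lock-file idiom.  The DFS must ensure exactly one
           of N racing clients succeeds. */
        int lockfd = open("/shared/run/job.lock",
                          O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (lockfd < 0)
            perror("someone else holds the lock (or open failed)");

        /* O_APPEND: every write lands at the current end of file.  Cheap
           on one server; painful when file data is striped across several
           servers, because "end of file" becomes a distributed, contended
           fact. */
        int logfd = open("/shared/logs/run.log",
                         O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (logfd >= 0)
            (void) write(logfd, "job started\n", 12);

        return 0;
    }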

Nico
--


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:34pm, Rich Teer wrote:


On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different mounts/directory
structures, instead of being in a unified filespace.


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.



That assumes the data set will fit on one machine, and that machine won't be a 
performance bottleneck.



Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 2:26 PM, Paul Archer wrote:

> 2:20pm, Richard Elling wrote:
> 
>> On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:
>> 
>> Interesting, something more complex than NFS to avoid the
>> complexities of NFS? ;-)
>> 
>>  We have data coming in on multiple nodes (with local storage) that is
>>  needed on other multiple nodes. The only way to do that with NFS would
>>  be with a matrix of cross mounts that would be truly scary.
>> 
>> Ignoring lame NFS clients, how is that architecture different than what
>> you would have with any other distributed file system? If all nodes
>> share data to all other nodes, then...?
>>  -- richard
>> 
> 
> Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, 
> each node would have to mount from each other node. With 16 nodes, that's 
> what, 240 mounts? Not to mention your data is in 16 different 
> mounts/directory structures, instead of being in a unified filespace.

Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents).
FWIW, automounters were invented 20+ years ago to handle this in a nearly
seamless manner.  Today, we have DFS from Microsoft and NFS referrals that
almost eliminate the need for automounter-like solutions.

Also, it is not unusual for an NFS environment to have 10,000+ mounts with
thousands of mounts on each server. No big deal, happens every day.

On Apr 25, 2012, at 2:53 PM, Nico Williams wrote:
> To be fair NFSv4 now has a distributed namespace scheme so you could
> still have a single mount on the client.  That said, some DFSes have
> better properties, such as striping of data across sets of servers,
> aggressive caching, and various choices of semantics (e.g., Lustre
> tries hard to give you POSIX cache coherency semantics).


I think this is where the real value is. NFS & CIFS are intentionally
generic and have caching policies that are favorably described as generic.
For special-purpose workloads there can be advantages to having policies
more explicitly applicable to the workload.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Bob Friesenhahn

On Wed, 25 Apr 2012, Rich Teer wrote:


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.


This solution would limit bandwidth to that available from that single 
server.  With the cluster approach, the objective is for each machine 
in the cluster to primarily access files which are stored locally. 
Whole files could be moved as necessary.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 4:26 PM, Paul Archer  wrote:
> 2:20pm, Richard Elling wrote:
>> Ignoring lame NFS clients, how is that architecture different than what
>> you would have
>> with any other distributed file system? If all nodes share data to all
>> other nodes, then...?
>
> Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
> each node would have to mount from each other node. With 16 nodes, that's
> what, 240 mounts? Not to mention your data is in 16 different
> mounts/directory structures, instead of being in a unified filespace.

To be fair NFSv4 now has a distributed namespace scheme so you could
still have a single mount on the client.  That said, some DFSes have
better properties, such as striping of data across sets of servers,
aggressive caching, and various choices of semantics (e.g., Lustre
tries hard to give you POSIX cache coherency semantics).

Nico
--


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Rich Teer

On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different mounts/directory
structures, instead of being in a unified filespace.


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.

--
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:20pm, Richard Elling wrote:


On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:

Interesting, something more complex than NFS to avoid the complexities
of NFS? ;-)

  We have data coming in on multiple nodes (with local storage) that is
  needed on other multiple nodes. The only way to do that with NFS would
  be with a matrix of cross mounts that would be truly scary.


Ignoring lame NFS clients, how is that architecture different than what you
would have with any other distributed file system? If all nodes share data
to all other nodes, then...?
 -- richard



Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, 
each node would have to mount from each other node. With 16 nodes, that's 
what, 240 mounts? Not to mention your data is in 16 different mounts/directory 
structures, instead of being in a unified filespace.


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:

> 11:26am, Richard Elling wrote:
> 
>> On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:
>> 
>>  The point of a clustered filesystem was to be able to spread our data
>>  out among all nodes and still have access from any node without having
>>  to run NFS. Size of the data set (once you get past the point where you
>>  can replicate it on each node) is irrelevant.
>> Interesting, something more complex than NFS to avoid the complexities
>> of NFS? ;-)
> We have data coming in on multiple nodes (with local storage) that is needed 
> on other multiple nodes. The only way to do that with NFS would be with a 
> matrix of cross mounts that would be truly scary.


Ignoring lame NFS clients, how is that architecture different than what you
would have with any other distributed file system? If all nodes share data
to all other nodes, then...?
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Robert Milkowski

And he will still need an underlying filesystem like ZFS for them :)


> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Nico Williams
> Sent: 25 April 2012 20:32
> To: Paul Archer
> Cc: ZFS-Discuss mailing list
> Subject: Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs
> FreeBSD)
> 
> I agree, you need something like AFS, Lustre, or pNFS.  And/or an NFS
> proxy to those.
> 
> Nico
> --



Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
I agree, you need something like AFS, Lustre, or pNFS.  And/or an NFS
proxy to those.

Nico
--