Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Fred Liu
I'll jump into this loop with a different alternative -- an IP-based block device.
I have seen a few successful cases with "HAST + UCARP + ZFS + FreeBSD".
If zfsonlinux is robust enough, trying "DRBD + PACEMAKER + ZFS + LINUX" is
definitely encouraged.
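
For illustration only, a minimal sketch of the FreeBSD flavor (the HAST resource
"disk0" is assumed to be defined in /etc/hast.conf on both nodes; the interface,
addresses, and the ucarp up/down scripts are made-up examples):

  # on the active node: promote the HAST resource and build the pool on it
  hastctl role primary disk0
  zpool create tank /dev/hast/disk0
  # floating service address; the (hypothetical) scripts add/remove it on em0
  ucarp -i em0 -s 192.0.2.11 -v 1 -p secret -a 192.0.2.100 \
        -u /usr/local/sbin/vip-up.sh -d /usr/local/sbin/vip-down.sh &

  # on the surviving node after the primary dies
  hastctl role primary disk0
  zpool import -f tank

The DRBD + PACEMAKER variant is the same idea: create the pool on the /dev/drbdX
device and let Pacemaker promote DRBD and import/export the pool.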

Thanks.


Fred

> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Nico Williams
> Sent: Thursday, April 26, 2012 14:00
> To: Richard Elling
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] cluster vs nfs
> 
> On Thu, Apr 26, 2012 at 12:10 AM, Richard Elling
>  wrote:
> > On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
> > Reboot requirement is a lame client implementation.
> 
> And lame protocol design.  You could possibly migrate read-write NFSv3
> on the fly by preserving FHs and somehow updating the clients to go to
> the new server (with a hiccup in between, no doubt), but only entire
> shares at a time -- you could not migrate only part of a volume with
> NFSv3.
> 
> Of course, having migration support in the protocol does not equate to
> getting it in the implementation, but it's certainly a good step in
> that direction.
> 
> > You are correct, a ZFS send/receive will result in different file handles
> > on the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.
> 
> That's understandable for NFSv2 and v3, but for v4 there's no reason
> that an NFSv4 server stack and ZFS could not arrange to preserve FHs
> (if, perhaps, at the price of making the v4 FHs rather large).
> Although even for v3 it should be possible for servers in a cluster to
> arrange to preserve devids...
> 
> Bottom line: live migration needs to be built right into the protocol.
> 
> For me one of the exciting things about Lustre was/is the idea that
> you could just have a single volume where all new data (and metadata)
> is distributed evenly as you go.  Need more storage?  Plug it in,
> either to an existing head or via a new head, then flip a switch and
> there it is.  No need to manage allocation.  Migration may still be
> needed, both within a cluster and between clusters, but that's much
> more manageable when you have a protocol where data locations can be
> all over the place in a completely transparent manner.
> 
> Nico
> --
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Nico Williams
On Thu, Apr 26, 2012 at 12:10 AM, Richard Elling
 wrote:
> On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:
> Reboot requirement is a lame client implementation.

And lame protocol design.  You could possibly migrate read-write NFSv3
on the fly by preserving FHs and somehow updating the clients to go to
the new server (with a hiccup in between, no doubt), but only entire
shares at a time -- you could not migrate only part of a volume with
NFSv3.

Of course, having migration support in the protocol does not equate to
getting it in the implementation, but it's certainly a good step in
that direction.

> You are correct, a ZFS send/receive will result in different file handles on
> the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.

That's understandable for NFSv2 and v3, but for v4 there's no reason
that an NFSv4 server stack and ZFS could not arrange to preserve FHs
(if, perhaps, at the price of making the v4 FHs rather large).
Although even for v3 it should be possible for servers in a cluster to
arrange to preserve devids...

Bottom line: live migration needs to be built right into the protocol.

For me one of the exciting things about Lustre was/is the idea that
you could just have a single volume where all new data (and metadata)
is distributed evenly as you go.  Need more storage?  Plug it in,
either to an existing head or via a new head, then flip a switch and
there it is.  No need to manage allocation.  Migration may still be
needed, both within a cluster and between clusters, but that's much
more manageable when you have a protocol where data locations can be
all over the place in a completely transparent manner.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 8:30 PM, Carson Gaspar wrote:

> On 4/25/12 6:57 PM, Paul Kraus wrote:
>> On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:
>>> On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
>>>   wrote:
> 
>>> 
>>> Nothing's changed.  Automounter + data migration ->  rebooting clients
>>> (or close enough to rebooting).  I.e., outage.
>> 
>> Uhhh, not if you design your automounter architecture correctly
>> and (as Richard said) have NFS clients that are not lame to which I'll
>> add, automounters that actually work as advertised. I was designing
> 
> And applications that don't pin the mount points, and can be idled during the 
> migration. If your migration is due to a dead server, and you have pending 
> writes, you have no choice but to reboot the client(s) (and accept the data 
> loss, of course).

Reboot requirement is a lame client implementation.

> Which is why we use AFS for RO replicated data, and NetApp clusters with 
> SnapMirror and VIPs for RW data.
> 
> To bring this back to ZFS, sadly ZFS doesn't support NFS HA without shared / 
> replicated storage, as ZFS send / recv can't preserve the data necessary to 
> have the same NFS filehandle, so failing over to a replica causes stale NFS 
> filehandles on the clients. Which frustrates me, because the technology to do 
> NFS shadow copy (which is possible in Solaris - not sure about the open 
> source forks) is a superset of that needed to do HA, but can't be used for HA.

You are correct, a ZFS send/receive will result in different file handles on
the receiver, just like rsync, tar, ufsdump+ufsrestore, etc.

Do you mean the Sun ZFS Storage 7000 Shadow Migration feature?  This is not an
HA feature; it is an interposition architecture.

It is possible to preserve NFSv[23] file handles in a ZFS environment using
lower-level replication like TrueCopy, SRDF, AVS, etc. But those have other
architectural issues (aka suckage). I am open to looking at what it would take
to make a ZFS-friendly replicator that would do this, but need to know the
business case [1].

The beauty of AFS and others is that the file handle equivalent is not a
number. NFSv4 also has this feature. So I have a little bit of heartburn when
people say, "NFS sux because it has a feature I won't use because I won't
upgrade to NFSv4 even though it was released 10 years ago."

As Nico points out, there are cases where you really need a Lustre, Ceph, 
Gluster, or other 
parallel file system. That is not the design point for ZFS's ZPL or volume 
interfaces.

[1] FWIW, you can build a metropolitan area ZFS-based, shared storage cluster
today for about 1/4 the cost of the NetApp Stretch Metro software license.
There is more than one way to skin a cat :-) So if the idea is to get even
lower than 1/4 the NetApp cost, it feels like a race to the bottom.

 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 8:57 PM, Paul Kraus  wrote:
> On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:
>> Nothing's changed.  Automounter + data migration -> rebooting clients
>> (or close enough to rebooting).  I.e., outage.
>
>    Uhhh, not if you design your automounter architecture correctly
> and (as Richard said) have NFS clients that are not lame to which I'll
> add, automounters that actually work as advertised. I was designing
> automount architectures that permitted dynamic changes with minimal to
> no outages in the late 1990's. I only had a little over 100 clients
> (most of which were also servers) and NIS+ (NIS ver. 3) to distribute
> the indirect automount maps.

Further below you admit that you're talking about read-only data,
effectively.  But the world is not static.  Sure, *code* is by and
large static, and indeed, we segregated data by whether it was
read-only (code, historical data) or not (application data, home
directories).  We were able to migrated *read-only* data with no
outages.  But for the rest?  Yeah, there were always outages.  Of
course, we had a periodic maintenance window, with all systems
rebooting within a short period, and this meant that some data
migration outages were not noticeable, but they were real.

>    I also had to _redesign_ a number of automount strategies that
> were built by people who thought that using direct maps for everything
> was a good idea. That _was_ a pain in the a** due to the changes
> needed at the applications to point at a different hierarchy.

We used indirect maps almost exclusively.  Moreover, we used
hierarchical automount entries, and even -autofs mounts.  We also used
environment variables to control various things, such as which servers
to mount what from (this was particularly useful for spreading the
load on read-only static data).  We used practically every feature of
the automounter except for executable maps (and direct maps, when we
eventually stopped using those).
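
As a concrete illustration of that style (map names, server names, and the
variable are made up, not the maps we actually used):

  # auto_master: everything under /data comes from the indirect map auto_data
  /data   auto_data

  # auto_data: $DATASRV is fed to the automounter, e.g. automount -D DATASRV=filer3
  static    -ro,hard,intr    $DATASRV:/export/static
  scratch   -rw,hard,intr    filer1:/export/scratch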

>    It all depends on _what_ the application is doing. Something that
> opens and locks a file and never releases the lock or closes the file
> until the application exits will require a restart of the application
> with an automounter / NFS approach.

No kidding!  In the real world such applications exist and get used.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Carson Gaspar

On 4/25/12 6:57 PM, Paul Kraus wrote:

On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:

On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
  wrote:




Nothing's changed.  Automounter + data migration ->  rebooting clients
(or close enough to rebooting).  I.e., outage.


 Uhhh, not if you design your automounter architecture correctly
and (as Richard said) have NFS clients that are not lame to which I'll
add, automounters that actually work as advertised. I was designing


And applications that don't pin the mount points, and can be idled 
during the migration. If your migration is due to a dead server, and you 
have pending writes, you have no choice but to reboot the client(s) (and 
accept the data loss, of course).


Which is why we use AFS for RO replicated data, and NetApp clusters with 
SnapMirror and VIPs for RW data.


To bring this back to ZFS, sadly ZFS doesn't support NFS HA without 
shared / replicated storage, as ZFS send / recv can't preserve the data 
necessary to have the same NFS filehandle, so failing over to a replica 
causes stale NFS filehandles on the clients. Which frustrates me, 
because the technology to do NFS shadow copy (which is possible in 
Solaris - not sure about the open source forks) is a superset of that 
needed to do HA, but can't be used for HA.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Kraus
On Wed, Apr 25, 2012 at 9:07 PM, Nico Williams  wrote:
> On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
>  wrote:
>> On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:

>> > I disagree vehemently.  automount is a disaster because you need to
>> > synchronize changes with all those clients.  That's not realistic.
>>
>> Really?  I did it with NIS automount maps and 600+ clients back in 1991.
>> Other than the obvious problems with open files, has it gotten worse since
>> then?
>
> Nothing's changed.  Automounter + data migration -> rebooting clients
> (or close enough to rebooting).  I.e., outage.

Uhhh, not if you design your automounter architecture correctly
and (as Richard said) have NFS clients that are not lame to which I'll
add, automounters that actually work as advertised. I was designing
automount architectures that permitted dynamic changes with minimal to
no outages in the late 1990's. I only had a little over 100 clients
(most of which were also servers) and NIS+ (NIS ver. 3) to distribute
the indirect automount maps.

I also had to _redesign_ a number of automount strategies that
were built by people who thought that using direct maps for everything
was a good idea. That _was_ a pain in the a** due to the changes
needed at the applications to point at a different hierarchy.

It all depends on _what_ the application is doing. Something that
opens and locks a file and never releases the lock or closes the file
until the application exits will require a restart of the application
with an automounter / NFS approach.

-- 
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Paul Archer

Tomorrow, Ian Collins wrote:


On 04/26/12 10:34 AM, Paul Archer wrote:
That assumes the data set will fit on one machine, and that machine won't be a
performance bottleneck.


Aren't those general considerations when specifying a file server?

I suppose. But I meant specifically that our data will not fit on one single 
machine, and we are relying on spreading it across more nodes to get it on 
more spindles as well.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 7:37 PM, Richard Elling
 wrote:
> On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:
> > I disagree vehemently.  automount is a disaster because you need to
> > synchronize changes with all those clients.  That's not realistic.
>
> Really?  I did it with NIS automount maps and 600+ clients back in 1991.
> Other than the obvious problems with open files, has it gotten worse since
> then?

Nothing's changed.  Automounter + data migration -> rebooting clients
(or close enough to rebooting).  I.e., outage.

> Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc.

But not with AFS.  And spec-wise not with NFSv4 (though I don't know
if/when all NFSv4 clients will properly support migration, just that
the protocol and some servers do).

> With server-side, referral-based namespace construction that problem
> goes away, and the whole thing can be transparent w.r.t. migrations.

Yes.

> Agree, but we didn't have NFSv4 back in 1991 :-)  Today, of course, this
> is how one would design a new DFS.

Indeed, that's why I built an automounter solution in 1996 (that's
still in use, I'm told).  Although to be fair, AFS already existed back then,
with a global namespace and data migration, and it was mature.
It's taken NFS that long to catch up...

> >[...]
>
> Almost any of the popular nosql databases offer this and more.
> The movement away from POSIX-ish DFS and storing data in
> traditional "files" is inevitable. Even ZFS is an object store at its core.

I agree.  Except that there are applications where large octet streams
are needed.  HPC, media come to mind.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 5:42 PM, Ian Collins  wrote:
> Aren't those general considerations when specifying a file server?

There are Lustre clusters with thousands of nodes, hundreds of them
being servers, and high utilization rates.  Whatever specs you might
have for one server head will not meet the demand that hundreds of the
same can.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 3:36 PM, Nico Williams wrote:

> On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling
>  wrote:
>> Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents). FWIW,
>> automounters were invented 20+ years ago to handle this in a nearly seamless manner.
>> Today, we have DFS from Microsoft and NFS referrals that almost eliminate the need
>> for automounter-like solutions.
> 
> I disagree vehemently.  automount is a disaster because you need to
> synchronize changes with all those clients.  That's not realistic.

Really?  I did it with NIS automount maps and 600+ clients back in 1991.
Other than the obvious problems with open files, has it gotten worse since then?

> I've built a large automount-based namespace, replete with a
> distributed configuration system for setting the environment variables
> available to the automounter.  I can tell you this: the automounter
> does not scale, and it certainly does not avoid the need for outages
> when storage migrates.

Storage migration is much more difficult with NFSv2, NFSv3, NetWare, etc. 

> With server-side, referral-based namespace construction that problem
> goes away, and the whole thing can be transparent w.r.t. migrations.

Agree, but we didn't have NFSv4 back in 1991 :-)  Today, of course, this
is how one would design a new DFS.

> 
> For my money the key features a DFS must have are:
> 
> - server-driven namespace construction
> - data migration without having to restart clients,
>   reconfigure them, or do anything at all to them
> - aggressive caching
> 
> - striping of file data for HPC and media environments
> 
> - semantics that ultimately allow multiple processes
>   on disparate clients to cooperate (i.e., byte range
>   locking), but I don't think full POSIX semantics are
>   needed

Almost any of the popular nosql databases offer this and more.
The movement away from POSIX-ish DFS and storing data in 
traditional "files" is inevitable. Even ZFS is an object store at its core.

>   (that said, I think O_EXCL is necessary, and it'd be
>   very nice to have O_APPEND, though the latter is
>   particularly difficult to implement and painful when
>   there's contention if you stripe file data across
>   multiple servers)

+1
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Ian Collins

On 04/26/12 10:34 AM, Paul Archer wrote:

2:34pm, Rich Teer wrote:


On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different
mounts/directory
structures, instead of being in a unified filespace.

Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.


That assumes the data set will fit on one machine, and that machine won't be a
performance bottleneck.


Aren't those general considerations when specifying a file server?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 5:22 PM, Richard Elling
 wrote:
> Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents).
> FWIW,
> automounters were invented 20+ years ago to handle this in a nearly seamless
> manner.
> Today, we have DFS from Microsoft and NFS referrals that almost eliminate
> the need
> for automounter-like solutions.

I disagree vehemently.  automount is a disaster because you need to
synchronize changes with all those clients.  That's not realistic.
I've built a large automount-based namespace, replete with a
distributed configuration system for setting the environment variables
available to the automounter.  I can tell you this: the automounter
does not scale, and it certainly does not avoid the need for outages
when storage migrates.

With server-side, referral-based namespace construction that problem
goes away, and the whole thing can be transparent w.r.t. migrations.

For my money the key features a DFS must have are:

 - server-driven namespace construction
 - data migration without having to restart clients,
   reconfigure them, or do anything at all to them
 - aggressive caching

 - striping of file data for HPC and media environments

 - semantics that ultimately allow multiple processes
   on disparate clients to cooperate (i.e., byte range
   locking), but I don't think full POSIX semantics are
   needed

   (that said, I think O_EXCL is necessary, and it'd be
   very nice to have O_APPEND, though the latter is
   particularly difficult to implement and painful when
   there's contention if you stripe file data across
   multiple servers)

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:34pm, Rich Teer wrote:


On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different mounts/directory
structures, instead of being in a unified filespace.


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.



That assumes the data set will fit on one machine, and that machine won't be a 
performance bottleneck.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 2:26 PM, Paul Archer wrote:

> 2:20pm, Richard Elling wrote:
> 
>> On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:
>> 
>>Interesting, something more complex than NFS to avoid the 
>> complexities of NFS? ;-)
>> 
>>  We have data coming in on multiple nodes (with local storage) that is 
>> needed on other multiple nodes. The only way
>>  to do that with NFS would be with a matrix of cross mounts that would 
>> be truly scary.
>> Ignoring lame NFS clients, how is that architecture different than what you 
>> would have 
>> with any other distributed file system? If all nodes share data to all other 
>> nodes, then...?
>>  -- richard
>> 
> 
> Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, 
> each node would have to mount from each other node. With 16 nodes, that's 
> what, 240 mounts? Not to mention your data is in 16 different 
> mounts/directory structures, instead of being in a unified filespace.

Unified namespace doesn't relieve you of 240 cross-mounts (or equivalents). FWIW,
automounters were invented 20+ years ago to handle this in a nearly seamless manner.
Today, we have DFS from Microsoft and NFS referrals that almost eliminate the need
for automounter-like solutions.
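
For illustration, this is roughly what a referral looks like on a Linux knfsd
server (paths and hostname are examples; I have not checked how gracefully every
client follows them):

  # /etc/exports on the namespace server: clients that mount /export/data
  # get referred to the server that actually holds it
  /export/data   *(ro,refer=/export/data@datasrv1)

Solaris 11 has an nfsref(1M) utility for the same purpose, if memory serves.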

Also, it is not unusual for an NFS environment to have 10,000+ mounts, with
thousands of mounts on each server. No big deal; happens every day.

On Apr 25, 2012, at 2:53 PM, Nico Williams wrote:
> To be fair NFSv4 now has a distributed namespace scheme so you could
> still have a single mount on the client.  That said, some DFSes have
> better properties, such as striping of data across sets of servers,
> aggressive caching, and various choices of semantics (e.g., Lustre
> tries hard to give you POSIX cache coherency semantics).


I think this is where the real value is. NFS & CIFS are intentionally generic and
have caching policies that are favorably described as generic. For special-purpose
workloads there can be advantages to having policies more explicitly applicable to
the workload.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Ian Collins

On 04/26/12 09:54 AM, Bob Friesenhahn wrote:

On Wed, 25 Apr 2012, Rich Teer wrote:

Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.

This solution would limit bandwidth to that available from that single
server.  With the cluster approach, the objective is for each machine
in the cluster to primarily access files which are stored locally.
Whole files could be moved as necessary.


Distributed software building faces similar issues, but I've found once 
the common files have been read (and cached) by each node, network 
traffic becomes one way (to the file server).  I guess that topology 
works well when most access to shared data is read.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Bob Friesenhahn

On Wed, 25 Apr 2012, Rich Teer wrote:


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.


This solution would limit bandwidth to that available from that single 
server.  With the cluster approach, the objective is for each machine 
in the cluster to primarily access files which are stored locally. 
Whole files could be moved as necessary.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
On Wed, Apr 25, 2012 at 4:26 PM, Paul Archer  wrote:
> 2:20pm, Richard Elling wrote:
>> Ignoring lame NFS clients, how is that architecture different than what
>> you would have
>> with any other distributed file system? If all nodes share data to all
>> other nodes, then...?
>
> Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
> each node would have to mount from each other node. With 16 nodes, that's
> what, 240 mounts? Not to mention your data is in 16 different
> mounts/directory structures, instead of being in a unified filespace.

To be fair NFSv4 now has a distributed namespace scheme so you could
still have a single mount on the client.  That said, some DFSes have
better properties, such as striping of data across sets of servers,
aggressive caching, and various choices of semantics (e.g., Lustre
tries hard to give you POSIX cache coherency semantics).

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Rich Teer

On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different mounts/directory
structures, instead of being in a unified filespace.


Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simpler than
your example, and all data is available on all machines/nodes.
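
Spelled out, that is one export and one mount per node (server name and options
here are only an example; Linux syntax for brevity):

  # on the file server, /etc/exports
  /exports/nodes   *(rw,sync,no_subtree_check)

  # on every node (or the equivalent automount/vfstab entry)
  mount -t nfs server:/exports/nodes /exports/nodes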

--
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer

2:20pm, Richard Elling wrote:


On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:

Interesting, something more complex than NFS to avoid the 
complexities of NFS? ;-)

  We have data coming in on multiple nodes (with local storage) that is 
needed on other multiple nodes. The only way
  to do that with NFS would be with a matrix of cross mounts that would be 
truly scary.


Ignoring lame NFS clients, how is that architecture different than what you 
would have 
with any other distributed file system? If all nodes share data to all other 
nodes, then...?
 -- richard



Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, 
each node would have to mount from each other node. With 16 nodes, that's 
what, 240 mounts? Not to mention your data is in 16 different mounts/directory 
structures, instead of being in a unified filespace.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 12:04 PM, Paul Archer wrote:

> 11:26am, Richard Elling wrote:
> 
>> On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:
>> 
>>  The point of a clustered filesystem was to be able to spread our data 
>> out among all nodes and still have access
>>  from any node without having to run NFS. Size of the data set (once you 
>> get past the point where you can replicate
>>  it on each node) is irrelevant.
>> Interesting, something more complex than NFS to avoid the complexities of 
>> NFS? ;-)
> We have data coming in on multiple nodes (with local storage) that is needed 
> on other multiple nodes. The only way to do that with NFS would be with a 
> matrix of cross mounts that would be truly scary.


Ignoring lame NFS clients, how is that architecture different than what you 
would have 
with any other distributed file system? If all nodes share data to all other 
nodes, then...?
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Robert Milkowski

And he will still need an underlying filesystem like ZFS for them :)


> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Nico Williams
> Sent: 25 April 2012 20:32
> To: Paul Archer
> Cc: ZFS-Discuss mailing list
> Subject: Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
> 
> I agree, you need something like AFS, Lustre, or pNFS.  And/or an NFS proxy
> to those.
> 
> Nico
> --
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Nico Williams
I agree, you need something like AFS, Lustre, or pNFS.  And/or an NFS
proxy to those.

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Nico Williams
As I understand it LLNL has very large datasets on ZFS on Linux.  You
could inquire with them, as well as
http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/topics?pli=1
.  My guess is that it's quite stable for at least some use cases
(most likely: LLNL's!), but that may not be yours.  You could
always... test it, but if you do then please tell us how it went :)

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

9:08pm, Stefan Ring wrote:


Sorry for not being able to contribute any ZoL experience. I've been
pondering whether it's worth trying for a few months myself already.
Last time I checked, it didn't support the .zfs directory (for
snapshot access), which you really don't want to miss after getting
used to it.

Actually, rc8 (or was it rc7?) introduced/implemented the .zfs directory. If 
you're upgrading, you need to reboot,  but other than that, it works 
perfectly.
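
For anyone who hasn't used it, the control directory lets you browse and copy
straight out of snapshots (pool/dataset names below are just examples):

  zfs snapshot tank/home@before-upgrade
  ls /tank/home/.zfs/snapshot/before-upgrade
  cp /tank/home/.zfs/snapshot/before-upgrade/some.file /tank/home/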

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

> To put it slightly differently, if I used ZoL in production, would I be likely
> to experience performance or stability problems?

I saw one team revert from ZoL (CentOS 6) back to ext on some backup servers
for an application project; the killer was stat times (find running slow etc.).
Perhaps more layer 2 cache could have solved the problem, but it was easier to
deploy ext/lvm2.


Hmm... I've got 1.4TB in about 70K files in 2K directories, and a simple find 
on a cold FS took me about 6 seconds:


root@hoard22:/hpool/12/db# time find . -type d | wc
   2082    2082   32912

real    0m5.923s
user    0m0.052s
sys     0m1.012s


So I'd say I'm doing OK there. But I've got 10K disks and a fast SSD for caching.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Stefan Ring
> I saw one team revert from ZoL (CentOS 6) back to ext on some backup servers
> for an application project, the killer  was
> stat times (find running slow etc.), perhaps more layer 2 cache could have
> solved the problem, but it was easier to deploy ext/lvm2.

But stat times (think directory traversal) are horrible on ZFS/Solaris
as well, at least on a workstation-class machine that doesn't run
24/7. Maybe on an always-on server with 256GB RAM or more, things
would be different. For me, that's really the only pain point of using
ZFS.

Sorry for not being able to contribute any ZoL experience. I've been
pondering whether it's worth trying for a few months myself already.
Last time I checked, it didn't support the .zfs directory (for
snapshot access), which you really don't want to miss after getting
used to it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)

2012-04-25 Thread Paul Archer




11:26am, Richard Elling wrote:


On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:

  The point of a clustered filesystem was to be able to spread our data out 
among all nodes and still have access
  from any node without having to run NFS. Size of the data set (once you 
get past the point where you can replicate
  it on each node) is irrelevant.


Interesting, something more complex than NFS to avoid the complexities of NFS? 
;-)

We have data coming in on multiple nodes (with local storage) that is needed 
on other multiple nodes. The only way to do that with NFS would be with a 
matrix of cross mounts that would be truly scary.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Jordan Schwartz
>To put it slightly differently, if I used ZoL in production, would I be
likely to experience performance or stability problems?

I saw one team revert from ZoL (CentOS 6) back to ext on some backup
servers for an application project, the killer  was
stat times (find running slow etc.), perhaps more layer 2 cache could have
solved the problem, but it was easier to deploy ext/lvm2.
The source filesystems were ext so zfs send/rcv was not an option.

You may want to check with the ZoL project about where their development is
with respect to performance; I heard that the focus was on stability.


Jordan



On Wed, Apr 25, 2012 at 10:59 AM, Paul Archer  wrote:

> 9:59am, Richard Elling wrote:
>
>  On Apr 25, 2012, at 5:48 AM, Paul Archer wrote:
>>
>>  This may fall into the realm of a religious war (I hope not!), but
>> recently several people on this list have
>>  said/implied that ZFS was only acceptable for production use on
>> FreeBSD (or Solaris, of course) rather than Linux
>>  with ZoL.
>>
>>  I'm working on a project at work involving a large(-ish) amount of
>> data, about 5TB, working its way up to 12-15TB
>>
>>
>> This is pretty small by today's standards.  With 4TB disks, that is only
>> 3-4 disks + redundancy.
>>
>>  True. At my last job, we were used to researchers asking for individual
> 4-5TB filesystems, and 1-2TB increases in size. When I left, there was over
> a 100TB online (in '07).
>
>
>
>   eventually, spread among a dozen or so nodes. There may or may not
>> be a clustered filesystem involved (probably
>>  gluster if we use anything).
>>
>>
>> I wouldn't dream of building a clustered file system that small. Maybe
>> when you get into the
>> multiple-PB range, then it might make sense.
>>
>>  The point of a clustered filesystem was to be able to spread our data
> out among all nodes and still have access from any node without having to
> run NFS. Size of the data set (once you get past the point where you can
> replicate it on each node) is irrelevant.
>
>
>
>
>   I've been looking at ZoL as the primary filesystem for this data.
>> We're a Linux shop, so I'd rather not switch to
>>  FreeBSD, or any of the Solaris-derived distros--although I have no
>> problem with them, I just don't want to
>>  introduce another OS into the mix if I can avoid it.
>>
>>  So, the actual questions are:
>>
>>  Is ZoL really not ready for production use?
>>
>>  If not, what is holding it back? Features? Performance? Stability?
>>
>>
>> The computer science behind ZFS is sound. But it was also developed for
>> Solaris which
>> is quite different than Linux under the covers. So the Linux and other OS
>> ports have issues
>> around virtual memory system differences and fault management
>> differences. This is the
>> classic "getting it to work is 20% of the effort, getting it to work when
>> all else is failing is
>> the other 80%" case.
>>  -- richard
>>
>
> I understand the 80/20 rule. But this doesn't really answer the
> question(s). If there weren't any major differences among operating
> systems, the project probably would have been done long ago.
>
> To put it slightly differently, if I used ZoL in production, would I be
> likely to experience performance or stability problems? Or would it be
> lacking in features that I would likely need?
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 10:59 AM, Paul Archer wrote:

> 9:59am, Richard Elling wrote:
> 
>> On Apr 25, 2012, at 5:48 AM, Paul Archer wrote:
>> 
>>  This may fall into the realm of a religious war (I hope not!), but 
>> recently several people on this list have
>>  said/implied that ZFS was only acceptable for production use on FreeBSD 
>> (or Solaris, of course) rather than Linux
>>  with ZoL.
>> 
>>  I'm working on a project at work involving a large(-ish) amount of 
>> data, about 5TB, working its way up to 12-15TB
>> This is pretty small by today's standards.  With 4TB disks, that is only 3-4 
>> disks + redundancy.
> True. At my last job, we were used to researchers asking for individual 4-5TB 
> filesystems, and 1-2TB increases in size. When I left, there was over a 100TB 
> online (in '07).

100TB is medium sized for today's systems, about 4RU or less :-)

>>  eventually, spread among a dozen or so nodes. There may or may not be a 
>> clustered filesystem involved (probably
>>  gluster if we use anything).
>> I wouldn't dream of building a clustered file system that small. Maybe when 
>> you get into the
>> multiple-PB range, then it might make sense.
>> 
> The point of a clustered filesystem was to be able to spread our data out 
> among all nodes and still have access from any node without having to run 
> NFS. Size of the data set (once you get past the point where you can 
> replicate it on each node) is irrelevant.

Interesting, something more complex than NFS to avoid the complexities of NFS? 
;-)

>>  I've been looking at ZoL as the primary filesystem for this data. We're 
>> a Linux shop, so I'd rather not switch to
>>  FreeBSD, or any of the Solaris-derived distros--although I have no 
>> problem with them, I just don't want to
>>  introduce another OS into the mix if I can avoid it.
>> 
>>  So, the actual questions are:
>> 
>>  Is ZoL really not ready for production use?
>> 
>>  If not, what is holding it back? Features? Performance? Stability?
>> The computer science behind ZFS is sound. But it was also developed for 
>> Solaris which
>> is quite different than Linux under the covers. So the Linux and other OS 
>> ports have issues
>> around virtual memory system differences and fault management differences. 
>> This is the
>> classic "getting it to work is 20% of the effort, getting it to work when 
>> all else is failing is
>> the other 80%" case.
>>  -- richard
> 
> I understand the 80/20 rule. But this doesn't really answer the question(s). 
> If there weren't any major differences among operating systems, the project 
> probably would have been done long ago.

The issues are not only technical :-(

> To put it slightly differently, if I used ZoL in production, would I be 
> likely to experience performance or stability problems? Or would it be 
> lacking in features that I would likely need?

It seems reasonably stable for the casual use cases. 

As for the features, that is a much more difficult question to answer. For example,
if you use ACLs, you might find that some userland tools on some distros have full
or no support for ACLs.

Let us know how it works out for you.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-25 Thread Manuel Ryan
Hey again, I'm back with some news from my situation.

I tried taking out the faulty disk 5 and replacing it with a new disk, but
the pool showed up as FAULTED. So I plugged the failing disk back in, keeping
the new disk in the machine, then ran a zpool replace.

After the new disk resilvered completely (took around 9 hours), the zpool
status still shows the disk as "replacing" but is not doing anything
(iostat not showing any disk activity). If I try to remove the faulty
drive, the pool shows up as DEGRADED now and still "replacing" the old
broken disk.

The overall state of the pool seems to have been getting worse, the other
failing disk is giving the write errors again, the pool had 28k corrupted
files (60k checksum errors on the raidz1 and  28k checksum errors on the
pool itself).

After seeing that, I tried to do a zpool clear to try and help the replace
process finish. After this, disk 1 was UNAVAIL due to too many IO errors
and the pool was DEGRADED.

I rebooted the machine; the pool is now back ONLINE, with disk 5 still
saying "replacing" and 0 errors except permanent ones.

I don't really know what to try next :-/ any idea ?
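
For readers hitting the same wall, the commands usually involved look roughly
like this (pool and device names are placeholders, not advice verified against
this particular pool):

  zpool status -v tank        # confirm the state of the "replacing" vdev
  zpool scrub tank            # verify what the resilver wrote
  # once the new disk holds good data, detaching the old half of the
  # replacing vdev is what completes the replacement
  zpool detach tank <old-disk>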



On Mon, Apr 23, 2012 at 7:35 AM, Daniel Carosone  wrote:

> On Mon, Apr 23, 2012 at 05:48:16AM +0200, Manuel Ryan wrote:
> > After a reboot of the machine, I have no more write errors on disk 2
> (only
> > 4 checksum, not growing), I was able to access data which I previously
> > couldn't and now only the checksum errors on disk 5 are growing.
>
> Well, that's good, but what changed?   If it was just a reboot and
> perhaps power-cycle of the disks, I don't think you've solved much in
> the long term..
>
> > Fortunately, I was able to recover all important data in those conditions
> > (yeah !),
>
> .. though that's clearly the most important thing!
>
> If you're down to just checksum errors now, then run a scrub and see
> if they can all be repaired, before replacing the disk.  If you
> haven't been able to get a scrub complete, then either:
>  * delete unimportant / rescued data, until none of the problem
>   sectors are referenced any longer, or
>  * "replace" the disk like I suggested last time, with a copy under
>   zfs' nose and switch
>
> > And since I can live with loosing the pool now, I'll gamble away and
> > replace drive 5 tomorrow and if that fails i'll just destroy the pool,
> > replace the 2 physical disks and build a new one (maybe raidz2 this time
> :))
>
> You know what?  If you're prepared to do that in the worst of
> circumstances, it would be a very good idea to do that under the best
> of circumstances.  If you can, just rebuild it raidz2 and be happier
> next time something flaky happens with this hardware.
>
> > I'll try to leave all 6 original disks in the machine while replacing,
> > maybe zfs will be smart enough to use the 6 drives to build the
> replacement
> > disk ?
>
> I don't think it will.. others who know the code, feel free to comment
> otherwise.
>
> If you've got the physical space for the extra disk, why not keep it
> there and build the pool raidz2 with the same capacity?
>
> > It's a miracle that zpool still shows disk 5 as "ONLINE", here's a SMART
> > dump of disk 5 (1265 Current_Pending_Sector, ouch)
>
> That's all indicative of read errors. Note that your reallocated
> sector count on that disk is still low, so most of those will probably
> clear when overwritten and given a chance to re-map.
>
> If these all appeared suddenly, clearly the disk has developed a
> problem. Normally, they appear gradually as head sensitivity
> diminishes.
>
> How often do you normally run a scrub, before this happened?  It's
> possible they were accumulating for a while but went undetected for
> lack of read attempts to the disk.  Scrub more often!
>
> --
> Dan.
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer

9:59am, Richard Elling wrote:


On Apr 25, 2012, at 5:48 AM, Paul Archer wrote:

  This may fall into the realm of a religious war (I hope not!), but 
recently several people on this list have
  said/implied that ZFS was only acceptable for production use on FreeBSD 
(or Solaris, of course) rather than Linux
  with ZoL.

  I'm working on a project at work involving a large(-ish) amount of data, 
about 5TB, working its way up to 12-15TB


This is pretty small by today's standards.  With 4TB disks, that is only 3-4 
disks + redundancy.

True. At my last job, we were used to researchers asking for individual 4-5TB 
filesystems, and 1-2TB increases in size. When I left, there was over a 100TB 
online (in '07).




  eventually, spread among a dozen or so nodes. There may or may not be a 
clustered filesystem involved (probably
  gluster if we use anything).


I wouldn't dream of building a clustered file system that small. Maybe when you 
get into the
multiple-PB range, then it might make sense.

The point of a clustered filesystem was to be able to spread our data out 
among all nodes and still have access from any node without having to run NFS. 
Size of the data set (once you get past the point where you can replicate it 
on each node) is irrelevant.





  I've been looking at ZoL as the primary filesystem for this data. We're a 
Linux shop, so I'd rather not switch to
  FreeBSD, or any of the Solaris-derived distros--although I have no 
problem with them, I just don't want to
  introduce another OS into the mix if I can avoid it.

  So, the actual questions are:

  Is ZoL really not ready for production use?

  If not, what is holding it back? Features? Performance? Stability?


The computer science behind ZFS is sound. But it was also developed for Solaris 
which
is quite different than Linux under the covers. So the Linux and other OS ports 
have issues
around virtual memory system differences and fault management differences. This 
is the
classic "getting it to work is 20% of the effort, getting it to work when all 
else is failing is
the other 80%" case.
 -- richard


I understand the 80/20 rule. But this doesn't really answer the question(s). 
If there weren't any major differences among operating systems, the project 
probably would have been done long ago.


To put it slightly differently, if I used ZoL in production, would I be likely 
to experience performance or stability problems? Or would it be lacking in 
features that I would likely need?___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 5:48 AM, Paul Archer wrote:

> This may fall into the realm of a religious war (I hope not!), but recently 
> several people on this list have said/implied that ZFS was only acceptable 
> for production use on FreeBSD (or Solaris, of course) rather than Linux with 
> ZoL.
> 
> I'm working on a project at work involving a large(-ish) amount of data, 
> about 5TB, working its way up to 12-15TB

This is pretty small by today's standards.  With 4TB disks, that is only 3-4 
disks + redundancy.

> eventually, spread among a dozen or so nodes. There may or may not be a 
> clustered filesystem involved (probably gluster if we use anything).

I wouldn't dream of building a clustered file system that small. Maybe when you 
get into the
multiple-PB range, then it might make sense.

> I've been looking at ZoL as the primary filesystem for this data. We're a 
> Linux shop, so I'd rather not switch to FreeBSD, or any of the 
> Solaris-derived distros--although I have no problem with them, I just don't 
> want to introduce another OS into the mix if I can avoid it.
> 
> So, the actual questions are:
> 
> Is ZoL really not ready for production use?
> 
> If not, what is holding it back? Features? Performance? Stability?

The computer science behind ZFS is sound. But it was also developed for Solaris 
which
is quite different than Linux under the covers. So the Linux and other OS ports 
have issues
around virtual memory system differences and fault management differences. This 
is the
classic "getting it to work is 20% of the effort, getting it to work when all 
else is failing is
the other 80%" case.
 -- richard


--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Setting default user/group quotas[usage accounting]?

2012-04-25 Thread Richard Elling
On Apr 25, 2012, at 8:14 AM, Eric Schrock wrote:

> ZFS will always track per-user usage information even in the absence of 
> quotas. See the the zfs 'userused@' properties and 'zfs userspace' command.

tip: zfs get -H -o value -p userused@username filesystem

Yes, and this is the logical size, not physical size. Some ZFS features 
increase logical size
(copies) while others decrease physical size (compression, dedup)
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Ray Van Dolson
On Wed, Apr 25, 2012 at 05:48:57AM -0700, Paul Archer wrote:
> This may fall into the realm of a religious war (I hope not!), but
> recently several people on this list have said/implied that ZFS was
> only acceptable for production use on FreeBSD (or Solaris, of course)
> rather than Linux with ZoL.
> 
> I'm working on a project at work involving a large(-ish) amount of
> data, about 5TB, working its way up to 12-15TB eventually, spread
> among a dozen or so nodes. There may or may not be a clustered
> filesystem involved (probably gluster if we use anything). I've been
> looking at ZoL as the primary filesystem for this data. We're a Linux
> shop, so I'd rather not switch to FreeBSD, or any of the
> Solaris-derived distros--although I have no problem with them, I just
> don't want to introduce another OS into the mix if I can avoid it.
> 
> So, the actual questions are:
> 
> Is ZoL really not ready for production use?
> 
> If not, what is holding it back? Features? Performance? Stability?
> 
> If not, then what kind of timeframe are we looking at to get past
> whatever is holding it back?

I can't comment directly on experiences with ZoL as I haven't used it,
but it does seem to be under active development.  That can be a good
thing or a bad thing. :)

I for one would be hesitant to use it for anything production based
solely on the "youngness" of the effort.

That said, might be worthwhile to check out the ZoL mailing lists and
bug reports to see what types of issues the early adopters are running
into and whether or not they are showstoppers for you or you are
willing to accept the risks.

For your size requierements and your intent to use Gluster, it sounds
like ext4 or xfs would be entirely suitable and are obviously more
"mature" on Linux at this point.

Regardless, curious to hear which way you end up going and how things
work out.

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on Linux vs FreeBSD

2012-04-25 Thread Paul Archer
This may fall into the realm of a religious war (I hope not!), but recently 
several people on this list have said/implied that ZFS was only acceptable for 
production use on FreeBSD (or Solaris, of course) rather than Linux with ZoL.


I'm working on a project at work involving a large(-ish) amount of data, about 
5TB, working its way up to 12-15TB eventually, spread among a dozen or so 
nodes. There may or may not be a clustered filesystem involved (probably 
gluster if we use anything). I've been looking at ZoL as the primary 
filesystem for this data. We're a Linux shop, so I'd rather not switch to 
FreeBSD, or any of the Solaris-derived distros--although I have no problem 
with them, I just don't want to introduce another OS into the mix if I can 
avoid it.


So, the actual questions are:

Is ZoL really not ready for production use?

If not, what is holding it back? Features? Performance? Stability?

If not, then what kind of timeframe are we looking at to get past whatever is 
holding it back?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Setting default user/group quotas[usage accounting]?

2012-04-25 Thread Eric Schrock
ZFS will always track per-user usage information even in the absence of
quotas. See the zfs 'userused@' properties and the 'zfs userspace' command.
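
A quick illustration of what that looks like (dataset and user names are
examples):

  # per-user accounting with no quota set
  zfs userspace -o type,name,used tank/home

  # one user's usage as machine-readable bytes
  zfs get -H -p -o value userused@alice tank/home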

- Eric

2012/4/25 Fred Liu 

> Missing an important ‘NOT’:
>
> >OK. I see. And I agree such quotas will **NOT** scale well. From users'
> side, they always
> > ask for more space or even no quotas at all. One of the  main purposes
> behind such quotas
> > is that we can account usage and get the statistics. Is it possible to
> do it without setting
> > such quotas?
> >
> >Thanks.
> >
> >Fred
> >
>
>
> _
> *From:* Fred Liu
> *Sent:* Wednesday, April 25, 2012 20:05
> *To:* develo...@lists.illumos.org
> *Cc:* 'zfs-discuss@opensolaris.org'
> *Subject:* RE: [developer] Setting default user/group quotas[usage
> accounting]?
>
>
>
>
>
> On Apr 24, 2012, at 2:50 PM, Fred Liu wrote:
>
>
> Yes.
>
> Thanks.
>
> I am not aware of anyone looking into this.
>
> I don't think it is very hard, per se. But such quotas don't fit well with
> the
> notion of many file systems. There might be some restricted use cases
> where it makes good sense, but I'm not convinced it will scale well -- user
> quotas never scale well.
>  -- richard
>
>
> >OK. I see. And I agree such quotas will scale well. From users' side,
> they always
> > ask for more space or even no quotas at all. One of the  main purposes
> behind such quotas
> > is that we can account usage and get the statistics. Is it possible to
> do it without setting
> > such quotas?
> >
> >Thanks.
> >
> >Fred
> >
>



-- 
Eric Schrock
Delphix
http://blog.delphix.com/eschrock

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Setting default user/group quotas[usage accounting]?

2012-04-25 Thread Fred Liu
Missing an important ‘NOT’:

>OK. I see. And I agree such quotas will **NOT** scale well. From users' side, 
>they always
> ask for more space or even no quotas at all. One of the  main purposes behind 
> such quotas
> is that we can account usage and get the statistics. Is it possible to do it 
> without setting
> such quotas?
>
>Thanks.
>
>Fred
>


_
From: Fred Liu
Sent: Wednesday, April 25, 2012 20:05
To: develo...@lists.illumos.org
Cc: 'zfs-discuss@opensolaris.org'
Subject: RE: [developer] Setting default user/group quotas[usage accounting]?





On Apr 24, 2012, at 2:50 PM, Fred Liu wrote:


Yes.

Thanks.

I am not aware of anyone looking into this.

I don't think it is very hard, per se. But such quotas don't fit well with the
notion of many file systems. There might be some restricted use cases
where it makes good sense, but I'm not convinced it will scale well -- user
quotas never scale well.
 -- richard


>OK. I see. And I agree such quotas will scale well. From users' side, they 
>always
> ask for more space or even no quotas at all. One of the  main purposes behind 
> such quotas
> is that we can account usage and get the statistics. Is it possible to do it 
> without setting
> such quotas?
>
>Thanks.
>
>Fred
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Setting default user/group quotas[usage accounting]?

2012-04-25 Thread Fred Liu



On Apr 24, 2012, at 2:50 PM, Fred Liu wrote:


Yes.

Thanks.

I am not aware of anyone looking into this.

I don't think it is very hard, per se. But such quotas don't fit well with the
notion of many file systems. There might be some restricted use cases
where it makes good sense, but I'm not convinced it will scale well -- user
quotas never scale well.
 -- richard


>OK. I see. And I agree such quotas will scale well. From users' side, they 
>always
> ask for more space or even no quotas at all. One of the  main purposes behind 
> such quotas
> is that we can account usage and get the statistics. Is it possible to do it 
> without setting
> such quotas?
>
>Thanks.
>
>Fred
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss