Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?

2007-01-26 Thread Constantin Gonzalez Schmitz
Hi,

 When you do the initial install, how do you do the slicing?
 
 Just create like:
 / 10G
 swap 2G
 /altroot 10G
 /zfs restofdisk

yes.

 Or do you just create the first three slices and leave the rest of the
 disk untouched?  I understand the concept at this point, just trying to
 explain to a third party exactly what they need to do to prep the system
 disk for me :)

No. You need to be able to tell ZFS what to use. Hence, if your pool is
created at the slice level, you need to create a slice for it.

So the above is the way to go.
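
For illustration, once the slice exists you just point ZFS at it (the pool
name and slice below are only placeholders for whatever your layout uses):

  # s3 here stands for the slice reserved for ZFS in the layout above
  zpool create tank c0t0d0s3
  zfs create tank/data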

And yes, you should only do this on laptops and other machines where you only
have 1 disk or are otherwise very disk-limited :).

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/


Re: [zfs-discuss] high density SAS

2007-01-26 Thread Casper . Dik

Well, Solaris SAS isn't there yet, but anyway I just found some interesting
high-density SAS/SATA enclosures.

http://xtore.com/product_list.asp?cat=JBOD

The XJ 2000 is like the x4500 in that it holds 48 drives; however, with the
XJ 2000 two drives are on each carrier and you can get to them from
the front.

I don't like xtore in general, but the 24-bay (2.5" SAS) and 48-bay
JBODs are interesting.  How badly can you mess up a JBOD?

Two words: vibration, cooling.

Casper


Re: [zfs-discuss] Re: How much do we really want zpool remove?

2007-01-26 Thread Dick Davies

On 25/01/07, Brian Hechinger [EMAIL PROTECTED] wrote:


The other point is, how many other volume management systems allow you to remove
disks?  I bet if the answer is not zero, it's not large.  ;)


Even Linux LVM can do this (with pvmove) - slow, but you can do it online.
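
For the curious, a rough sketch of how that looks on Linux (device and VG
names are made up):

  # move all allocated extents off the disk while volumes stay online,
  # then drop it from the volume group
  pvmove /dev/sdc1
  vgreduce datavg /dev/sdc1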


--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/


[zfs-discuss] multihosted ZFS

2007-01-26 Thread Ari-Pekka Oksavuori

Hi!

I've been testing ZFS, and would like to use it on SAN-attached disks in
our production environment, where multiple machines can see the same
zpools. I have some concerns about importing/exporting pools in possible
failure situations. If a box that was using some zpool crashes (for
example, sending a break to the host when testing this), I would like
to import that pool on some other host right away. Of course I'll have
to use import -f because the pool was not exported. Now the other host is
serving the disk, no problem there, but when I boot the crashed host
again, it wants to keep using the pools it previously had and it doesn't
realize that the pool is now in use by the other host. That leads to two
systems using the same zpool, which is not nice.
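
(For reference, the takeover itself is just a forced import on the surviving
host; 'tank' is a placeholder pool name:)

  # the pool was never exported by the crashed host, so -f is required
  zpool import -f tank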

Is there any solution to this problem, or do I have to get Sun Cluster
3.2 if I want to serve the same zpools from many hosts? We may try Sun
Cluster anyway, but I'd like to know if this can be solved without it.


-- 
 Ari-Pekka Oksavuori aoksavuo at cs.tut.fi



Re: [zfs-discuss] multihosted ZFS

2007-01-26 Thread James C. McPherson

Ari-Pekka Oksavuori wrote:

Hi!

I've been testing ZFS, and would like to use it on SAN attached disks in
our production environment, where multiple machines can see the same
zpools. I'm having some concerns about importing/exporting pools on
possible failure situations. If box that was using some zpool crashes
(for example sending break to the host when testing this), I would like
to import that pool on some other host right away. Of course I'll have
to use import -f cause the pool was not exported. Now the other host is
serving the disk, no problem there, but when I boot the crashed host
again, it wants to keep using the pools it previosly had and it doesn't
realize that the pool is now in use by the other host. That leads to two
systems using the same zpool which is not nice.


s/not nice/really really bad/ :-)


Is there any solution to this problem, or do I have to get Sun Cluster
3.2 if I want to serve same zpools from many hosts? We may try Sun
Cluster anyway, but I'd like to know if this can be solved without it.


You can't do it *safely* without the protection of a high-
availability framework such as SunCluster.


best regards,
James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems


Re: [zfs-discuss] multihosted ZFS

2007-01-26 Thread Ari-Pekka Oksavuori
James C. McPherson wrote:
 You can't do it *safely* without the protection of a high-
 availability framework such as SunCluster.

Thanks for the fast reply. :) We'll have a look into the Cluster solution.

-- 
 Ari-Pekka Oksavuori aoksavuo at cs.tut.fi



[zfs-discuss] bug id 6343667

2007-01-26 Thread Robert Milkowski
Hello zfs-discuss,

  Is anyone working on that bug? Any progress?

  It's a real PITA on the x4500 when one wants/needs snapshots regularly
  and resilvering of bad disks can take many days...

-- 
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com



[zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Jeffery Malloch
Hi Folks,

I am currently in the midst of setting up a completely new file server using a 
pretty well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 
product (I work for LSI Logic so Engenio is a no brainer).  I have configured a 
couple of zpools from Volume groups on the Engenio box - 1x2.5TB and 1x3.75TB.  
I then created sub zfs systems below that and set quotas and sharenfs'd them so 
that it appears that these file systems are dynamically shrinkable and 
growable.  It looks very good...  I can see the correct file system sizes on 
all types of machines (Linux 32/64bit and of course Solaris boxes) and if I 
resize the quota it's picked up in NFS right away.  But I would be the first in 
our organization to use this in an enterprise system so I definitely have some 
concerns that I'm hoping someone here can address.
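
Roughly speaking, the setup was built along these lines (pool/filesystem
names, the device and the quota are placeholders, not our exact
configuration):

  # one pool per Engenio volume-group LUN, filesystems carved out below it
  zpool create pool01 c4t0d0
  zfs create pool01/eng
  zfs set quota=500g pool01/eng
  zfs set sharenfs=rw pool01/eng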

1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 with
hot spares, and the write cache (8GB) has battery backup, so I'm not too concerned
from a hardware side.  I'm looking for an idea of how stable ZFS itself is in
terms of corruptibility, uptime and OS stability.

2.  Recommended config.  Above, I have a fairly simple setup.  In many of the 
examples the granularity is home directory level and when you have many many 
users that could get to be a bit of a nightmare administratively.  I am really 
only looking for high level dynamic size adjustability and am not interested in 
its built in RAID features.  But given that, any real world recommendations?

3.  Caveats?  Anything I'm missing that isn't in the docs that could turn into
a BIG gotcha?

4.  Since all data access is via NFS, we are concerned that 32-bit systems
(mainly Linux and Windows via Samba) will not be able to access all the data
areas of a 2TB+ zpool even if the ZFS quota on a particular share is less than
that.  Can anyone comment?

The bottom line is that with anything new there is cause for concern.  
Especially if it hasn't been tested within our organization.  But the 
convenience/functionality factors are way too hard to ignore.

Thanks,

Jeff
 
 


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Robert Milkowski
Hello Jeffery,

Friday, January 26, 2007, 3:16:44 PM, you wrote:

JM Hi Folks,

JM I am currently in the midst of setting up a completely new file
JM server using a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM)
JM connected to an Engenio 6994 product (I work for LSI Logic so
JM Engenio is a no brainer).  I have configured a couple of zpools
JM from Volume groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I
JM then created sub zfs systems below that and set quotas and
JM sharenfs'd them so that it appears that these file systems are
JM dynamically shrinkable and growable.  It looks very good...  I can
JM see the correct file system sizes on all types of machines (Linux
JM 32/64bit and of course Solaris boxes) and if I resize the quota
JM it's picked up in NFS right away.  But I would be the first in our
JM organization to use this in an enterprise system so I definitely
JM have some concerns that I'm hoping someone here can address.

JM 1.  How stable is ZFS?  The Engenio box is completely configured
JM for RAID5 with hot spares and write cache (8GB) has battery backup
JM so I'm not too concerned from a hardware side.  I'm looking for an
JM idea of how stable ZFS itself is in terms of corruptability, uptime and OS 
stability.

When it comes to uptime, OS stability or corruptibility - no problems
here.

However, if you give ZFS entire LUNs on Engenio devices, IIRC with those
arrays when ZFS issues a flush-write-cache command the array actually
flushes, and this can possibly hurt performance. There's a way to set up
the array to ignore flush commands, or you can put ZFS on an SMI-labeled
slice. You have to check whether this problem really applies to Engenio -
I'm not sure.

However, depending on workload, consider doing RAID in ZFS instead of on
the array, especially because you then get self-healing from ZFS.

At least doing a stripe across several RAID-5 LUNs would be a good idea.
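
Something along these lines, with placeholder LUN names - either a plain
stripe of the array's RAID-5 LUNs, or a ZFS mirror of them if you want the
self-healing:

  # stripe across two hardware RAID-5 LUNs (no ZFS self-healing)
  zpool create tank c2t0d0 c2t1d0
  # or let ZFS mirror the LUNs so it can repair bad blocks itself
  zpool create tank mirror c2t0d0 c2t1d0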


JM 2.  Recommended config.  Above, I have a fairly simple setup.  In
JM many of the examples the granularity is home directory level and
JM when you have many many users that could get to be a bit of a
JM nightmare administratively.  I am really only looking for high
JM level dynamic size adjustability and am not interested in its
JM built in RAID features.  But given that, any real world recommendations?

Depending on how many users you have, consider creating a file system
for each user, or at least for a group of users if you can group them.
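
A sketch of the per-user variant (user names, pool and quota are just
examples):

  # assumes pool01/home already exists
  for u in alice bob carol; do
      zfs create pool01/home/$u
      zfs set quota=10g pool01/home/$u
  done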


JM 3.  Caveats?  Anything I'm missing that isn't in the docs that could turn 
into a BIG gotchya?

The WRITE CACHE problem I mentioned above - but check whether it really
applies to Engenio - anyway, there are simple workarounds.

There are some performance issues in corner cases; I hope you won't hit
one. Use at least S10U3 or Nevada (there are some people using Nevada
in production :)).


JM 4.  Since all data access is via NFS we are concerned that 32 bit
JM systems (Mainly Linux and Windows via Samba) will not be able to
JM access all the data areas of a 2TB+ zpool even if the zfs quota on
JM a particular share is less then that.  Can anyone comment?

If there's a quota on a file system, then the NFS client will see that
quota as the file system size IIRC, so it shouldn't be a problem. But
that means a file system for each user.


JM The bottom line is that with anything new there is cause for
JM concern.  Especially if it hasn't been tested within our
JM organization.  But the convenience/functionality factors are way too hard 
to ignore.


ZFS is new, that's right. There are some problems, mostly related to
performance and hot-spare support (when doing RAID in ZFS). Other than
that you should be OK. Quite a lot of people are using ZFS in production.
I myself have had ZFS in production for years, right now with well over
100TB of data on it using different storage arrays, and I'm still
migrating more and more data. I've never lost any data on ZFS - at least
none that I know about :)



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Francois Dion
On Fri, 2007-01-26 at 06:16 -0800, Jeffery Malloch wrote:
 Hi Folks,
 
 I am currently in the midst of setting up a completely new file server using 
 a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 
 6994 product (I work for LSI Logic so Engenio is a no brainer).  I have 
 configured a couple of zpools from Volume groups on the Engenio box - 1x2.5TB 
 and 1x3.75TB.  I then created sub zfs systems below that and set quotas and 
 sharenfs'd them so that it appears that these file systems are dynamically 
 shrinkable and growable.  It looks very good...  I can see the correct file 
 system sizes on all types of machines (Linux 32/64bit and of course Solaris 
 boxes) and if I resize the quota it's picked up in NFS right away.  But I 
 would be the first in our organization to use this in an enterprise system so 
 I definitely have some concerns that I'm hoping someone here can address.
 
 1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 
 with hot spares

That partly defeats the purpose of ZFS. ZFS offers raid-z and raid-z2
(double parity) with all the advantages of raid-5 or raid-6 but without
several of the raid-5 issues. It also has features that a raid-5
controller could never do: ensure data integrity from the kernel to the
disk, and self correction.
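
For example, a double-parity set built straight from the disks (device names
are placeholders):

  # 6-disk raidz2 vdev: survives any two disk failures and lets ZFS
  # repair blocks that fail their checksum
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0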

  and write cache (8GB) has battery backup so I'm not too concerned from a 
 hardware side.

Whereas the cache/battery backup is a requirement if you run raid-5, it
is not for zfs.

   I'm looking for an idea of how stable ZFS itself is in terms of 
 corruptability, uptime and OS stability.

Since Solaris 10 U3 it has been rock solid. No issues here. 1.3TB or so is
currently assigned on FC drives, in production without any problems. We
switched after losing some data with hardware mirroring. Our sysadmin is
ecstatic with ZFS. Some of the filesystems have compression enabled, and
that even increases throughput if you have the CPU/RAM available.

 2.  Recommended config.

The most reliable setup is a JBOD + ZFS. But if you have cache on your
box, there might be some magic setup you have to do for that box, and
I'm sure somebody on the list will help you with that. I don't have an
Engenio.

Francois


[zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Peter Eriksson
If you _boot_ the original machine then it should see that the pool is now
owned by the other host and ignore it (you'd have to do a zpool import -f
again, I think). Not tested though, so don't take my word for it...

However, if you simply type 'go' and let it continue from where it was, then
things definitely will not be pretty... :-)
 
 


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Bill Sommerfeld
On Fri, 2007-01-26 at 06:16 -0800, Jeffery Malloch wrote:
 2.  Recommended config.  

1) Since this is a system that many users will depend on, use
zfs-managed redundancy, either mirroring or raid-z, between the LUNs
exported by the storage system.  You may think your storage system is
perfect, but are you sure?  With a non-redundant zfs you'll know for sure
over time, but you might find out at a very inconvenient time.

With zfs-managed redundancy, if bit rot happens, you have an excellent
chance of slogging through without any application-visible impact.

2) Enable compression.  For the software development workloads I'm
seeing, this generally recovers the space lost to redundancy.
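
A minimal sketch of both points together (LUN names are examples only):

  # 1) let ZFS manage the redundancy across two array LUNs
  zpool create tank mirror c3t0d0 c3t1d0
  # 2) turn on compression; descendant filesystems inherit it
  zfs set compression=on tank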

- Bill




[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Anantha N. Srirama
I've used ZFS since July/August 2006, when Sol 10 Update 2 came out (the first
release to integrate ZFS.) I've used it extensively on three servers (an E25K
domain and 2 E2900s); two of them are production. I've had over 3TB of storage
from an EMC SAN under ZFS management for no less than 6 months. Like your
configuration, we've deferred data redundancy to the SAN. My observations are:

1. ZFS is stable to a very large extent. There are two known issues that I'm
aware of:
  a. You can end up in an endless 'reboot' cycle when you've got a corrupt
zpool. I came across this when I had data corruption due to an HBA mismatch
with the EMC SAN. The mismatch injected data corruption in transit and the EMC
faithfully wrote the bad data; upon reading this bad data, ZFS threw up all
over the floor for that pool. There is a documented workaround to snap out of
the 'reboot' cycle; I've not checked whether this is fixed in 11/06 update 3.
  b. Your server will hang when one of the underlying disks disappears. In our
case we had a T2000 running 11/06 with a mirrored zpool on two internal
drives. When we pulled one of the drives abruptly, the server simply hung. I
believe this is a known bug; is there a workaround?

2. When you have I/O operations that either request fsync or open files with
the O_DSYNC option, coupled with high I/O, ZFS will choke. It won't crash, but
filesystem I/O runs like molasses on a cold morning.

All my feedback is based on Solaris 10 Update 2 (aka 06/06) and I've no
comments on NFS. I strongly recommend that you use ZFS data redundancy (raidz,
raidz2, or mirror) and simply delegate the Engenio to stripe the data for
performance.
 
 


Re: [zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
If you _boot_ the original machine then it should see that the pool 
now is owned by
the other host and ignore it (you'd have to do a zpool import -f 
again I think). Not tested though so don't take my word for it...


Conceptually, that's about right, but in practice it's not quite as 
simple as that.  We had to do a lot of work in Cluster to ensure that 
the zpool would never be imported on more than one node at a time.


However if you simply type go and let it continue from where it was 
then things definitely will not be pretty... :-)


Yes, but that's only one of the bad scenarios.

--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Brian Hechinger
On Fri, Jan 26, 2007 at 08:06:46AM -0800, Anantha N. Srirama wrote:
 
   b. Your server will hang when one of the underlying disks disappear. In our 
 case we had a T2000 running 11/06 and had a mirrored zpool against two 
 internal drives. When we pulled one of the drives abruptly the server simply 
 hung. I believe this is a known bug, workaround?

This was just covered here, and it looks like the fix will make it into U4 (I
think it's in snv_48?)

The workaround is to do a 'zpool offline' whenever possible before removing a 
disk.  Yes,
this is not always possible (in the case of disk death), but will help in some 
situations.
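
For example (pool and device names are placeholders):

  # take the disk out of service before pulling it...
  zpool offline tank c1t3d0
  # ...and bring it back once the disk is reseated or replaced
  zpool online tank c1t3d0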

I can't wait for U4.  :)

-brian
-- 
The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one.  --Linus


Re: [zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Ari-Pekka Oksavuori
Peter Eriksson wrote:
 If you _boot_ the original machine then it should see that the pool now is 
 owned by
 the other host and ignore it (you'd have to do a zpool import -f again I 
 think). Not tested though so don't take my word for it...
 
 However if you simply type go and let it continue from where it was then 
 things definitely will not be pretty... :-)

I tested this - the same thing happens with a reboot.

-- 
 Ari-Pekka Oksavuori aoksavuo at cs.tut.fi


Re: [zfs-discuss] Re: How much do we really want zpool remove?

2007-01-26 Thread Darren Dunham
 - We need to avoid customers thinking Veritas can shrink, ZFS can't. That
   is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the
   pools below them that can just grow. And Veritas does not even have pools.

I'm sure that this issue is different for different environments, but I
assure you it wasn't raised because we're looking at a spec chart and
someone saw a missing check in the ZFS column.  The ability to
deallocate in-use storage without having to migrate the existing data is
used today by many administrators.  We'll live with this not being
possible in ZFS at the moment, but the limitation is real and the
flexibility of filesystems within the pool doesn't alleviate it.

 Sorry if I'm stating the obvious or stuff that has been discussed before,
 but the more I think about zpool remove, the more I think it's a question
 of willingness to plan/work/script/provision vs. a real show stopper.

Show stopper would depend on the environment.  It's certainly not that
in many places.  I agree that if I could plan all my storage perfectly in
advance, then several of the ways it would be really useful would go away.
However, one of the reasons to have it is precisely because it is so
difficult to get good predictions for storage use.

I know just enough of the internals of ZFS to understand why
remove/split/evacuate is much more difficult than it might be in simpler
volume managers.  I'm happy with what we've got today and that people
have already thought up ways of attacking this problem to make ZFS even
better.
-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Ari-Pekka Oksavuori
Ed Gould wrote:
 On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
 If you _boot_ the original machine then it should see that the pool
 now is owned by
 the other host and ignore it (you'd have to do a zpool import -f
 again I think). Not tested though so don't take my word for it...
 
 Conceptually, that's about right, but in practice it's not quite as
 simple as that.  We had to do a lot of work in Cluster to ensure that
 the zpool would never be imported on more than one node at a time.

Didn't VxVM use a hostid on the disks to check where the disk groups were
last used, so that it won't automatically import groups with a different id
on disk? Would something like this be hard to implement?

-- 
 Ari-Pekka Oksavuori aoksavuo at cs.tut.fi


Re: [zfs-discuss] Re: How much do we really want zpool remove?

2007-01-26 Thread Wade . Stuart






[EMAIL PROTECTED] wrote on 01/26/2007 03:00:13 AM:

 Hi,

 I do agree that zpool remove is a _very_ desirable feature, no doubt
 about that.

 Here are a couple of thoughts and workarounds, in random order, that
 might give us some more perspective:

 - My home machine has 4 disks and a big zpool across them. Fine. But what
   if a controller fails or worse, a CPU? Right, I need a second machine,
   if I'm really honest with myself and serious with my data. Don't laugh,
   ZFS on a Solaris server is becoming my mission-critical home storage
   solution that is supposed to last beyond CDs and DVDs and other
   vulnerable media.

   So, if I was an enterprise, I'd be willing to keep enough empty LUNs
   available to facilitate at least the migration of one or more
   filesystems if not complete pools. With a little bit of scripting, this
   can be done quite easily and efficiently through zfs send/receive and
   some LUN juggling.

   If I was an enterprise's server admin and the storage guys wouldn't
   have enough space for migrations, I'd be worried.


I think you may find in practice that many medium to large enterprise IT
departments are in this exact situation -- we do not have luns sitting
stagnant just waiting for data migrations of our largest data sets.   We
have been sold (and rightly so, because it works and is cost effective and
has no downtime) that we should be able to move luns around at will without
duplicating (to tape or disk) and dumping.  You are really expecting to
have the storage guys to have 40tb of disk just sitting collecting dust
when you want to pull out 10 disks from a 44tb system?  This type of
thinking may very well be why Sun has hard time in the last few years
(although zfs, and recent products show that the tide is turning).

 - We need to avoid customers thinking Veritas can shrink, ZFS can't. That
   is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the
   pools below them that can just grow. And Veritas does not even have
   pools.


Sorry, that is silly.  Can we compare if we call them both volumes or
filesystems (or any virtualization of each) which are reserved for data on
which we want to remove and add disks online?  VxFS can grow and shrink
and the volumes can grow and shrink.  Pools may blur the line of volume/fs,
but they are still delivering the same constraints to the administrators
trying to admin these boxes and the disks attached to them.


   People have started to follow a One-pool-to-store-them-all which I
   think is not always appropriate. Some alternatives:

   - One pool per zone might be a good idea if you want to migrate zones
     across systems, which then becomes easy through zpool export/import
     in a SAN.

   - One pool per service level (mirror, RAID-Z2, fast, slow, cheap,
     expensive) might be another idea. Keep some cheap mirrored storage
     handy for your pool migration needs and you could wiggle your life
     around zpool remove.

You went from one pool to share data (the major advantage of the pool
concept) to a bunch of constrained pools.  Also, how does this resolve the
issue of LUN migration online?


 Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs
 send/receive pair.

 Shrinking a pool requires some more zfs send/receiving and maybe some
 scripting, but these are IMHO less painful than living without ZFS'
 data integrity and the other joys of ZFS.


Oh, never mind - dump to tape (er, disk) and restore -- you do realize
that the industry has been selling products that have made this behavior
obsolete for close to 10 years now?


 Sorry if I'm stating the obvious or stuff that has been discussed before,
 but the more I think about zpool remove, the more I think it's a question
 of willingness to plan/work/script/provision vs. a real show stopper.


No,  it is a specific workflow that requires disk to stay online, while
allowing for economically sound use of resources -- this is not about
laziness (that is how I am reading your view) or not wanting to script up
solutions.


 Best regards,
Constantin

 P.S.: Now with my big mouth I hope I'll survive a customer confcall next
 week with a customer asking for exactly zpool remove :).

I hope so - though you may want to rethink the 'script it and go back in
sysadmin time 10 years' approach.  ZFS buys a lot and is a great filesystem,
but there are places such as this that are still weak and need fixing before
many environments can replace vxvm/vxfs or other solutions.  Sure, you will
find people who view this new pooled filesystem with old eyes, but there are
admins on this list who actually understand what they are missing and the
options for working around these issues.  We don't look at this as a feature
tickmark, but as a feature that we know is missing and that we really need
before we can consider moving some of our systems from vxvm/vxfs to ZFS.


-Wade Stuart


 --
 Constantin Gonzalez                          Sun Microsystems 

Re: [zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Darren Dunham
  On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
  If you _boot_ the original machine then it should see that the pool
  now is owned by
  the other host and ignore it (you'd have to do a zpool import -f
  again I think). Not tested though so don't take my word for it...
  
  Conceptually, that's about right, but in practice it's not quite as
  simple as that.  We had to do a lot of work in Cluster to ensure that
  the zpool would never be imported on more than one node at a time.
 
 Did VxVM use hostid on disks to check where the disk groups were last
 used, and won't automatically import groups with different id on disk.
 Would something like this be hard to implement?

Yes, it does.  There was a long thread on this not too long ago.
Something similar will be added to ZFS.  It won't be a full cluster
solution, but it would aid in hand-failover situations like this.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


[zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Brian H. Nelson

Hi all!

First off, if this has been discussed, please point me in that 
direction. I have searched high and low and really can't find much info 
on the subject.


We have a large-ish (200GB) UFS file system on a Sun Enterprise 250 that
is being shared with Samba (lots of files, mostly random IO). The OS is
Solaris 10u3. The disk set is 7x36GB 10k SCSI, 4 internal, 3 external.


For several reasons we currently need to stay on UFS and can't switch to 
ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, 
in lieu of UFS on SVM using raid5 (we want/need raid protection). This 
decision was made because of the ease of disk set portability of zpools, 
and also the [assumed] performance benefit vs SVM.


Anyways, I've been pondering the volblocksize parameter, and trying to 
figure out how it interacts with UFS. When the zvol was setup, I took 
the default 8k size. Since UFS uses an 8k blocksize, this seemed to be a 
reasonable choice. I've been thinking more about it lately, and have 
also read that UFS will do R/W in bigger than 8k blocks when it can, up 
to maxcontig (default of 16, ie 128k).
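
For what it's worth, the kind of experiment I have in mind looks like this
(sizes and names are examples, not a recommendation):

  # zvol with a 128k block size instead of the default 8k
  zfs create -b 128k -V 200g tank/ufsvol
  newfs /dev/zvol/rdsk/tank/ufsvol
  # maxcontig=16 * 8k UFS blocks = 128k clusters; adjustable later via tunefs
  tunefs -a 16 /dev/zvol/rdsk/tank/ufsvol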


This presented me with several questions: Would a volblocksize of 128k 
and maxcontig 16 provide better UFS performance? Overall, or only in 
certain situations (ie only for sequential IO)? Would increasing the 
maxcontig beyond 16 make any difference (good, bad or indifferent) if 
the underlying device is limited to 128k blocks?


What exactly does volblocksize control? My observations thus far
indicate that it simply sets a max block size for the [virtual] zvol
device. Changing volblocksize does NOT seem to have an impact on IOs to
the underlying physical disks, which always seem to float in the 50-110k
range. How does volblocksize affect IO that is not of a set block size?


Finally, why does volblocksize only affect raidz and mirror devices? It
seems to have no effect on 'simple' devices, even though I presume
striping is still used there. That is also assuming that volblocksize
interacts with striping.


Any answers or input is greatly appreciated.

Thanks much!
-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
 bnelson[at]cis.ysu.edu
---



Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Darren J Moffat

Brian H. Nelson wrote:
For several reasons we currently need to stay on UFS and can't switch to 
ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, 


Can you state what those reasons are, please?

I know that isn't answering the question you are asking but it is worth 
making sure you have the correct info.


I'd also like to understand why UFS works for you but ZFS as a 
filesystem does not.


--
Darren J Moffat


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Brian H. Nelson

Darren J Moffat wrote:

Brian H. Nelson wrote:
For several reasons we currently need to stay on UFS and can't switch 
to ZFS proper. So instead we have opted to do UFS on a zvol using 
raid-z, 


Can you state what those reasons are please ?

I know that isn't answering the question you are asking but it is 
worth making sure you have the correct info.


I'd also like to understand why UFS works for you but ZFS as a 
filesystem does not.




I knew someone would ask that :)

The primary reason is that our backup software (EMC/Legato Networker 
7.2) does not appear to support zfs. We don't have the funds currently 
to upgrade to the new version that does.


The other reason is that the machine has been around for years, already
using UFS and quotas extensively. Over winter break we had time to
upgrade to Solaris 10 and migrate the volume from SVM to a zvol, but not
much more. There are a few thousand users on the machine. The thought of
transitioning to that many ZFS 'partitions' in order to have per-user
quotas seemed daunting, not to mention the administrative re-training
needed (edquota doesn't work, du is reporting 3000 filesystems?!, etc).


IMO, the quota-per-file-system approach seems inconvenient when you get 
past a handful of file systems. Unless I'm really missing something, it 
just seems like a nightmare to have to deal with such a ridiculous 
number of file systems.


-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
 bnelson[at]cis.ysu.edu
---



[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Akhilesh Mritunjai
Oh yep, I know that churning feeling in stomach that there's got to be a 
GOTCHA somewhere... it can't be *that* simple!
 
 


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Casper . Dik

The other reason is that the machine has been around for years, already 
using UFS and quotas extensively. Over winter break we had time to 
upgrade to Solaris 10 and migrate the volume from svm to zvol, but not 
much more.There are a few thousand users on the machine. The thought of 
transitioning to that many zfs 'partitions' in order to have per-user 
quotas seemed daunting, not to mention the administrative re-training 
needed (edquota doesn't work. du is reporting 3000 filesystems?! etc).

I'm assuming df?

I think that the problem you are describing is a symptom of how
existing tools and methods fall apart when confronted with
huge numbers of filesystems, but only because more information
is presented by df than you had before.

I'd love to have an option to df which only reported pools, not
filesystems.  (rather than having to type df -F ufs; zpool list)

The same problem exists with automounted home directories
(but only active directories are shown, again this is something
ZFS may want to emulate)

IMO, the quota-per-file-system approach seems inconvenient when you get 
past a handful of file systems. Unless I'm really missing something, it 
just seems like a nightmare to have to deal with such a ridiculous 
number of file systems.

Why?  What additional per-filesystem overhead from a maintenance perspective
are you seeing?

Casper


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread James F. Hranicky
Brian H. Nelson wrote:

 IMO, the quota-per-file-system approach seems inconvenient when you get
 past a handful of file systems. Unless I'm really missing something, it
 just seems like a nightmare to have to deal with such a ridiculous
 number of file systems.

Seconded -- is there any chance BSD-style quotas will be implemented in
ZFS?

I notice there's an RFE:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6501037

Jim



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Gary Mills
On Fri, Jan 26, 2007 at 09:33:40AM -0800, Akhilesh Mritunjai wrote:
 ZFS Rule #0: You gotta have redundancy
 ZFS Rule #1: Redundancy shall be managed by zfs, and zfs alone.
 
 Whatever you have, junk it. Let ZFS manage mirroring and redundancy. ZFS 
 doesn't forgive even single bit errors!

How does this work in an environment with storage that's centrally-
managed and shared between many servers?  I'm putting together a new
IMAP server that will eventually use 3TB of space from our Netapp via
an iSCSI SAN.  The Netapp provides all of the disk management and
redundancy that I'll ever need.  The server will only see a virtual
disk (a LUN).  I want to use ZFS on that LUN because it's superior
to UFS in this application, even without the redundancy.  There's
no way to get the Netapp to behave like a JBOD.  Are you saying that
this configuration isn't going to work?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Brian H. Nelson

[EMAIL PROTECTED] wrote:

*snip*
IMO, the quota-per-file-system approach seems inconvenient when you get 
past a handful of file systems. Unless I'm really missing something, it 
just seems like a nightmare to have to deal with such a ridiculous 
number of file systems.



Why?  What additional per-filesystem overhead from a maintenance perspective
are you seeing?

Casper
  
The obvious example would be /var/mail. UFS quotas are easy. Doing the
same thing with ZFS would be (I think) impossible. You would have to
completely convert an existing system to a maildir or home-directory
mail storage setup.
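
(For contrast, the UFS version is the usual per-user quota dance - a rough
sketch, assuming /var/mail is its own UFS filesystem and 'someuser' is a
placeholder:)

  touch /var/mail/quotas
  quotaon /var/mail
  edquota someuser        # set the per-user limit interactively
  repquota /var/mail      # report usage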


Other file-system-specific software could also have issues. Networker 
for instance does backups per filesystem. In that situation I could then 
possibly have ~3000 backup sets DAILY for a single machine (worst case, 
that each file system has changes). Granted, that may not be better or 
worse, just 'different' and not what I'm used to. On the other hand, I 
could certainly see where that could add a ton of overhead to backup 
processing.


Don't get me wrong, zfs quotas are a good thing, and could certainly be 
useful in many situations. I just don't think I agree that they are a 
one to one replacement for ufs quotas in terms of usability in all 
situations.


-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
 bnelson[at]cis.ysu.edu
---



Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Wade . Stuart






[EMAIL PROTECTED] wrote on 01/26/2007 12:20:17 PM:

 [EMAIL PROTECTED] wrote:
 *snip*
 IMO, the quota-per-file-system approach seems inconvenient when you get
 past a handful of file systems. Unless I'm really missing something, it
 just seems like a nightmare to have to deal with such a ridiculous
 number of file systems.


 Why?  What additional per-filesystem overhead from a maintenance
perspective
 are you seeing?

 Casper

 The obvious example would be /var/mail . UFS quotas are easy. Doing
 the same thing with ZFS would be (I think) impossible. You would
 have to completely convert and existing system to a maildir or home
 directory mail storage setup.

 Other file-system-specific software could also have issues.
 Networker for instance does backups per filesystem. In that
 situation I could then possibly have ~3000 backup sets DAILY for a
 single machine (worst case, that each file system has changes).
 Granted, that may not be better or worse, just 'different' and not
 what I'm used to. On the other hand, I could certainly see where
 that could add a ton of overhead to backup processing.

 Don't get me wrong, zfs quotas are a good thing, and could certainly
 be useful in many situations. I just don't think I agree that they
 are a one to one replacement for ufs quotas in terms of usability in
 all situations.

Yes, there is an RFE out there for this that has been dispatched(?).  In
many cases the ZFS quotas work very well and are actually a godsend (after
getting over the initial shock of seeing screens of df output), but they
fail to cover usage where a filesystem or directory tree must be shared by
multiple users and each user needs a limit on how much disk space they may
use -- think department folders, or your example of mail.  The RFE does not
go into details about HOW this would be done when implemented.
User-level quotas don't need to exactly match UFS quotas -- they can be
rethunk for ZFS.  Are ZFS-style user quotas:

- per ZFS fs?
- per ZFS fs and all of its children (recursive)?
- affected by snapshot data usage?
- applied to lists of filesystems and summed?  (e.g.
  username:100,000:/tank/home/username;/tank/departments/usersdepartment
  # allow 100,000 bytes to be used in total between these two unrelated
  # filesystems by this user)

I have faith that user quotas are going to come sometime; these 'how'
questions are interesting to me...


-Wade Stuart




[zfs-discuss] A little different look at filesystems ... Just looking for ideas

2007-01-26 Thread Ross, Gary \(G.A.\)
Here's something I've been noodling around for a while. I'd like to run
this by some of you in this forum and see what you think. If I'm off
topic, I apologize.

ZFS gives large companies the ability to have huge amounts of data
available to the desktop user. Moving the user data from a locally
installed system to a ZFS based server makes a huge amount of sense, not
only saving money in the base configuration of the desktop, but also in
maintenance, and backups etc ... It makes mobility within the company
easier, as they can just access the files from anywhere IN the company.

That's wonderful. Only there is one small problem. Many companies that
are having major issues with mobility are giving more and more employees
laptops. Some of the data they need can be gotten off the Company's
portal, but it still requires the OS and applications to be installed
locally, and user data to be on the local disk as well. As more and more
laptops are purchased, the issues simply multiply, and the company now
has the same maintenance issues they started with!

What if something like the old CacheFS was revived, using ZFS as the
base file system instead of UFS? Using the ZFS filesystems on servers as
the master systems, the laptop builds a cache of files, used in the
last month or so. Could be applications or user data, it would not
matter. If the system was disconnected from the network, say on an
airplane, the data and applications would still be available. Using the
Copy on Write method in ZFS, the local cache of user data would then
update when connected back to the server. If anything happens to the
system, the only files actually lost would be what was done since the
last update to the master. 

This model could be used at the desktop as well. It would effectively
reduce the bandwidth needed for NFS-mounted clients, and could handle
far more clients than without a cache. In the end, the only thing
different from a laptop and a desktop might be the size of the cache. If
there was a problem, simply clean the cache out and start over. This
also would commonize the maintenance models between laptop and desktop. 

Could this be a good thing, or am I way off base??? 


Gary A. Ross
Network Operations Architect
Ford Motor Company
[EMAIL PROTECTED]
Phone: (313) 390-4313



Re: [zfs-discuss] multihosted ZFS

2007-01-26 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 . . .
 realize that the pool is now in use by the other host. That leads to two
 systems using the same zpool which is not nice.
 
 Is there any solution to this problem, or do I have to get Sun Cluster 3.2 if
 I want to serve same zpools from many hosts? We may try Sun Cluster anyway,
 but I'd like to know if this can be solved without it. 

Perhaps I'm stating the obvious, but here goes:

You could use SAN zoning of the affected LUNs to keep multiple hosts
from seeing the zpool.  When failover time comes, you change the zoning
to make the LUNs visible to the new host, then import.  When the old
host reboots, it won't find any zpool.  Better safe than sorry...

Regards,

Marion




Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 9:42, Gary Mills wrote:

How does this work in an environment with storage that's centrally-
managed and shared between many servers?  I'm putting together a new
IMAP server that will eventually use 3TB of space from our Netapp via
an iSCSI SAN.  The Netapp provides all of the disk management and
redundancy that I'll ever need.  The server will only see a virtual
disk (a LUN).  I want to use ZFS on that LUN because it's superior
to UFS in this application, even without the redundancy.  There's
no way to get the Netapp to behave like a JBOD.  Are you saying that
this configuration isn't going to work?


It will work, but if the storage system corrupts the data, ZFS will be 
unable to correct it.  It will detect the error.


A number that I've been quoting, albeit without a good reference, comes 
from Jim Gray, who has been around the data-management industry for 
longer than I have (and I've been in this business since 1970); he's 
currently at Microsoft.  Jim says that the controller/drive subsystem 
writes data to the wrong sector of the drive without notice about once 
per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
detect this error when the file is read (one of the blocks' checksum 
will not match).  But it can only correct the error if it manages the 
redundancy.


I would suggest exporting two LUNs from your central storage and let 
ZFS mirror them.  You can get a wider range of space/performance 
tradeoffs if you give ZFS a JBOD, but that doesn't sound like an 
option.


--Ed



Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 10:57, Ross, Gary (G.A.) wrote:

...
What if something like the old CacheFS was revived, using ZFS as the
base file system instead of UFS?
 ...

Could this be a good thing, or am I way off base???


Disconnected operation is a hard problem.  One of the better research
efforts in that area was CODA, at CMU.  CODA was, as I recall, an
extension to AFS, but it's probably reasonable to take some of those
ideas and marry them with ZFS.  CODA is now open-source; at least the
BSDs have it.


--Ed



Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas

2007-01-26 Thread Brian Hechinger
On Fri, Jan 26, 2007 at 11:11:13AM -0800, Ed Gould wrote:
 
 Disconnected operation is a hard problem.  One of the better research 
 efforts in that area was CODA, at CMU.  CODA was, as I recall, and 
 extension to AFS, but it's probably reasonable to take some of those 
 ideas and marry them with ZFS.  CODA is now open-source; at least the 
 BSDs have it.

It's funny you should mention CODA.  I've just recently started looking at it
as a way to get davfs mounting support onto Solaris.

It's not been easy.  The CODA Solaris kernel module is several years old and
looks like it hasn't been touched in at least 2 years.  It does not cleanly
build on snv_50.  CODA itself has issues as well.

CODA certainly looks like an interesting option as it makes it very easy to
support filesystems under Solaris (we *still* lack smbfs for pete's sake)

It seems like lots of work is going to be required to make it useful however.

NetBSD 3.1 is currently getting installed on my Ghetto Laptop, at which point
I will start playing with CODA.  If I like what I see, I'll probably look into
spending some time trying to at least get the kernel module working.

-brian
-- 
The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one.  --Linus


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Gary Mills
On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
 On Jan 26, 2007, at 9:42, Gary Mills wrote:
 How does this work in an environment with storage that's centrally-
 managed and shared between many servers?
 
 It will work, but if the storage system corrupts the data, ZFS will be 
 unable to correct it.  It will detect the error.
 
 A number that I've been quoting, albeit without a good reference, comes 
 from Jim Gray, who has been around the data-management industry for 
 longer than I have (and I've been in this business since 1970); he's 
 currently at Microsoft.  Jim says that the controller/drive subsystem 
 writes data to the wrong sector of the drive without notice about once 
 per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
 detect this error when the file is read (one of the blocks' checksum 
 will not match).  But it can only correct the error if it manages the 
 redundancy.

Our Netapp does double-parity RAID.  In fact, the filesystem design is
remarkably similar to that of ZFS.  Wouldn't that also detect the
error?  I suppose it depends if the `wrong sector without notice'
error is repeated each time.  Or is it random?

 I would suggest exporting two LUNs from your central storage and let 
 ZFS mirror them.  You can get a wider range of space/performance 
 tradeoffs if you give ZFS a JBOD, but that doesn't sound like an 
 option.

That would double the amount of disk that we'd require.  I am actually
planning on using two iSCSI LUNs and letting ZFS stripe across them.
When we need to expand the ZFS pool, I'd like to just expand the two
LUNs on the Netapp.  If ZFS won't accommodate that, I can just add a
couple more LUNs.  This is all convenient and easily manageable.
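
(In other words, something like the following, with made-up device names:)

  # stripe across the two iSCSI LUNs
  zpool create imappool c5t0d0 c5t1d0
  # later, if the LUNs can't simply be grown, widen the stripe instead
  zpool add imappool c5t2d0 c5t3d0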

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Richard Elling

Gary Mills wrote:

On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:

On Jan 26, 2007, at 9:42, Gary Mills wrote:

How does this work in an environment with storage that's centrally-
managed and shared between many servers?
It will work, but if the storage system corrupts the data, ZFS will be 
unable to correct it.  It will detect the error.


A number that I've been quoting, albeit without a good reference, comes 
from Jim Gray, who has been around the data-management industry for 
longer than I have (and I've been in this business since 1970); he's 
currently at Microsoft.  Jim says that the controller/drive subsystem 
writes data to the wrong sector of the drive without notice about once 
per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
detect this error when the file is read (one of the blocks' checksum 
will not match).  But it can only correct the error if it manages the 
redundancy.


The quote from Jim seems to be related to the leaves of the tree (disks).
Anecdotally, now that we have ZFS at the trunk, we're seeing that the
branches are also corrupting data.  We've speculated that it would occur,
but now we can measure it, and it is non-zero.  See Anantha's post for
one such anecdote.


Our Netapp does double-parity RAID.  In fact, the filesystem design is
remarkably similar to that of ZFS.  Wouldn't that also detect the
error?  I suppose it depends if the `wrong sector without notice'
error is repeated each time.  Or is it random?


We're having a debate related to this, data would be appreciated :-)
Do you get small, random read performance equivalent to N-2 spindles
for an N-way double-parity volume?

I would suggest exporting two LUNs from your central storage and let 
ZFS mirror them.  You can get a wider range of space/performance 
tradeoffs if you give ZFS a JBOD, but that doesn't sound like an 
option.


That would double the amount of disk that we'd require.  I am actually
planning on using two iSCSI LUNs and letting ZFS stripe across them.
When we need to expand the ZFS pool, I'd like to just expand the two
LUNs on the Netapp.  If ZFS won't accomodate that, I can just add a
couple more LUNs.  This is all convenient and easily managable.


Sounds reasonable to me :-)
 -- richard


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Torrey McMahon

Gary Mills wrote:

On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
  

On Jan 26, 2007, at 9:42, Gary Mills wrote:


How does this work in an environment with storage that's centrally-
managed and shared between many servers?
  
It will work, but if the storage system corrupts the data, ZFS will be 
unable to correct it.  It will detect the error.


A number that I've been quoting, albeit without a good reference, comes 
from Jim Gray, who has been around the data-management industry for 
longer than I have (and I've been in this business since 1970); he's 
currently at Microsoft.  Jim says that the controller/drive subsystem 
writes data to the wrong sector of the drive without notice about once 
per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
detect this error when the file is read (one of the blocks' checksum 
will not match).  But it can only correct the error if it manages the 
redundancy.



Our Netapp does double-parity RAID.  In fact, the filesystem design is
remarkably similar to that of ZFS.  Wouldn't that also detect the
error?  I suppose it depends if the `wrong sector without notice'
error is repeated each time. 


If the wrong block is written by the controller then you're out of luck.
The filesystem would read the incorrect block and ... who knows. That's
why the ZFS checksums are important.





Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Wade . Stuart






[EMAIL PROTECTED] wrote on 01/26/2007 01:43:35 PM:

 On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
  On Jan 26, 2007, at 9:42, Gary Mills wrote:
  How does this work in an environment with storage that's centrally-
  managed and shared between many servers?
 
  It will work, but if the storage system corrupts the data, ZFS will be
  unable to correct it.  It will detect the error.
 
  A number that I've been quoting, albeit without a good reference, comes

  from Jim Gray, who has been around the data-management industry for
  longer than I have (and I've been in this business since 1970); he's
  currently at Microsoft.  Jim says that the controller/drive subsystem
  writes data to the wrong sector of the drive without notice about once
  per drive per year.  In a 400-drive array, that's once a day.  ZFS will

  detect this error when the file is read (one of the blocks' checksum
  will not match).  But it can only correct the error if it manages the
  redundancy.

 Our Netapp does double-parity RAID.  In fact, the filesystem design is
 remarkably similar to that of ZFS.  Wouldn't that also detect the
 error?  I suppose it depends if the `wrong sector without notice'
 error is repeated each time.  Or is it random?

I do not know.  WAFL and other portions of the NetApp back-end are never really
described in much technical detail -- even getting real IOPS numbers from
them seems to be a hassle; much magic, little meat.  To me, ZFS has very
well defined behavior and methodology (you can even read the source to
verify specifics), and this allows you to _know_ what the weak points are.
NetApp, EMC and other disk vendors may have financial incentives for
allowing edge cases such as the write hole or bit rot (x errors per disk
are acceptable losses; after x errors, a cost/benefit analysis decides when
to replace the disk -- will customers actually know a bit is flipped?).
In EMC's case it is very common for a disk to have multiple read/write
errors before EMC will swap out the disk; they even use a substantial
portion of the disk for replacement and parity bits (outside of RAID), so
they offset or postpone the replacement costs onto the customer.

The most detailed description of WAFL I was able to find last time I looked
was:
http://www.netapp.com/library/tr/3002.pdf



  I would suggest exporting two LUNs from your central storage and let
  ZFS mirror them.  You can get a wider range of space/performance
  tradeoffs if you give ZFS a JBOD, but that doesn't sound like an
  option.

 That would double the amount of disk that we'd require.  I am actually
 planning on using two iSCSI LUNs and letting ZFS stripe across them.
 When we need to expand the ZFS pool, I'd like to just expand the two
 LUNs on the Netapp.  If ZFS won't accommodate that, I can just add a
 couple more LUNs.  This is all convenient and easily manageable.

If you do have bit errors coming from the NetApp, ZFS will find them but
will not be able to correct them in this case.



 --
 -Gary Mills--Unix Support--U of M Academic Computing and
Networking-
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 12:13, Richard Elling wrote:

On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
A number that I've been quoting, albeit without a good reference, 
comes from Jim Gray, who has been around the data-management industry 
for longer than I have (and I've been in this business since 1970); 
he's currently at Microsoft.  Jim says that the controller/drive 
subsystem writes data to the wrong sector of the drive without notice 
about once per drive per year.  In a 400-drive array, that's once a 
day.  ZFS will detect this error when the file is read (one of the 
blocks' checksum will not match).  But it can only correct the error 
if it manages the redundancy.


The quote from Jim seems to be related to the leaves of the tree 
(disks).

Anecdotally, now that we have ZFS at the trunk, we're seeing that the
branches are also corrupting data.  We've speculated that it would 
occur,

but now we can measure it, and it is non-zero.  See Anantha's post for
one such anecdote.


Actually, Jim was referring to everything but the trunk.  He didn't 
specify where from the HBA to the drive the error actually occurs.  I 
don't think it really matters.  I saw him give a talk a few years ago 
at the Usenix FAST conference; that's where I got this information.


--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Eric Enright

On 1/26/07, Darren J Moffat [EMAIL PROTECTED] wrote:

Brian H. Nelson wrote:
 For several reasons we currently need to stay on UFS and can't switch to
 ZFS proper. So instead we have opted to do UFS on a zvol using raid-z,

Can you state what those reasons are please ?

I know that isn't answering the question you are asking but it is worth
making sure you have the correct info.

I'd also like to understand why UFS works for you but ZFS as a
filesystem does not.


Samba does not currently support ZFS ACLs.  This thread caught my eye
as I just recently considered a similar solution.  Support is being
worked on though, apparently, so I can wait:

http://lists.samba.org/archive/samba-technical/2007-January/051123.html

--
Eric Enright
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Dana H. Myers
Ed Gould wrote:
 On Jan 26, 2007, at 12:13, Richard Elling wrote:
 On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
 A number that I've been quoting, albeit without a good reference,
 comes from Jim Gray, who has been around the data-management industry
 for longer than I have (and I've been in this business since 1970);
 he's currently at Microsoft.  Jim says that the controller/drive
 subsystem writes data to the wrong sector of the drive without notice
 about once per drive per year.  In a 400-drive array, that's once a
 day.  ZFS will detect this error when the file is read (one of the
 blocks' checksum will not match).  But it can only correct the error
 if it manages the redundancy.

 Actually, Jim was referring to everything but the trunk.  He didn't
 specify where from the HBA to the drive the error actually occurs.  I
 don't think it really matters.  I saw him give a talk a few years ago at
 the Usenix FAST conference; that's where I got this information.

So this leaves me wondering how often the controller/drive subsystem
reads data from the wrong sector of the drive without notice; is it
symmetrical with respect to writing, and thus about once a drive/year,
or are there factors which change this?

Dana
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Torrey McMahon

Dana H. Myers wrote:

Ed Gould wrote:
  

On Jan 26, 2007, at 12:13, Richard Elling wrote:


On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
  

A number that I've been quoting, albeit without a good reference,
comes from Jim Gray, who has been around the data-management industry
for longer than I have (and I've been in this business since 1970);
he's currently at Microsoft.  Jim says that the controller/drive
subsystem writes data to the wrong sector of the drive without notice
about once per drive per year.  In a 400-drive array, that's once a
day.  ZFS will detect this error when the file is read (one of the
blocks' checksum will not match).  But it can only correct the error
if it manages the redundancy.



  

Actually, Jim was referring to everything but the trunk.  He didn't
specify where from the HBA to the drive the error actually occurs.  I
don't think it really matters.  I saw him give a talk a few years ago at
the Usenix FAST conference; that's where I got this information.



So this leaves me wondering how often the controller/drive subsystem
reads data from the wrong sector of the drive without notice; is it
symmetrical with respect to writing, and thus about once a drive/year,
or are there factors which change this?
  


It's not symmetrical. Oftentimes it's a firmware bug; other times a spurious 
event causes one block to be read/written instead of another one. (Alpha 
particles, anyone?)



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 12:52, Dana H. Myers wrote:

So this leaves me wondering how often the controller/drive subsystem
reads data from the wrong sector of the drive without notice; is it
symmetrical with respect to writing, and thus about once a drive/year,
or are there factors which change this?


My guess is that it would be symmetric, but I don't really know.

--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Rainer Heilke
 So, if I was an enterprise, I'd be willing to keep
  enough empty LUNs
 available to facilitate at least the migration of
  one or more filesystems
 if not complete pools.

You might be, but don't be surprised when the Financials folks laugh you out of 
their office. Large corporations do not make money by leaving wads of cash 
lying around, and that's exactly what a few terabytes of unused storage in a 
high-end SAN is. This is in addition to the laughter generated by the comment 
that, not a big deal if the Financials and HR databases are offline for three 
days while we do the migration. Good luck writing up a business case that 
justifies this sort of fiscal generosity.

Sorry, this argument smacks a little too much of being out of touch with the 
fiscal (and time) restrictions of working in a typical corporation, as opposed 
to a well-funded research group.

I hope I'm not sounding rude, but those of us working in medium to large 
corporations simply do not have the money for such luxuries. Period.

Rainer
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Dana H. Myers
Torrey McMahon wrote:
 Dana H. Myers wrote:
 Ed Gould wrote:
  
 On Jan 26, 2007, at 12:13, Richard Elling wrote:

 On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
  
 A number that I've been quoting, albeit without a good reference,
 comes from Jim Gray, who has been around the data-management industry
 for longer than I have (and I've been in this business since 1970);
 he's currently at Microsoft.  Jim says that the controller/drive
 subsystem writes data to the wrong sector of the drive without notice
 about once per drive per year.  In a 400-drive array, that's once a
 day.  ZFS will detect this error when the file is read (one of the
 blocks' checksum will not match).  But it can only correct the error
 if it manages the redundancy.
 

  
 Actually, Jim was referring to everything but the trunk.  He didn't
 specify where from the HBA to the drive the error actually occurs.  I
 don't think it really matters.  I saw him give a talk a few years ago at
 the Usenix FAST conference; that's where I got this information.
 

 So this leaves me wondering how often the controller/drive subsystem
 reads data from the wrong sector of the drive without notice; is it
 symmetrical with respect to writing, and thus about once a drive/year,
 or are there factors which change this?
   
 
 It's not symmetrical. Oftentimes it's a firmware bug; other times a spurious
 event causes one block to be read/written instead of another one. (Alpha
 particles, anyone?)

I would tend to expect these spurious events to impact read and write
equally; more specifically, the chance of any one read or write being
mis-addressed is about the same.  Since, AFAIK, there are many more reads
from a disk typically than writes, this would seem to suggest that there
would be more mis-addressed reads in a drive/year than mis-addressed
writes.  Is this the reason for the asymmetry?

(I'm sure waving my hands here)

Dana
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Robert Thurlow

Eric Enright wrote:


Samba does not currently support ZFS ACLs.


Yes, but this just means you can't get/set your ACLs from a CIFS
client.  ACLs will be enforced just fine once set locally on the
server; you may also be able to get/set them from an NFS client.
You may know this, but I know some are confused by this and think
you lose ACL protection.
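
(For anyone following along, a minimal sketch of setting such an ACL locally
on the Solaris server -- the path and group name are hypothetical:

    # chmod A+group:webteam:read_data/execute:allow /tank/share/docs
    # ls -v /tank/share/docs              # verify the ACL entries

CIFS clients then get the enforcement, even though they cannot edit the ACL
through Samba yet.)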

Rob T
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 13:16, Dana H. Myers wrote:

I would tend to expect these spurious events to impact read and write
equally; more specifically, the chance of any one read or write being
mis-addressed is about the same.  Since, AFAIK, there are many more 
reads
from a disk typically than writes, this would seem to suggest that 
there

would be more mis-addressed reads in a drive/year than mis-addressed
writes.  Is this the reason for the asymmetry?


Jim's once per drive per year number was not very precise.  I took it 
to be just one significant digit.  I don't recall if he distinguished 
reads from writes.


--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Selim Daoud

it would be good to have real data and not only guesses or anecdotes

this story about wrong blocks being written by RAID controllers
sounds like the anti-terrorism propaganda we are living in: exaggerate
the facts to catch everyone's attention.
It's going to take more than that to prove RAID controllers have been doing
a bad job for the last 30 years.
Let's make up real stories with hard facts first.
s.

On 1/26/07, Ed Gould [EMAIL PROTECTED] wrote:

On Jan 26, 2007, at 12:52, Dana H. Myers wrote:
 So this leaves me wondering how often the controller/drive subsystem
 reads data from the wrong sector of the drive without notice; is it
 symmetrical with respect to writing, and thus about once a drive/year,
 or are there factors which change this?

My guess is that it would be symmetric, but I don't really know.

--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 13:29, Selim Daoud wrote:

it would be good to have real data and not only guesses or anecdotes


Yes, I agree.  I'm sorry I don't have the data that Jim presented at 
FAST, but he did present actual data.  Richard Elling (I believe it was 
Richard) has also posted some related data from ZFS experience to this 
list.


There is more than just anecdotal evidence for this.

--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Richard Elling

Rainer Heilke wrote:

So, if I was an enterprise, I'd be willing to keep
 enough empty LUNs
available to facilitate at least the migration of
 one or more filesystems
if not complete pools.


You might be, but don't be surprised when the Financials folks laugh you out of their 
office. Large corporations do not make money by leaving wads of cash lying around, and 
that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in 
addition to the laughter generated by the comment that, not a big deal if the 
Financials and HR databases are offline for three days while we do the migration. 
Good luck writing up a business case that justifies this sort of fiscal generosity.


To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs nor can you
eliminate vdevs.  In other words, I can migrate data around without
downtime, I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies.  Every time we descend
into this rathole, we stir up more confusion :-(
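
(A hedged illustration of the online migration being described, one vdev at a
time -- device names are hypothetical:

    # zpool replace tank c1t0d0 c3t0d0    # resilver the old LUN onto the new, equal-or-larger LUN
    # zpool status tank                   # watch resilver progress

Repeat for each vdev; the pool stays imported and in use throughout.)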

If you consider your pool of storage as a zpool, then the management of
subparts of the pool is done at the file system level.  This concept is
different than other combinations of devices and file systems such as
SVM+UFS.  When answering the ZFS shrink question, you need to make sure
you're not applying the old concepts to the new model.

Personally, I've never been in the situation where users ask for less storage,
but maybe I'm just the odd guy out? ;-)  Others have offered cases where
a shrink or vdev restructuring could be useful.  But I still see some
confusion with file system management (including zvols) and device management.
The shrink feature is primarily at the device management level.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Paul Fisher
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Ed Gould
 Sent: Friday, January 26, 2007 3:38 PM
 
 Yes, I agree.  I'm sorry I don't have the data that Jim presented at 
 FAST, but he did present actual data.  Richard Elling (I believe it 
 was
 Richard) has also posted some related data from ZFS experience to this 
 list.

This seems to be from Jim and on point:

http://www.usenix.org/event/fast05/tech/gray.pdf


paul
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig

2007-01-26 Thread Eric Enright

On 1/26/07, Robert Thurlow [EMAIL PROTECTED] wrote:

Eric Enright wrote:

 Samba does not currently support ZFS ACLs.

Yes, but this just means you can't get/set your ACLs from a CIFS
client.  ACLs will be enforced just fine once set locally on the
server; you may also be able to get/set them from an NFS client.
You may know this, but I know some are confused by this and think
you lose ACL protection.


Quite right.

Getting them set poses a problem for my specific case, unfortunately.

--
Eric Enright
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bug id 6343667

2007-01-26 Thread eric kustarz


On Jan 26, 2007, at 6:02 AM, Robert Milkowski wrote:


Hello zfs-discuss,

  Is anyone working on that bug? Any progress?


For bug:
6343667 scrub/resilver has to start over when a snapshot is taken

I believe that is on Matt and Mark's radar, and they have made some  
progress.


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Torrey McMahon

Dana H. Myers wrote:

Torrey McMahon wrote:
  

Dana H. Myers wrote:


Ed Gould wrote:
 
  

On Jan 26, 2007, at 12:13, Richard Elling wrote:
   


On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
 
  

A number that I've been quoting, albeit without a good reference,
comes from Jim Gray, who has been around the data-management industry
for longer than I have (and I've been in this business since 1970);
he's currently at Microsoft.  Jim says that the controller/drive
subsystem writes data to the wrong sector of the drive without notice
about once per drive per year.  In a 400-drive array, that's once a
day.  ZFS will detect this error when the file is read (one of the
blocks' checksum will not match).  But it can only correct the error
if it manages the redundancy.


 
  

Actually, Jim was referring to everything but the trunk.  He didn't
specify where from the HBA to the drive the error actually occurs.  I
don't think it really matters.  I saw him give a talk a few years ago at
the Usenix FAST conference; that's where I got this information.



So this leaves me wondering how often the controller/drive subsystem
reads data from the wrong sector of the drive without notice; is it
symmetrical with respect to writing, and thus about once a drive/year,
or are there factors which change this?
  
  

It's not symmetrical. Often times its a fw bug. Others a spurious event
causes one block to be read/written instead of an other one. (Alpha
particles anyone?)



I would tend to expect these spurious events to impact read and write
equally; more specifically, the chance of any one read or write being
mis-addressed is about the same.  Since, AFAIK, there are many more reads
from a disk typically than writes, this would seem to suggest that there
would be more mis-addressed reads in a drive/year than mis-addressed
writes.  Is this the reason for the asymmetry?

(I'm sure waving my hands here)


For the spurious events, yes, I would expect things to be impacted 
symmetrically when it comes to errors during reads and errors during 
writes -- that is, if you could figure out what spurious event 
occurred. In most cases the spurious errors are caught only at read time 
and you're left wondering. Was it an incorrect read? Was the data 
written incorrectly? You end up throwing your hands up and saying, Let's 
hope that doesn't happen again. It's much easier to unearth a firmware bug 
in a particular disk drive operating in certain conditions and fix it.


Now that we're checksumming things I'd expect to find more errors, and 
hopefully be in a condition to fix them, than we have in the past. We 
will also start getting customer complaints like, We moved to ZFS and 
now we are seeing media errors more often -- why is ZFS broken? This is 
similar to the StorADE issues we had in NWS - Ahhh, the good old days - 
when we started doing a much better job discovering issues and reporting 
them when in the past we were blissfully silent. We used to have some 
data on that with nice graphs but I can't find them lying about.





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Jason J. W. Williams

Hi Jeff,

We're running a FLX210 which I believe is an Engenio 2884. In our case
it also is attached to a T2000. ZFS has run VERY stably for us with
data integrity issues at all.

We did have a significant latency problem caused by ZFS flushing the
write cache on the array after every write, but that can be fixed by
configuring your array to ignore cache flushes. The instructions for
Engenio products are here: http://blogs.digitar.com/jjww/?itemid=44

We use the config for a production database, so I can't speak to the
NFS issues. All I would mention is to watch the RAM consumption by
ZFS.

Does anyone on the list have a recommendation for ARC sizing with NFS?
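
(For what it's worth, on builds that have the tunable, the ARC can be capped
from /etc/system -- the 4 GB value below is just an example, and whether the
knob exists on a given release is worth checking first:

    set zfs:zfs_arc_max = 0x100000000

The usual workaround on older builds was an mdb poke of the ARC's c_max
instead.)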

Best Regards,
Jason


On 1/26/07, Jeffery Malloch [EMAIL PROTECTED] wrote:

Hi Folks,

I am currently in the midst of setting up a completely new file server using a pretty 
well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for 
LSI Logic so Engenio is a no brainer).  I have configured a couple of zpools from Volume 
groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I then created sub zfs systems below 
that and set quotas and sharenfs'd them so that it appears that these file 
systems are dynamically shrinkable and growable.  It looks very good...  I can see 
the correct file system sizes on all types of machines (Linux 32/64bit and of course 
Solaris boxes) and if I resize the quota it's picked up in NFS right away.  But I would 
be the first in our organization to use this in an enterprise system so I definitely have 
some concerns that I'm hoping someone here can address.
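
(For concreteness, the layering described above amounts to something like the
following; the pool, dataset and LUN names are hypothetical:

    # zpool create bigpool c4t0d0                 # one LUN presented by the Engenio box
    # zfs create bigpool/projects
    # zfs set quota=500g bigpool/projects         # the size NFS clients see
    # zfs set sharenfs=rw bigpool/projects
    # zfs set quota=750g bigpool/projects         # grow later; clients pick it up right away

Shrinking the quota works the same way, which is where the apparent
shrink/grow behaviour comes from.)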

1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 with 
hot spares and write cache (8GB) has battery backup so I'm not too concerned 
from a hardware side.  I'm looking for an idea of how stable ZFS itself is in 
terms of corruptability, uptime and OS stability.

2.  Recommended config.  Above, I have a fairly simple setup.  In many of the 
examples the granularity is home directory level and when you have many many 
users that could get to be a bit of a nightmare administratively.  I am really 
only looking for high level dynamic size adjustability and am not interested in 
its built in RAID features.  But given that, any real world recommendations?

3.  Caveats?  Anything I'm missing that isn't in the docs that could turn into 
a BIG gotchya?

4.  Since all data access is via NFS we are concerned that 32 bit systems 
(Mainly Linux and Windows via Samba) will not be able to access all the data 
areas of a 2TB+ zpool even if the zfs quota on a particular share is less than 
that.  Can anyone comment?

The bottom line is that with anything new there is cause for concern.  
Especially if it hasn't been tested within our organization.  But the 
convenience/functionality factors are way too hard to ignore.

Thanks,

Jeff


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Jason J. W. Williams

Correction: ZFS has run VERY stably for us with  data integrity
issues at all. should read ZFS has run VERY stably for us with  NO
data integrity issues at all.


On 1/26/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:

Hi Jeff,

We're running a FLX210 which I believe is an Engenio 2884. In our case
it also is attached to a T2000. ZFS has run VERY stably for us with
data integrity issues at all.

We did have a significant latency problem caused by ZFS flushing the
write cache on the array after every write, but that can be fixed by
configuring your array to ignore cache flushes. The instructions for
Engenio products are here: http://blogs.digitar.com/jjww/?itemid=44

We use the config for a production database, so I can't speak to the
NFS issues. All I would mention is to watch the RAM consumption by
ZFS.

Does anyone on the list have a recommendation for ARC sizing with NFS?

Best Regards,
Jason


On 1/26/07, Jeffery Malloch [EMAIL PROTECTED] wrote:
 Hi Folks,

 I am currently in the midst of setting up a completely new file server using a pretty 
well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for 
LSI Logic so Engenio is a no brainer).  I have configured a couple of zpools from Volume 
groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I then created sub zfs systems below that 
and set quotas and sharenfs'd them so that it appears that these file systems 
are dynamically shrinkable and growable.  It looks very good...  I can see the correct file 
system sizes on all types of machines (Linux 32/64bit and of course Solaris boxes) and if I 
resize the quota it's picked up in NFS right away.  But I would be the first in our 
organization to use this in an enterprise system so I definitely have some concerns that I'm 
hoping someone here can address.

 1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 
with hot spares and write cache (8GB) has battery backup so I'm not too concerned 
from a hardware side.  I'm looking for an idea of how stable ZFS itself is in 
terms of corruptability, uptime and OS stability.

 2.  Recommended config.  Above, I have a fairly simple setup.  In many of the 
examples the granularity is home directory level and when you have many many users 
that could get to be a bit of a nightmare administratively.  I am really only 
looking for high level dynamic size adjustability and am not interested in its 
built in RAID features.  But given that, any real world recommendations?

 3.  Caveats?  Anything I'm missing that isn't in the docs that could turn 
into a BIG gotchya?

 4.  Since all data access is via NFS we are concerned that 32 bit systems 
(Mainly Linux and Windows via Samba) will not be able to access all the data areas 
of a 2TB+ zpool even if the zfs quota on a particular share is less than that.  
Can anyone comment?

 The bottom line is that with anything new there is cause for concern.  
Especially if it hasn't been tested within our organization.  But the 
convenience/functionality factors are way too hard to ignore.

 Thanks,

 Jeff


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Al Hopper
On Fri, 26 Jan 2007, Rainer Heilke wrote:

  So, if I was an enterprise, I'd be willing to keep
   enough empty LUNs
  available to facilitate at least the migration of
   one or more filesystems
  if not complete pools.

 reformatted ...
 You might be, but don't be surprised when the Financials folks laugh you
 out of their office. Large corporations do not make money by leaving
 wads of cash lying around, and that's exactly what a few terabytes of
 unused storage in a high-end SAN is. This is in addition to the laughter

But this is exactly where ZFS disrupts large corporations' thinking.
You're talking about (for example) 2 terabytes on a high-end SAN which
costs (what?) per GB (including the capital cost of the hi-end SAN)
versus a dual Opteron box with 12 * 500GB SATA disk drives that gives you
5TB of storage for (in round numbers) a total of ~ $6k.  And how much are
your ongoing monthlies on that hi-end SAN box?  (Don't answer)  So - aside
from the occasional use of the box for data migration, this ZFS storage
box has 1,001 other uses.  Pick any two (uses), based on your knowledge
of big corporation thinking, and it's an easy sell to management.
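
(As a sketch, a 12 x 500GB SATA box like that can be carved up in one command;
the controller/target numbers are made up:

    # zpool create bigtank \
        raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
        raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

Two 6-disk raidz sets line up with the ~5TB usable figure above, with one disk
of parity per set; raidz2 trades a terabyte of that for double parity.)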

Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box and why you're not using ZFS
everywhere.  :)

Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!

 generated by the comment that, not a big deal if the Financials and HR
 databases are offline for three days while we do the migration. Good

Again - sounds like more legacy thinking.  With multiple gigabit
ethernet connections you can move terabytes of information in an hour,
instead of in 24 hours using legacy tape systems etc.  This can be
easily handled during scheduled downtime.

 luck writing up a business case that justifies this sort of fiscal
 generosity.

 Sorry, this argument smacks a little too much of being out of touch with
 the fiscal (and time) restrictions of working in a typical corporation,
 as opposed to a well-funded research group.

 I hope I'm not sounding rude, but those of us working in medium to large
 corporations simply do not have the money for such luxuries. Period.

On the contrary - if you're not thinking ZFS, you're wasting a ton of IT
$s and hurting the competitiveness of your business.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Jason J. W. Williams

To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs nor can you
eliminate vdevs.  In other words, I can migrate data around without
downtime, I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies.  Everytime we descend
into this rathole, we stir up more confusion :-(


We did just this to move off RAID-5 LUNs that were the vdevs for a
pool, to RAID-10 LUNs. Worked very well, and as Richard said was done
all on-line. Doesn't really address the shrinking issue though. :-)

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] multihosted ZFS

2007-01-26 Thread Jason J. W. Williams

You could use SAN zoning of the affected LUNs to keep multiple hosts
from seeing the zpool.  When failover time comes, you change the zoning
to make the LUNs visible to the new host, then import.  When the old
host reboots, it won't find any zpool.  Better safe than sorry.


Or change the LUN masking on the array. Depending on your switch that
can be less disruptive, and depending on your storage array might be
able to be scripted.
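
(The failover step itself is the usual export/import dance once the zoning or
masking has been flipped -- the pool name is hypothetical:

    old-host# zpool export tank       # only possible if the old host is still up
    new-host# zpool import -f tank    # -f is needed when the old host died without exporting

The zoning/masking change is what keeps the rebooted old host from grabbing
the pool back.)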

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-26 Thread Jason J. W. Williams

Could the replication engine eventually be integrated more tightly
with ZFS? That would be slick alternative to send/recv.

Best Regards,
Jason

On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

Project Overview:

I propose the creation of a project on opensolaris.org, to bring to the community 
two Solaris host-based data services; namely volume snapshot and volume 
replication. These two data services exist today as the Sun StorageTek Availability 
Suite, a Solaris 8, 9 & 10, unbundled product set, consisting of Instant Image 
(II) and Network Data Replicator (SNDR).

Project Description:

Although Availability Suite is typically known as just two data services (II & 
SNDR), there is an underlying Solaris I/O filter driver framework which supports 
these two data services. This framework provides the means to stack one or more 
block-based, pseudo device drivers on to any pre-provisioned cb_ops structure, [ 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs
 ], thereby shunting all cb_ops I/O into the top of a developed filter driver, (for 
driver specific processing), then out the bottom of this filter driver, back into 
the original cb_ops entry points.

Availability Suite was developed to interpose itself on the I/O stack of a 
block device, providing a filter driver framework with the means to intercept 
any I/O originating from an upstream file system, database or application layer 
I/O. This framework provided the means for Availability Suite to support 
snapshot and remote replication data services for UFS, QFS, VxFS, and more 
recently the ZFS file system, plus various databases like Oracle, Sybase and 
PostgreSQL, and also application I/Os. By providing a filter driver at this 
point in the Solaris I/O stack, it allows for any number of data services to be 
implemented, without regard to the underlying block storage that they will be 
configured on. Today, as a snapshot and/or replication solution, the framework 
allows both the source and destination block storage device to not only differ 
in physical characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical 
characteristics such as in RAID type, volume managed storage (i.e., SVM, VxVM), 
lofi, zvols, even ram disks.

Community Involvement:

By providing this filter-driver framework, two working filter drivers (II & SNDR), 
and an extensive collection of supporting software and utilities, it is envisioned that 
those individuals and companies that adopt OpenSolaris as a viable storage platform, 
will also utilize and enhance the existing II & SNDR data services, plus have 
offered to them the means in which to develop their own block-based filter driver(s), 
further enhancing the use and adoption on OpenSolaris.

A very timely example that is very applicable to Availability Suite and the OpenSolaris 
community, is the recent announcement of the Project Proposal: lofi [ compression & 
encryption ] - http://www.opensolaris.org/jive/click.jspamessageID=26841. By 
leveraging both the Availability Suite and the lofi OpenSolaris projects, it would be 
highly probable to not only offer compression & encryption to lofi devices (as already 
proposed), but by collectively leveraging these two projects, creating the means to support 
file systems, databases and applications, across all block-based storage devices.

Since Availability Suite has strong technical ties to storage, please look for email 
discussion for this project at: storage-discuss at opensolaris dot org

A complete set of Availability Suite administration guides can be found at: 
http://docs.sun.com/app/docs?p=coll%2FAVS4.0


Project Lead:

Jim Dunham http://www.opensolaris.org/viewProfile.jspa?username=jdunham

Availability Suite - New Solaris Storage Group


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Gary Mills
On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
 
 A number that I've been quoting, albeit without a good reference, comes 
 from Jim Gray, who has been around the data-management industry for 
 longer than I have (and I've been in this business since 1970); he's 
 currently at Microsoft.  Jim says that the controller/drive subsystem 
 writes data to the wrong sector of the drive without notice about once 
 per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
 detect this error when the file is read (one of the blocks' checksum 
 will not match).  But it can only correct the error if it manages the 
 redundancy.

My only qualification to enter this discussion is that I once wrote a
floppy disk format program for minix.  I recollect, however, that each
sector on the disk is accompanied by a block that contains the sector
address and a CRC.  In order to write to the wrong sector, both of
these items would have to be read incorrectly.  Otherwise, the
controller would never find the wrong sector.  Are we just talking
about a CRC failure here?  That would be random, but the frequency
of CRC errors would depend on the signal quality.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Toby Thain


On 26-Jan-07, at 7:29 PM, Selim Daoud wrote:


it would be good to have real data and not only guesses or anecdotes

this story about wrong blocks being written by RAID controllers
sounds like the anti-terrorism propaganda we are living in: exaggerate
the facts to catch everyone's attention.
It's going to take more than that to prove RAID controllers have been doing
a bad job for the last 30 years


It does happen. Hard numbers are available if you look. This sounds a  
bit like the RAID expert I bumped into who just couldn't see the  
paradigm had shifted under him -- the implications of end to end.






Let's make up  real stories with hard fact first
s.



Related links:
https://www.gelato.unsw.edu.au/archives/comp-arch/2006-September/003008.html
http://www.lockss.org/locksswiki/files/3/30/Eurosys2006.pdf [A Fresh Look at the Reliability of Long-term Digital Storage, 2006]
http://www.ecsl.cs.sunysb.edu/tr/rpe19.pdf [Challenges of Long-Term Digital Archiving: A Survey, 2006]
http://www.cs.wisc.edu/~vijayan/vijayan-thesis.pdf [IRON File Systems, 2006]
http://www.tcs.hut.fi/~hhk/phd/phd_Hannu_H_Kari.pdf [Latent Sector Faults and Reliability of Disk Arrays, 1997]


--T



On 1/26/07, Ed Gould [EMAIL PROTECTED] wrote:

On Jan 26, 2007, at 12:52, Dana H. Myers wrote:
 So this leaves me wondering how often the controller/drive  
subsystem

 reads data from the wrong sector of the drive without notice; is it
 symmetrical with respect to writing, and thus about once a drive/ 
year,

 or are there factors which change this?

My guess is that it would be symmetric, but I don't really know.

--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Toby Thain


Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a  
failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys  
on his

lunch break!


Why do people think this will work? I never could figure it out.

There's many a slip 'twixt cup and lip. You need the spare already  
sitting there.


--T

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Darren Dunham
 My only qualification to enter this discussion is that I once wrote a
 floppy disk format program for minix.  I recollect, however, that each
 sector on the disk is accompanied by a block that contains the sector
 address and a CRC.

You'd have to define the layer you're talking about.  I presume
something like this occurs between a dumb disk and an intelligent
controller, or even within the encoding parameters of a disk, but I
don't think it does between say a SCSI/FC controller and a disk.

So if the drive itself put the head in the wrong sector, maybe it could
figure that out.  But perhaps the scsi controller had a bug and sent the
wrong address to the drive.  I don't think there's anything at that
layer that would notice (unless the application/file system is encoding
intent into the data).

Corrections about my assumption with SCSI/FC/ATA appreciated.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-26 Thread Jim Dunham

Jason J. W. Williams wrote:
Could the replication engine eventually be integrated more tightly 
with ZFS?
Not in its present form. The architecture and implementation of 
Availability Suite is driven off block-based replication at the device 
level (/dev/rdsk/...), something that allows the product to replicate 
any Solaris file system, database, etc., without any knowledge of what 
it is actually replicating.


To pursue ZFS replication in the manner of Availability Suite, one needs 
to see what replication looks like from an abstract point of view. So 
simplistically, remote replication is like the letter 'h', where the 
left side of the letter is the complete I/O path on the primary node, 
the horizontal part of the letter is the remote replication network 
link, and the right side of the letter is only the bottom half of the 
complete I/O path on the secondary node.


First, ZFS would have to have its functional I/O path split into two 
halves, a top and a bottom piece.  Then we configure replication, the 
letter 'h', between two given nodes, running both the top and bottom piece 
of ZFS on the source node, and just the bottom half of ZFS on the 
secondary node.


Today, the SNDR component of Availability Suite already works like the letter 
'h', where we split the Solaris I/O stack into a top and bottom 
half. The top half is the software (file system, database or 
application I/O) that directs its I/Os to the bottom half (raw device, 
volume manager or block device).


So all that needs to be done is to design and build a new variant of the 
letter 'h', and find the place to separate ZFS into two pieces.


- Jim Dunham



That would be slick alternative to send/recv.

Best Regards,
Jason

On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

Project Overview:

I propose the creation of a project on opensolaris.org, to bring to 
the community two Solaris host-based data services; namely volume 
snapshot and volume replication. These two data services exist today 
as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10, 
unbundled product set, consisting of Instant Image (II) and Network 
Data Replicator (SNDR).


Project Description:

Although Availability Suite is typically known as just two data 
services (II & SNDR), there is an underlying Solaris I/O filter 
driver framework which supports these two data services. This 
framework provides the means to stack one or more block-based, pseudo 
device drivers on to any pre-provisioned cb_ops structure, [ 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs 
], thereby shunting all cb_ops I/O into the top of a developed filter 
driver, (for driver specific processing), then out the bottom of this 
filter driver, back into the original cb_ops entry points.


Availability Suite was developed to interpose itself on the I/O stack 
of a block device, providing a filter driver framework with the means 
to intercept any I/O originating from an upstream file system, 
database or application layer I/O. This framework provided the means 
for Availability Suite to support snapshot and remote replication 
data services for UFS, QFS, VxFS, and more recently the ZFS file 
system, plus various databases like Oracle, Sybase and PostgreSQL, 
and also application I/Os. By providing a filter driver at this point 
in the Solaris I/O stack, it allows for any number of data services 
to be implemented, without regard to the underlying block storage 
that they will be configured on. Today, as a snapshot and/or 
replication solution, the framework allows both the source and 
destination block storage device to not only differ in physical 
characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical 
characteristics such as in RAID type, volume managed storage (i.e., 
SVM, VxVM), lofi, zvols, even ram disks.


Community Involvement:

By providing this filter-driver framework, two working filter drivers 
(II & SNDR), and an extensive collection of supporting software and 
utilities, it is envisioned that those individuals and companies that 
adopt OpenSolaris as a viable storage platform, will also utilize and 
enhance the existing II & SNDR data services, plus have offered to 
them the means in which to develop their own block-based filter 
driver(s), further enhancing the use and adoption on OpenSolaris.


A very timely example that is very applicable to Availability Suite 
and the OpenSolaris community, is the recent announcement of the 
Project Proposal: lofi [ compression & encryption ] - 
http://www.opensolaris.org/jive/click.jspamessageID=26841. By 
leveraging both the Availability Suite and the lofi OpenSolaris 
projects, it would be highly probable to not only offer compression & 
encryption to lofi devices (as already proposed), but by collectively 
leveraging these two projects, creating the means to support file 
systems, databases and applications, across all block-based storage 
devices.


Since Availability 

Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Al Hopper
On Fri, 26 Jan 2007, Toby Thain wrote:

 
  Oh - and the accounting folks love it when you tell them there's no
  ongoing cost of ownership - because Joe Screwdriver can swap out a
  failed
  Seagate 500Gb SATA drive after he picks up a replacement from Frys
  on his
  lunch break!

 Why do people think this will work? I never could figure it out.

 There's many a slip 'twixt cup and lip. You need the spare already
 sitting there.

Agreed.  I remember years ago, when a Sun service tech came onsite at a
fortune 100 company I was working in at the time and we stopped him,
handed him a disk drive in an anti-static bag and said - don't unpack
your tools - it was a bad disk, we replaced it from our spares, here's the
bad one - please replace it under the service agreement.  He thought
about this for about 5 Seconds and said; I wish all my customers were
like you guys.  Then he was gone!  :)

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Al Hopper
On Fri, 26 Jan 2007, Torrey McMahon wrote:

 Al Hopper wrote:
 
  Now your accounting folks are going to be asking you to justify the
  purchase of that hi-end SAN box and why you're not using ZFS
  everywhere.  :)
 
  Oh - and the accounting folks love it when you tell them there's no
  ongoing cost of ownership - because Joe Screwdriver can swap out a failed
  Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
  lunch break!

 Because ZFS doesn't run everywhere.
 Because most low end JBODs are low end for a reason. They aren't as
 reliable, have crappy monitoring, etc.

Agreed.  There will never be one screwdriver that fits everything.  I was
simply trying to reinforce my point.

 Fix those two things when you get a chance. ;)

Have a good weekend Torrey (and zfs-discuss).

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Torrey McMahon

Richard Elling wrote:


Personally, I've never been in the situation where users ask for less 
storage,
but maybe I'm just the odd guy out? ;-) 


You just realized that JoeSysadmin allocated ten LUNs to the zpool when 
he really should have allocated only one.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Torrey McMahon

Al Hopper wrote:

On Fri, 26 Jan 2007, Torrey McMahon wrote:

  

Al Hopper wrote:


Now your accounting folks are going to be asking you to justify the
purchase of that hi-end SAN box and why you're not using ZFS
everywhere.  :)

Oh - and the accounting folks love it when you tell them there's no
ongoing cost of ownership - because Joe Screwdriver can swap out a failed
Seagate 500Gb SATA drive after he picks up a replacement from Frys on his
lunch break!
  

Because ZFS doesn't run everywhere.
Because most low end JBODs are low end for a reason. They aren't as
reliable, have crappy monitoring, etc.



Agreed.  There will never be one screwdriver that fits everything.  I was
simply trying to reinforce my point.
  


It's a good point. We just need to make sure we don't forget that part. 
People love to pull email threads out of context...or Google them for that 
matter. ;)


  

Fix those two things when you get a chance. ;)



Have a good weekend Torrey (and zfs-discuss).


Same to you Al. (and zfs-discuss).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Torrey McMahon

Toby Thain wrote:


On 26-Jan-07, at 7:29 PM, Selim Daoud wrote:


it would be good to have real data and not only guesses or anecdotes

this story about wrong blocks being written by RAID controllers
sounds like the anti-terrorism propaganda we are living in: exaggerate
the facts to catch everyone's attention.
It's going to take more than that to prove RAID controllers have been doing
a bad job for the last 30 years


It does happen. Hard numbers are available if you look. This sounds a 
bit like the RAID expert I bumped into who just couldn't see the 
paradigm had shifted under him -- the implications of end to end. 


It happens. As long as we look at the numbers in context and don't run 
around going, Hey... have you seen these numbers? What have we been doing 
for the last 35 years!?!?, we're ok.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Darren Dunham
 What do you guys think about implementing 'zfs/zpool rewrite' command?
 It'll read every block older than the date when the command was executed
 and write it again (using standard ZFS COW mechanism, similar to how
 resilvering works, but the data is read from the same disk it is written to).

#1 How do you control I/O overhead?

#2 Snapshot blocks are never rewritten at the moment.  Most of your
   suggestions seem to imply working on the live data, but doing that
   for snapshots as well might be tricky. 

 3. I created file system with huge amount of data, where most of the
 data is read-only. I change my server from intel to sparc64 machine.
 Adaptive endianess only change byte order to native on write and because
 file system is mostly read-only, it'll need to byteswap all the time.
 And here comes 'zfs rewrite'!

It's only the metadata that is modified anyway, not the file data.  I
would hope that this could be done more easily than a full tree rewrite
(and again the issue with snapshots).  Also, the overhead there probably
isn't going to be very high (since the metadata will be cached in most
cases).  

Other than that, I'm guessing something like this will be necessary to
implement disk evacuation/removal.  If you have to rewrite data from one
disk to elsewhere in the pool, then rewriting the entire tree shouldn't
be much harder.
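
(Until something like that exists, the closest approximation is a send/receive
into a dataset that inherits the desired properties -- names are hypothetical:

    # zfs set compression=on tank                        # new children of tank inherit this
    # zfs snapshot tank/data@rewrite
    # zfs send tank/data@rewrite | zfs recv tank/data.new
    # zfs rename tank/data tank/data.old
    # zfs rename tank/data.new tank/data

The received copy is written with the new settings, at the cost of temporarily
holding two copies of the data.)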

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Toby Thain


On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:


Hi.

What do you guys think about implementing 'zfs/zpool rewrite' command?
It'll read every block older than the date when the command was  
executed

and write it again (using standard ZFS COW mechanism, similar to how
resilvering works, but the data is read from the same disk it is  
written to).


I see few situations where it might be useful:

1. My file system is almost full (or not) and I'd like to enable
compression on it. Unfortunately compression will work from now on and
I'd also like to compress already stored data. Here comes 'zfs  
rewrite'!


2. I was bad boy and turned off checksuming. Now I suspect something
corrupts my data and I'd really like to checksum everything. Ok, here
comes 'zfs rewrite'!


In this case you deserve what you get.



3. I created file system with huge amount of data, where most of the
data is read-only. I change my server from intel to sparc64 machine.
Adaptive endianess only change byte order to native on write and  
because

file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').

--T



4. Not sure how ZFS traverse blocks tree, if it is done based on  
files,

it may be used to move data from one file closer together, which
will reduce seek times. Because of the way how ZFS works, the data may
become fragmented and 'zfs rewrite' could be used for defragmentation.

5. Once file system encryption is implemented, this mechanism can be
used to encrypt an existing file system, and it can also be used to
change the encryption key.

What do you think?

--
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Rainer Heilke
Richard Elling wrote:

 Rainer Heilke wrote:

 So, if I was an enterprise, I'd be willing to keep
  enough empty LUNs
 available to facilitate at least the migration of
  one or more filesystems
 if not complete pools.



 You might be, but don't be surprised when the Financials folks laugh you out 
 of their office. Large corporations do not make money by leaving wads of 
 cash lying around, and that's exactly what a few terabytes of unused storage 
 in a high-end SAN is. This is in addition to the laughter generated by the 
 comment that, not a big deal if the Financials and HR databases are offline 
 for three days while we do the migration. Good luck writing up a business 
 case that justifies this sort of fiscal generosity.



 To be fair, you can replace vdevs with same-sized or larger vdevs online.
 The issue is that you cannot replace with smaller vdevs nor can you
 eliminate vdevs.  In other words, I can migrate data around without
 downtime, I just can't shrink or eliminate vdevs without send/recv.
 This is where the philosophical disconnect lies.  Everytime we descend
 into this rathole, we stir up more confusion :-(

 If you consider your pool of storage as a zpool, then the management of
 subparts of the pool is done at the file system level.  This concept is
 different than other combinations of devices and file systems such as
 SVM+UFS.  When answering the ZFS shrink question, you need to make sure
 you're not applying the old concepts to the new model.

 Personally, I've never been in the situation where users ask for less storage,
 but maybe I'm just the odd guy out? ;-)   Others have offered cases where
 a shrink or vdev restructuring could be useful.  But I still see some
 confusion with file system management (including zvols) and device management.
 The shrink feature is primarily at the device management level.
   -- richard


I understand these arguments, and the differences (and that most users will 
never ask for less storage), but there are many instances where storage needs 
to move around, even between systems. In that case, unless a whole zpool of 
storage is going, how do you do it? You need to give back two LUNs in a 6-LUN 
zpool. Oh, wait. You can't shrink a zpool.
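
For reference, what can be done online today is replacing a vdev with a
same-sized or larger device, or migrating a dataset elsewhere with
send/receive; a rough sketch (pool, device and dataset names are
hypothetical):

  # Swap one LUN for another of equal or greater size, online:
  zpool replace tank c1t2d0 c3t4d0

  # Move a dataset to another pool (needs free space elsewhere and a
  # brief cutover for the final copy):
  zfs snapshot tank/db@move
  zfs send tank/db@move | zfs receive newpool/db

Neither operation hands LUNs back from the original pool, which is the
point being made here.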

Many people here are giving examples of where this capability is needed. We 
need to agree that different users' needs vary, and that there are real reasons 
for this.

Rainer
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Rainer Heilke
Al Hopper wrote:

 On Fri, 26 Jan 2007, Rainer Heilke wrote:

 So, if I was an enterprise, I'd be willing to keep
 enough empty LUNs
 available to facilitate at least the migration of
 one or more filesystems
 if not complete pools.

  reformatted ...

 You might be, but don't be surprised when the Financials folks laugh you
 out of their office. Large corporations do not make money by leaving
 wads of cash lying around, and that's exactly what a few terabytes of
 unused storage in a high-end SAN is. This is in addition to the laughter
   


 But this is exactly where ZFS disrupts large corporations' thinking.


Yes and no. A corporation has a SAN for reasons that have been valid for years; 
you won't turn that ship around on a skating rink.

 You're talking about (for example) 2 terabytes on a high-end SAN which
 costs (what ?) per GB (including the capital cost of the hi-end SAN)
 versus a dual Opteron box with 12 * 500Gb SATA disk drives that gives you
 5TB of storage for (in round numbers) a total of ~ $6k.  And how much are
 your ongoing monthlies on that hi-end SAN box?  (Don't answer)  So - aside
 from the occasional use of the box for data migration, this ZFS storage
 box has 1,001 other uses.  Pick any two (uses), based on your knowledge
 of big corporation thinking and its an easy sell to management.

 Now your accounting folks are going to be asking you to justify the
 purchase of that hi-end SAN box and why you're not using ZFS
 everywhere.  :)

No, they're going to be asking me why I want to run a $400K server holding all 
of our inventory and financials data on a cheap piece of storage I picked up at 
Pa's Pizza Parlor and Computer Parts. There are values (real and imagined, 
perhaps) that a SAN offers. And, when the rest of the company is running on the 
SAN, why aren't you?

As a side-note, if your company has a mainframe (yes, they still exist!), when 
will ZFS run on it? We'll need the SAN for a while, yet.

 generated by the comment that, not a big deal if the Financials and HR
 databases are offline for three days while we do the migration. Good

 Again - sounds like more legacy thinking.  With multiple gigabit
 ethernet connections you can move terabytes of information in an hour,
 instead of in 24 hours - using legacy tape systems etc.  This can be
 easily handled during scheduled downtime.

If your company is graced with being able to cost-justify the rip-and-replace 
of the entire 100Mb network, more power to you. Someone has to pay for all of 
this, and good luck fobbing it all off on some client.
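
On the throughput point quoted above, a quick sanity check (rough numbers,
assuming ~100 MB/s of usable throughput per gigabit link):

  # ~100 MB/s x 3600 s = ~360 GB per hour on a single GbE link, so
  # "terabytes in an hour" implies several aggregated links plus source
  # and destination storage fast enough to keep up.
  echo '100 * 3600 / 1000' | bc    # ~360 GB/hour per link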

 Sorry, this argument smacks a little too much of being out of touch with
 the fiscal (and time) restrictions of working in a typical corporation,
 as opposed to a well-funded research group.

 I hope I'm not sounding rude, but those of us working in medium to large
 corporations simply do not have the money for such luxuries. Period.

 On the contrary - if you're not thinking ZFS, you're wasting a ton of IT
 $s and hurting the competitiveness of your business.

But you can't write off the investment of the old gear in six months and move 
on. I wish life worked like that, but it doesn't. At least, not where I work. 
:-(

 Regards,

 Al Hopper

Rainer
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Frank Cusack

On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:

3. I created a file system with a huge amount of data, where most of the
data is read-only. I changed my server from an intel to a sparc64 machine.
Adaptive endianness only changes byte order to native on write, and because
the file system is mostly read-only, it'll need to byteswap all the time.
And here comes 'zfs rewrite'!


Why would this help? (Obviously file data is never 'swapped').


Metadata (incl checksums?) still has to be byte-swapped.  Or would
atime updates also force a metadata update?  Or am I totally mistaken.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Anton B. Rang
 1.  How stable is ZFS?

It's a new file system; there will be bugs.  It appears to be well-tested, 
though.  There are a few known issues; for instance, a write failure can panic 
the system under some circumstances.  UFS has known issues, too.

 2.  Recommended config.  Above, I have a fairly
 simple setup.  In many of the examples the
 granularity is home directory level and when you have
 many many users that could get to be a bit of a
 nightmare administratively.

Do you need user quotas?  If so, you need a file system per user with ZFS.  
That may be an argument against it in some environments, but in my experience 
user quotas tend to be more important in academic settings than in corporations.
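
A rough sketch of the per-user approach (pool and user names are
hypothetical):

  # One file system per user, each with its own quota (and, optionally,
  # a reservation for guaranteed space):
  zfs create tank/home
  zfs create tank/home/alice
  zfs set quota=10G tank/home/alice
  zfs set reservation=1G tank/home/alice

  # The whole tree can then be shared over NFS in one step:
  zfs set sharenfs=on tank/home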

 4.  Since all data access is via NFS we are concerned
 that 32 bit systems (Mainly Linux and Windows via
 Samba) will not be able to access all the data areas
 of a 2TB+ zpool even if the zfs quota on a particular
 share is less then that.  Can anyone comment?

Not a problem.  NFS doesn't really deal with volumes, just files, so the 
offsets are always file-relative and the volume can be as large as desired.

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: high density SAS

2007-01-26 Thread Anton B. Rang
  How badly can you mess up a JBOD?
 
 Two words: vibration, cooling.

Three more: power, signal quality.

I've seen even individual drive cases with bad enough signal quality to cause 
bit errors.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs rewrite?

2007-01-26 Thread Jeff Bonwick
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote:
 On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:
 On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
 3. I created a file system with a huge amount of data, where most of the
 data is read-only. I changed my server from an intel to a sparc64 machine.
 Adaptive endianness only changes byte order to native on write, and because
 the file system is mostly read-only, it'll need to byteswap all the time.
 And here comes 'zfs rewrite'!
 
 Why would this help? (Obviously file data is never 'swapped').
 
 Metadata (incl checksums?) still has to be byte-swapped.  Or would
 atime updates also force a metadata update?  Or am I totally mistaken.

You're all correct.  File data is never byte-swapped.  Most metadata
needs to be byte-swapped, but it's generally only 1-2% of your space.
So the overhead shouldn't be significant, even if you never rewrite.

An atime update will indeed cause a znode rewrite (unless you run
with zfs set atime=off), so znodes will get rewritten by reads.

The only other non-trivial metadata is the indirect blocks.
All files up to 128k are stored in a single block: ZFS has
variable blocksize from 512 bytes to 128k, so a 35k file consumes
exactly 35k (not, say, 40k as it would with a fixed 8k blocksize).
Single-block files have no indirect blocks, and hence no metadata
other than the znode.  So all that remains is the indirect blocks
for files larger than 128k -- which is to say, not very much.
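
A small illustration of both points (pool and dataset names are hypothetical):

  # Avoid znode rewrites on read-mostly data:
  zfs set atime=off tank/data

  # A 35k file should occupy a single block of roughly 35k rather than
  # being rounded up to five 8k blocks; du shows the allocated size:
  dd if=/dev/urandom of=/tank/data/smallfile bs=1024 count=35
  sync
  du -k /tank/data/smallfile    # expect roughly 36 KB, not 40 KB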

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss