Re: [zfs-discuss] Re: Re: simple Raid-Z question

2007-04-08 Thread Ed Gould

Eric Haycraft wrote:
Since no one seems to believe that you can expand a raidz pool, I have attached the following output from Solaris 11/06 showing me doing just that.  The first expansion is with like-sized disks, and the second expansion is with larger disks. I realize that the documentation only has examples using mirrors, but raidz and raidz2 are fully supported for adding disk space.


No one has said that you can't increase the size of a zpool.  What can't 
be increased is the size of a RAID-Z vdev (except by increasing the size 
of all of the components of the RAID-Z).  You have created additional 
RAID-Z vdevs and added them to the pool.
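
For illustration (pool and device names here are hypothetical), growing a 
pool that way looks roughly like this:

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0
# zpool add tank raidz c2t0d0 c2t1d0 c2t2d0

After the add, zpool status should show two top-level raidz vdevs; neither 
vdev itself has gained a disk.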

--
--Ed


Re: [zfs-discuss] Re: update on zfs boot support

2007-03-19 Thread Ed Gould

Richard Elling wrote:

warning: noun/verb overload.  In my context, swap is a verb.


It is also a common shorthand for swap space.
--
--Ed



Re: [zfs-discuss] Re: multihosted ZFS

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
If you _boot_ the original machine then it should see that the pool 
now is owned by
the other host and ignore it (you'd have to do a zpool import -f 
again I think). Not tested though so don't take my word for it...


Conceptually, that's about right, but in practice it's not quite as 
simple as that.  We had to do a lot of work in Cluster to ensure that 
the zpool would never be imported on more than one node at a time.


However if you simply type "go" and let it continue from where it was 
then things definitely will not be pretty... :-)


Yes, but that's only one of the bad scenarios.

--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 9:42, Gary Mills wrote:

How does this work in an environment with storage that's centrally-
managed and shared between many servers?  I'm putting together a new
IMAP server that will eventually use 3TB of space from our Netapp via
an iSCSI SAN.  The Netapp provides all of the disk management and
redundancy that I'll ever need.  The server will only see a virtual
disk (a LUN).  I want to use ZFS on that LUN because it's superior
to UFS in this application, even without the redundancy.  There's
no way to get the Netapp to behave like a JBOD.  Are you saying that
this configuration isn't going to work?


It will work, but if the storage system corrupts the data, ZFS will be 
unable to correct it.  It will detect the error.


A number that I've been quoting, albeit without a good reference, comes 
from Jim Gray, who has been around the data-management industry for 
longer than I have (and I've been in this business since 1970); he's 
currently at Microsoft.  Jim says that the controller/drive subsystem 
writes data to the wrong sector of the drive without notice about once 
per drive per year.  In a 400-drive array, that's once a day.  ZFS will 
detect this error when the file is read (one of the blocks' checksum 
will not match).  But it can only correct the error if it manages the 
redundancy.


I would suggest exporting two LUNs from your central storage and letting 
ZFS mirror them.  You can get a wider range of space/performance 
tradeoffs if you give ZFS a JBOD, but that doesn't sound like an 
option.
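
For illustration only (pool and LUN names are hypothetical), the 
mirrored-LUN setup would look something like:

# zpool create mailpool mirror c2t0d0 c3t0d0

With the mirror, a block whose checksum fails can be repaired from the 
copy on the other LUN; with a single LUN, ZFS can only report the error.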


--Ed



Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 10:57, Ross, Gary (G.A.) wrote:

...
What if something like the old CacheFS was revived, using ZFS as the
base file system instead of UFS?
 ...

Could this be a good thing, or am I way off base???


Disconnected operation is a hard problem.  One of the better research 
efforts in that area was CODA, at CMU.  CODA was, as I recall, an 
extension to AFS, but it's probably reasonable to take some of those 
ideas and marry them with ZFS.  CODA is now open-source; at least the 
BSDs have it.


--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 12:13, Richard Elling wrote:

On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
A number that I've been quoting, albeit without a good reference, 
comes from Jim Gray, who has been around the data-management industry 
for longer than I have (and I've been in this business since 1970); 
he's currently at Microsoft.  Jim says that the controller/drive 
subsystem writes data to the wrong sector of the drive without notice 
about once per drive per year.  In a 400-drive array, that's once a 
day.  ZFS will detect this error when the file is read (one of the 
blocks' checksum will not match).  But it can only correct the error 
if it manages the redundancy.


The quote from Jim seems to be related to the leaves of the tree 
(disks).

Anecdotally, now that we have ZFS at the trunk, we're seeing that the 
branches are also corrupting data.  We've speculated that it would occur, 
but now we can measure it, and it is non-zero.  See Anantha's post for 
one such anecdote.


Actually, Jim was referring to everything but the trunk.  He didn't 
specify where from the HBA to the drive the error actually occurs.  I 
don't think it really matters.  I saw him give a talk a few years ago 
at the Usenix FAST conference; that's where I got this information.


--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 12:52, Dana H. Myers wrote:

So this leaves me wondering how often the controller/drive subsystem
reads data from the wrong sector of the drive without notice; is it
symmetrical with respect to writing, and thus about once a drive/year,
or are there factors which change this?


My guess is that it would be symmetric, but I don't really know.

--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 13:16, Dana H. Myers wrote:

I would tend to expect these spurious events to impact read and write
equally; more specifically, the chance of any one read or write being
mis-addressed is about the same.  Since, AFAIK, there are many more reads 
from a disk typically than writes, this would seem to suggest that there 
would be more mis-addressed reads in a drive/year than mis-addressed 
writes.  Is this the reason for the asymmetry?


Jim's once per drive per year number was not very precise.  I took it 
to be just one significant digit.  I don't recall if he distinguished 
reads from writes.


--Ed



Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Ed Gould

On Jan 26, 2007, at 13:29, Selim Daoud wrote:

it would be good to have real data and not only guesses or anecdotes


Yes, I agree.  I'm sorry I don't have the data that Jim presented at 
FAST, but he did present actual data.  Richard Elling (I believe it was 
Richard) has also posted some related data from ZFS experience to this 
list.


There is more than just anecdotal evidence for this.

--Ed



Re: [zfs-discuss] External drive enclosures + Sun Server for mass storage

2007-01-20 Thread Ed Gould

Shannon Roddy wrote:

For Sun to charge 4-8 times street price for hard drives that
they order just the same as I do from the same manufacturers that I
order from is infuriating.


Are you sure they're really the same drives?  Mechanically, they 
probably are, but last I knew (I don't work in the Storage part of Sun, 
so I have no particular knowledge about current practices), Sun and 
other systems vendors (I know both Apple and DEC did) had custom 
firmware in the drives they resell.  One reason for this is that the 
systems vendors qualified the drives with a particular firmware load, 
and did not buy just the latest firmware that the drive manufacturer 
wanted to ship, for quality-control reasons.  At least some of the time, 
there were custom functionality changes as well.

--
--Ed


Re: [zfs-discuss] Distributed FS

2007-01-08 Thread Ed Gould

Ivan wrote:

Hi,

Is ZFS comparable to PVFS2?  Could it also be used as a distributed filesystem 
at the moment, or are there any plans for this in the future?


I don't know anything at all about PVFS2, so I can't comment on that point.

As far as ZFS being used as a distributed file system, it cannot be used 
as such today, but it is something we would like to develop.  Do you 
have a specific use case in mind for a distributed file system?

--
--Ed



Re: [zfs-discuss] Re: !

2006-12-22 Thread Ed Gould

On  Dec 22, 2006, at 09:50, Anton B. Rang wrote:

Phantom writes and/or misdirected reads/writes:

I haven't seen probabilities published on this; obviously the disk  
vendors would claim zero, but we believe they're slightly  
wrong.  ;-)  That said, 1 in 10^8 bits would mean we’d have an  
error in every 12 megabytes written!  That’s clearly far too low.   
1 in 10^8 blocks would be an error in every 46 gigabytes written;  
that is also clearly far too low. (At 1 GB/second that would be a  
phantom write every minute.)


Jim Gray (a well-known and respected database expert, currently at  
Microsoft) claims that the drive/controller combination will write  
data to the wrong place on the drive at a rate of about one incident/ 
drive/year.  In a 400-drive array (JBOD or RAID, doesn't matter),  
that would be about once a day.  This is a kind of error that (so  
far, at least) can only be detected (and potentially corrected, given  
redundancy) by ZFS.


--Ed





Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43

2006-12-09 Thread Ed Gould

On Dec 9, 2006, at 8:59, Jim Mauro wrote:
Anyway... I'm feeling rather naive here, but I've seen the "NFS 
enforced synchronous semantics" phrase kicked around many times as 
the explanation for suboptimal performance for metadata-intensive 
operations when ZFS is the underlying file system, but I never really 
understood what's unsynchronous about doing the same thing to a local ZFS.


If I remember correctly, the difference is that NFS requires that the  
operation be committed to stable storage before the return to the  
client.  This is definitely a heavier operation than the local case,  
where the return to the caller may happen as soon as the operation is  
cached.  If there's a crash, the local system does not guarantee to  
the caller that the operation is on disk, but NFS does.


Both guarantee consistency but NFS makes stronger guarantees of  
completeness.


--Ed




Re: [zfs-discuss] Mirrored Raidz

2006-10-20 Thread Ed Gould

On Oct 20, 2006, at 0:48, Torrey McMahon wrote:

Anthony Miller wrote:
I want to create a raidz on one array and have it mirrored to 
the other array.


Do you think this will get you more availability compared to a simple 
mirror? I'm curious as to why you would want to do this.


This configuration will survive the failure of one drive in either 
RAIDZ *plus* the failure of any number of drives (or the whole mirror) 
in the other.  That may or may not be valuable enough to choose, but it 
will survive more failures than just a mirror.


--Ed



Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic

2006-09-08 Thread Ed Gould

On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
I was looking for a new AM2 socket motherboard a few weeks ago.  All of 
the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID.  All 
were less than $150.  In other words, the days of having a JBOD-only 
solution are over except for single disk systems.  4x750 GBytes is a 
*lot* of data (and video).


It's not clear to me that JBOD is dead.  The (S)ATA RAID cards I've 
seen are really software RAID solutions that know just enough in the 
controller to let the BIOS boot off a RAID volume.  None of the 
expensive RAID stuff is in the controller.


--Ed



Re: [zfs-discuss] Can a zfs storage pool be imported readonly?

2006-08-30 Thread Ed Gould

oab wrote:

I'm new to ZFS so I was wondering if it is possible to concurrently
share a ZFS storage pool between two separate machines. I am currently
evaluating Sybase IQ running on ZFS rather than raw devices (initial
performance tests look very promising) and need now to evaluate whether
the IQ query server functionality will work.

Not today.  We are very early on in the process of defining a project to 
make ZFS a shared file system for Sun Cluster.  When this project is 
complete (no idea when that might be, yet, sorry), it will be possible 
to share a zpool among nodes of a cluster.


We have support for fail-over ZFS (HA-ZFS) in the next release of Sun 
Cluster, due out later this year.  That doesn't provide concurrent 
access, but it does provide high availability.



This simply means that the Query server will have readonly access to
the data (no lock manager required as far as I can tell), so that is why
I would need to import the storage pool readonly or at least without
exporting it.


It's conceivable that it might work if *all* the nodes imported the pool 
read-only, but that's not tested and certainly not supported.

--
--Ed



Re: [zfs-discuss] Proposal: zfs create -o

2006-08-15 Thread Ed Gould

Brian Hechinger wrote:

Could you mix and match by keeping the current style assuming there
are no -o options present?

# zfs create pool/fs

If you need to specify options, then they should all be options:

# zfs create -o name=pool/fs -o mountpoint=/bar -o etc


I would be tempted to have two forms of the command.  One (as exists 
today) takes no options.  The other takes only options, but doesn't need 
the -o markers.  For example:


zfs create pool/fs
or
zfs create mountpoint=/bar name=pool/fs compression=on

In the second form, options may appear in any order.

--
--Ed



[zfs-discuss] Re: PSARC 2006/288 zpool history

2006-05-03 Thread Ed Gould

On May 3, 2006, at 15:21, eric kustarz wrote:
There's basically two writes that need to happen: one for time and one 
for the subcommand string.  The kernel just needs to make sure if a 
write completes, the data is parseable (has a delimiter).  Its then up 
to the userland parser (zpool history) to figure out if there are 
incomplete records, but the kernel guarantees it is parseable.  The 
userland command only prints out complete records, nothing partial.  
So the userland command needs to handle if say for one record that 
the time entry was written but the subcommand was not.


I'm not clear on how the parser knows enough to do that.  I believe I 
saw that a record looked like


\0 <arbitrary number of NUL-terminated strings>

If this is correct, how can the parser know if a string (or part of 
one) got dropped?


I think this might be a case where a structured record (like the 
compact XML suggestion made earlier) would help.  At least having 
distinguished start and end markers (whether they be one byte each, 
or XML constructs) for a record looks necessary to me.
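
For illustration only (this is a hypothetical layout, not the actual PSARC 
design), a record with explicit markers might look like:

<SOR> 1146693660 <SEP> zpool create tank raidz c0d0 c1d0 c2d0 <EOR>

where <SOR> and <EOR> are distinguished start/end bytes and <SEP> separates 
the time from the subcommand string.  A parser could then discard any 
trailing fragment that lacks a closing <EOR> and print only complete records.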


--Ed
