Re[2]: [zfs-discuss] Google paper on disk reliability
Hello Jesus,

Wednesday, February 21, 2007, 5:54:35 AM, you wrote:

JC> Joerg Schilling wrote:
>> What they missed to say is that you need to access the whole disk frequently enough in order to give SMART the ability to work.
JC> I thought modern disks could be instructed to do offline scanning, using any idle time available.

It was also mentioned in the paper.

--
Best regards,
Robert                 mailto:[EMAIL PROTECTED]
                       http://milek.blogspot.com
Re: [zfs-discuss] Re: Perforce on ZFS
So Jonathan, you have a concern about the on-disk space efficiency for small files (more or less sub-sector). That is a problem we can throw rust at. I am not sure this is the basis of Claude's concern, though.

On creating small files: last week I ran a small test. With ZFS I could create 4600 files _and_ sync the pool to disk with no more than 500 I/Os. I'm no FS expert, but this looks absolutely amazing to me (OK, I'm rather enthusiastic in general). Logging UFS needs 1 I/O per file (so ~10X more for my test). I don't know where other filesystems are on that metric.

I also pointed out that ZFS is not too CPU-efficient at tiny write(2) syscalls. But this inefficiency disappears around 8K writes. Here is a CPU benchmark (I/O is a non-factor):

    CHUNK   ZFS vs UFS
    1B      4X slower
    1K      2X slower
    8K      25% slower
    32K     equal
    64K     30% faster

Waiting for a more specific problem statement, I can only stick to what I said: I know of no small-file problems with ZFS. If there is one, I'd just like to see the data.

-r
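For anyone who wants to reproduce the chunk-size comparison, here is a rough sketch (the target path and the 64 MB total are arbitrary; run the same loop against a UFS mount point and compare the sys times that ptime reports):

    # CPU cost of write(2) at various chunk sizes, same total bytes per run
    total=67108864
    for bs in 1024 8192 32768 65536; do
        count=`expr $total / $bs`
        echo "chunk size $bs:"
        ptime dd if=/dev/zero of=/tank/fs/chunktest bs=$bs count=$count
        rm /tank/fs/chunktest
    done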
Re: [zfs-discuss] Samba ACLs en ZFS
Thank you for your answers, but I have another question. If I don't use any special ACLs with Samba and ZFS, and each user can only read and write his own home directory, am I affected by the incompatibility?

Thank you again.

Rod

2007/2/19, Eric Enright [EMAIL PROTECTED]:
> On 2/19/07, Rod [EMAIL PROTECTED] wrote:
> > Are Samba ACLs now supported with ZFS? I am looking for information in the release notes of the Samba 3.0.24 version, but I can't see anything about ZFS and ACLs. Does anybody know something?
>
> It's not there yet. I spent some time looking at this a few weeks ago, and last I looked there was a Sun engineer on the SFW team working on ZFS ACL support, who said he'd have something in two or three weeks. That was several weeks ago, and I haven't looked into it beyond a quick glance since.
>
> One thing I did try was loopback-mounting the filesystem via NFS and exporting /that/ with Samba, which seemed to work fine as far as getting/setting ACLs via Explorer. That is clearly not an optimal solution, however, and I decided that I could live with the real permissions being invisible.
>
> --
> Eric Enright

--
Rodrigo Lería
http://www.preparatuviaje.com
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
A more detailed description of the readdir test, with a conclusion at the end.

Roch asked me:
> Is this an NFS V3 or V4 test, or don't care?

I am running NFS V3, but a short test with NFS V4 showed that the problem is there as well.

Then Roch asked:
> I've run rdir on a few of my large directories. However, my large directories are not much larger than ncsize; maybe yours are. Do I understand that you hit the issue only upon the first large rdir after reboot?

After a reboot of the NFS client (see below).

Then Roch added:
> If so, it might be that we get a speedup from the part of the run in which we are initially filling the dnlc cache. That could explain the increase in sys time. But the real-time increase seems too much to be due to this. Anyway, I'm interested in the directory size rdir reports and ncsize/D from mdb -k. Also, a third pass through might yield a lead. -r

ncsize has its default value. People told me not to increase the dnlc size when running ZFS.

    # echo 'ncsize/D' | mdb -k
    ncsize:
    ncsize:         129675

Directory size? There are 160 ZFS filesystems under zpool tank1; each ZFS is 202MB, 31.5GB total, 1224000 files.

    # zpool list
    NAME      SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
    tank1     382G    31.5G   351G    8%    ONLINE   -

More detailed results:

ZFS local runs - normal behavior:
1. 2:33.406
2. 2:25.353
3. 2:27.033

NFS V3/ZFS runs - the first is OK, then the times jump up:
1. 3:14.185
2. 4:47.681
3. 4:52.213
4. 4:49.841
5. 4:53.069
6. 4:45.290

after a reboot of the NFS client:
1. 2:56.760
2. 4:43.397

after a reboot of both client and server:
1. real 3:12.841
2. real 4:50.869

after a reboot of the NFS server only:
1. 5:15.048
2. 4:54.686
3. 4:48.713

This means the problem is on the NFS client: after a reboot of the client the first run is OK, then all the rest are bad. When the server was rebooted, it didn't help and the results stayed bad.

Roch replied:
> I'd hypothesize that when the client doesn't know about a file he just gets the data, and boom. But once he's got a cached copy he needs more time to figure out whether the data is up to date. This seems to have been a tradeoff of metadata operations in favor of faster data ops (!?). Note also that SFS doesn't use the client's NFS code; it runs its own user-space client.

Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation. And the SFS problem we observed (see the first message in this thread) has nothing in common with this one. Unfortunately, the abnormal behavior of NFS/ZFS during an SFS test didn't get much attention, so I don't have any clue. Anyway, I'll update this thread when I have more information on the problem.
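If anyone wants to poke at the client-side name cache while reproducing this, the counters can be sampled before and after a run (the kstat name is the stock Solaris one; treat this as a sketch):

    # snapshot DNLC hit/miss counters before and after an rdir pass
    kstat -n dnlcstats | egrep 'hits|misses'
    # current tunable, as above
    echo 'ncsize/D' | mdb -k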
Re: [zfs-discuss] Samba ACLs en ZFS
On Wed, Feb 21, 2007 at 11:21:27AM +0100, Rodrigo Ler?a wrote:
> If I don't use any special ACL with Samba and ZFS, only each user can write and read from his home directory. I am affected with the incompatibility?

Samba runs as the requesting user during file access. Because of that, any file permissions or ACLs are respected even if Samba doesn't have support for the ACLs. The main thing that Samba support for ZFS ACLs will bring is the ability to view and set the ACLs from a Windows client, in particular through the normal Windows ACL GUI.

Ed Plese
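Until that support lands, you can still view and set the ZFS ACLs locally on the Solaris side; a minimal sketch (the user and file names here are just examples):

    # show the full ACL on a file in the share
    ls -v /export/home/rod/file.txt
    # grant another user read access via a ZFS/NFSv4 ACE
    chmod A+user:ana:read_data/read_attributes:allow /export/home/rod/file.txt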
[zfs-discuss] suggestion: directory promotion to filesystem
Not sure how technically feasible it is, but it's something I thought of while shuffling some files around my home server.

My poor understanding of ZFS internals is that the entire pool is effectively a tree structure, with nodes being either data or metadata. Given that, couldn't ZFS just change a directory node into a filesystem with little effort, allowing me to do everything ZFS does with filesystems on a subset of my filesystem? :)

Say you have some filesystems you created early on, before you had a good idea of usage. For example, I made a large share filesystem and started filling it up with photos, movies and assorted downloads. A few months later I realise it would be so much nicer to be able to snapshot my movies and photos separately for backups, instead of doing the whole share. Not hard to work around - a zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files, etc.

Corollary - zfs merge: take a filesystem and merge it into an existing filesystem.

Just a thought - any comments welcome.
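For reference, the copy-based workaround above looks roughly like this (the dataset and directory names are only examples):

    # create the new filesystem next to the old directory
    zfs create tank/share/photos_new
    # copy the data across (this is the slow part a graft would avoid)
    (cd /tank/share/photos && tar cf - .) | (cd /tank/share/photos_new && tar xf -)
    # remove the old directory and give the new filesystem its name
    rm -rf /tank/share/photos
    zfs rename tank/share/photos_new tank/share/photos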
Re: [zfs-discuss] suggestion: directory promotion to filesystem
Adrian,

Seems like a cool idea to me :-) Not sure if there is anything of this kind being thought about... It would be a good idea to file an RFE.

Regards,
Sanjeev

Adrian Saul wrote:
> Not sure how technically feasible it is, but something I thought of while shuffling some files around my home server. [...] If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.
> Corollary - zfs merge: take a filesystem and merge it into an existing filesystem. Just a thought - any comments welcome.
Re: [zfs-discuss] Samba ACLs en ZFS
Thank you very much for your answer. It is very useful for me.

Thank you.

2007/2/21, Ed Plese [EMAIL PROTECTED]:
> Samba runs as the requesting user during file access. Because of that, any file permissions or ACLs are respected even if Samba doesn't have support for the ACLs. The main thing that Samba support for ZFS ACLs will bring is the ability to view and set the ACLs from a Windows client, in particular through the normal Windows ACL GUI.
>
> Ed Plese

--
Rodrigo Lería
http://www.preparatuviaje.com
[zfs-discuss] Re: Re: How much do we really want zpool remove?
> > The ability to shrink a pool by removing devices is the only reason my enterprise is not yet using ZFS, simply because it prevents us from easily migrating storage.
>
> That logic is totally bogus AFAIC. There are so many advantages to running ZFS that denying yourself that opportunity is very short sighted - especially when there are lots of ways of working around this minor feature deficiency.

I cannot let you say that. Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks and dual-fabric connections to the hosts.

We would have migrated already if we could simply migrate data from one storage array to another (which we do more often than you might think). Currently we use (and pay for) VxVM; here is how we do a migration:

1/ Allocate disks from the new array, visible by the host.
2/ Add the disks to the diskgroup.
3/ Run vxevac to evacuate data from the old disks.
4/ Remove the old disks from the DG.

If you explain how to do that with ZFS, with no downtime and new disks of different capacities, you're my hero ;-)
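For concreteness, the VxVM sequence above is roughly the following (the diskgroup and disk names are only examples):

    # 2/ add the new array's disk to the diskgroup
    vxdg -g appdg adddisk newdisk01=c4t0d0
    # 3/ evacuate all subdisks off the old disk onto the new one
    vxevac -g appdg olddisk01 newdisk01
    # 4/ remove the old disk from the diskgroup
    vxdg -g appdg rmdisk olddisk01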
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On Wed, 21 Feb 2007, Valery Fouques wrote:
> Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.

... And presumably you've read the threads where ZFS has helped find (and repair) corruption in such setups? (But yeah, I agree the ability to shrink a pool is important.)

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
President, Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
> I cannot let you say that. Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.

But you understand that these underlying RAID mechanisms give absolutely no guarantee about data integrity, only that some data was found where some (possibly other) data was written? (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)

Casper
Re: [zfs-discuss] suggestion: directory promotion to filesystem
> Not sure how technically feasible it is, but something I thought of while shuffling some files around my home server. My poor understanding of ZFS internals is that the entire pool is effectively a tree structure, with nodes being either data or metadata. Given that, couldn't ZFS just change a directory node into a filesystem with little effort, allowing me to do everything ZFS does with filesystems on a subset of my filesystem :)
>
> Not hard to work around - zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.

I think there are some details in the tree that keep you from simply splitting them off immediately. Some of the issues were discussed a while back. I don't know if anyone has tried to work on it or talked about alternative solutions.

See also:
http://www.opensolaris.org/jive/thread.jspa?messageID=28262
http://bugs.opensolaris.org/view_bug.do?bug_id=6400399

--
Darren Dunham                         [EMAIL PROTECTED]
Senior Technical Consultant           TAOS    http://www.taos.com/
Got some Dr Pepper?                   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
[zfs-discuss] Replacing a drive using ZFS
We have a system with two drives in it, part UFS, part ZFS. It's a software-mirrored system with slices 0, 1 and 3 set up as small UFS slices, and slice 4 on each drive being the ZFS slice.

One of the drives is failing and we need to replace it. I just want to make sure I have the correct order of things before I do this. This is our pool:

        NAME          STATE     READ WRITE CKSUM
        mainpool      ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0   243
            c0t1d0s4  ONLINE       0     0     0

1) zpool detach mainpool c0t0d0s4
2) power down the system, replace the faulty drive
3) reboot the system, set up the slices to match the current setup
4) zpool add mainpool c0t0d0s4

This will add the new drive back into the mirrored pool and sync the new slice 4 back into the mirror, correct?
Re: [zfs-discuss] Replacing a drive using ZFS
Matt Cohen wrote:
> 1) zpool detach mainpool c0t0d0s4
> 2) powerdown system, replace faulty drive
> 3) reboot system, setup slices to match the current setup
> 4) zpool add mainpool c0t0d0s4
>          ^^^
I think you want to use 'zpool attach' here to create a two-way mirror, right?

Dana
Re: [zfs-discuss] Replacing a drive using ZFS
Matt,

Generally, when a disk needs to be replaced, you replace the disk, use the zpool replace command, and you're done... This is only a little more complicated in your scenario below because the disk is shared between ZFS and UFS.

Most disks are hot-pluggable, so you generally don't need to shut down the system to replace the disk, but only you know if your disks are hot-pluggable. In addition, if the disk that is shared between UFS and ZFS contains important system files, then you might need to bring the system down.

However, you don't need to use zpool detach or zpool add if you are just replacing the disk. The steps would look like this:

1. Shut down the system (if necessary)
2. Replace the faulty disk
3. Set up the slices on the replacement disk as needed
4. Bring the system back up (if necessary)
5. Run this command:

    # zpool replace mainpool c0t0d0s4

Let us know how it goes, particularly me, since I need to know if this works as documented. :-)

Thanks,

Cindy

Matt Cohen wrote:
> We have a system with two drives in it, part UFS, part ZFS. [...]
> 1) zpool detach mainpool c0t0d0s4
> 2) powerdown system, replace faulty drive
> 3) reboot system, setup slices to match the current setup
> 4) zpool add mainpool c0t0d0s4
> This will add the new drive back into the mirrored pool and sync the new slice 4 back into the mirror, correct?
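For step 3, one common way to clone the slice layout from the surviving drive is the following (assuming both drives have the same size and geometry; adjust the device names to your system):

    # copy the VTOC from the good drive to the replacement drive
    prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
    # then let ZFS resilver its slice
    zpool replace mainpool c0t0d0s4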
Re: [zfs-discuss] suggestion: directory promotion to filesystem
Adrian Saul wrote:
> Not hard to work around - zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.

Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:

6400399 want zfs split

"zfs join" was also discussed, but I don't think it's especially feasible or useful.

--matt
Re: [zfs-discuss] Replacing a drive using ZFS
Matt,

Also, since you only have two drives and are using software mirroring for the UFS slices, you'll need to follow the proper procedures for the software mirroring metadata (metadb) replicas. See the pertinent docs for details.

-- richard

[EMAIL PROTECTED] wrote:
> Generally, when a disk needs to be replaced, you replace the disk, use the zpool replace command, and you're done... [...]
> 5. Run this command:
> # zpool replace mainpool c0t0d0s4
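The usual SVM replica dance is roughly this (slice 3 is only an example; check the metadb output for where your replicas actually live):

    # see which state database replicas live on the failing drive
    metadb
    # delete the replicas on the failing drive before pulling it
    metadb -d c0t0d0s3
    # after the new drive is sliced up, recreate them
    metadb -a -c 3 c0t0d0s3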
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 4:43:34 PM +0100 [EMAIL PROTECTED] wrote:
> But you understand that these underlying RAID mechanisms give absolutely no guarantee about data integrity, only that some data was found where some (possibly other) data was written? (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)

um, I thought smarter arrays did that these days. Of course it's not end-to-end, so the parity verification isn't as useful as it should be; gigo.

-frank
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Valery Fouques wrote:
> We would have migrated already if we could simply migrate data from one storage array to another (which we do more often than you might think). Currently we use (and pay for) VxVM, here is how we do a migration:

But you are describing a VxVM feature, not a file system feature.

> 1/ Allocate disks from the new array, visible by the host.
> 2/ Add the disks to the diskgroup.
> 3/ Run vxevac to evacuate data from the old disks.
> 4/ Remove the old disks from the DG.
>
> If you explain how to do that with ZFS, no downtime, and new disks with different capacities, you're my hero ;-)

    zpool replace old-disk new-disk

The caveat is that new-disk must be as big as or bigger than old-disk. This caveat is the core of the shrink problem.

-- richard
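Spelled out, the ZFS equivalent of the vxevac migration is the following (device names are placeholders, and each new LUN must be at least as large as the one it replaces):

    # replace each old LUN with a LUN from the new array; ZFS migrates the data online
    zpool replace tank c2t0d0 c4t0d0
    zpool replace tank c2t1d0 c4t1d0
    # watch each resilver finish before starting the next
    zpool status tank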
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 4:43:34 PM +0100 [EMAIL PROTECTED] wrote:
> > (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)
>
> um, I thought smarter arrays did that these days. Of course it's not end-to-end, so the parity verification isn't as useful as it should be; gigo.

Generating extra I/O to verify parity - is that not something that may be a problem in performance benchmarking?

For mirroring, a similar problem exists, of course. ZFS reads from the right side of the mirror and corrects the wrong side if it finds an error. RAIDs do not.

Casper
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 10:55:43 AM -0800 Richard Elling [EMAIL PROTECTED] wrote:
> Valery Fouques wrote:
> > Currently we use (and pay for) VxVM, here is how we do a migration:
>
> But you are describing a VxVM feature, not a file system feature.

But in the context of zfs, this is appropriate.

-frank
[zfs-discuss] Re: Re: Perforce on ZFS
Perforce is based upon Berkeley DB (some early version), so standard "database XXX on ZFS" techniques are relevant - for example, putting the journal file on a different disk than the table files. There are several threads about optimizing databases under ZFS.

If you need a screaming Perforce server, talk to IC Manage, Inc., who is a VAR of Perforce. They have also added the ability to do remote replication, etc., so you can have servers local to the end users in an enterprise environment.

It seems to me that the network is usually the limiting factor in Perforce transactions, though operations like 'fstat' and 'have' shouldn't be overused because they are very taxing on the tables. Later Perforce versions have reduced the amount of table and record locking that goes on, so you might find improvement just by upgrading both servers and clients (the server operations downgrade to match the version of the client).

All this said, I'd love to see experiments done with Perforce on ZFS. It would help us all tune ZFS for these kinds of applications.

Gary
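As a starting point for such experiments, one plausible (untested) layout is separate datasets for the table files and the journal; the pool and path names are only examples, and the 8K recordsize is a guess that should be checked against the server's actual Berkeley DB page size:

    # table files: match recordsize to the database page size
    zfs create tank/p4db
    zfs set recordsize=8k tank/p4db
    zfs set mountpoint=/p4/db tank/p4db
    # journal: mostly sequential appends, ideally on different spindles
    zfs create tank/p4journal
    zfs set mountpoint=/p4/journal tank/p4journal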
Re: [zfs-discuss] suggestion: directory promotion to filesystem
On Wed, Feb 21, 2007 at 10:11:43AM -0800, Matthew Ahrens wrote:
> Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:
>
> 6400399 want zfs split
>
> "zfs join" was also discussed, but I don't think it's especially feasible or useful.

'zfs join' can be hard because of inode number collisions, but it may be useful. Imagine a situation where you have the following file systems:

    /tank
    /tank/foo
    /tank/bar

and you want to move a huge amount of data from /tank/foo to /tank/bar. If you use mv/tar/dump, it will copy the entire data set. Much faster would be 'zfs join tank tank/foo; zfs join tank tank/bar', then just mv the data and 'zfs split' them back :)

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
[zfs-discuss] Another paper
Below is another paper on drive failure analysis; this one won best paper at USENIX:

http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html

What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

With this behavior in mind, I had an idea for a new feature in ZFS: if a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests:

- full disk read
- 8kb r/w iops
- 1mb r/w iops
- raw throughput

Since one disk may be different than others, I thought a comparison between two presumably similar disks would be useful. The command would be something like:

    zpool dft c1t0d0 c1t1d0

Or:

    zpool dft all

I think this would be a great feature, as only zfs can do fitness tests on live running disks behind the scenes. With the ability to compare individual disk performance, not only will you find bad disks, it's entirely possible you'll find misconfigurations (such as bad connections) as well.

And yes, I do know about SMART. SMART can pre-indicate a disk failure. However, I've run SMART on drives with bearings that were gravel and they passed SMART, even though I knew the 10k drive was running at about 3k rpm due to the bearings.

-----
Gregory Shaw, IT Architect
Phone: (303) 272-8817 (x78817)
ITCTO Group, Sun Microsystems Inc.
500 Eldorado Blvd, UBRM02-157          [EMAIL PROTECTED] (work)
Broomfield, CO 80021                   [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
[zfs-discuss] Need performance data
Hi ZFS'ers,

We're putting together an internal ZFS performance document and could use your experiences. If you have ZFS performance data to share, please send it to me. I'm looking for good news or bad, whatever your actual experience is. Specific quantitative data is most useful. ("It seems faster than VxFS on my widgeebot project" is interesting - but not very.)

If you send any data, please be sure to be specific about the version of ZFS (such as S10 6/06 unpatched, Solaris Nevada build 56, S10 11/06 with the following patches, etc.) that you used to get those results. Also please be as specific as possible about the hardware and the application environment. Inside the ZFS team we're working hard on a number of features, and particularly on performance. More information can only help us.

Please reply directly to me, and let me know if you are willing to let me share your information - shrouded or attributed - in a summary to this alias at a later date.

Thanks,

Fred Zlotnick
[EMAIL PROTECTED]
Director, Solaris Data Technology
Re: [zfs-discuss] Another paper
Gregory Shaw wrote:
> Below is another paper on drive failure analysis; this one won best paper at USENIX:
> http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html
> What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

Yes, this is what my data shows, too. You are most likely to see an unrecoverable read which leads to a retry (slow response symptom).

> With this behavior in mind, I had an idea for a new feature in ZFS: if a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests:
> - full disk read
> - 8kb r/w iops
> - 1mb r/w iops
> - raw throughput

Some problems can be seen by doing a simple sequential read and comparing it to historical data. It depends on the failure mode, though.

> Since one disk may be different than others, I thought a comparison between two presumably similar disks would be useful. The command would be something like:
> zpool dft c1t0d0 c1t1d0
> Or:
> zpool dft all
> I think this would be a great feature, as only zfs can do fitness tests on live running disks behind the scenes.

I like the concept, but don't see why ZFS would be required.

> With the ability to compare individual disk performance, not only will you find bad disks, it's entirely possible you'll find misconfigurations (such as bad connections) as well.

A few years ago we looked at unusual changes in response time as a leading indicator, but I don't recall the details as to why we dropped the effort. Perhaps we should take a look again?

> And yes, I do know about SMART. SMART can pre-indicate a disk failure. However, I've run SMART on drives with bearings that were gravel and they passed SMART, even though I knew the 10k drive was running at about 3k rpm due to the bearings.

ditto.
-- richard
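A crude way to collect that historical baseline today, without any new ZFS feature, is a periodic sequential read per device from cron; the device names and the 1 GByte sample size are placeholders:

    # read the first 1 GB of each disk and log how long it takes
    for d in c1t0d0 c1t1d0; do
        echo "=== $d `date`"
        ptime dd if=/dev/rdsk/${d}s2 of=/dev/null bs=1024k count=1024
    done >> /var/tmp/disk-baseline.log 2>&1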
Re: [zfs-discuss] Another paper
On Feb 21, 2007, at 4:59 PM, Richard Elling wrote:
> Some problems can be seen by doing a simple sequential read and comparing it to historical data. It depends on the failure mode, though.

I agree. Having this feature could provide that history.

> I like the concept, but don't see why ZFS would be required.

I'm thinking of production systems. Since you can't evacuate the disk, ZFS can do read/write tests on the unused portion of the disk. I don't think that would be possible via another solution, such as SVM/UFS.

> A few years ago we looked at unusual changes in response time as a leading indicator, but I don't recall the details as to why we dropped the effort. Perhaps we should take a look again?

More information is good in my book. Anything that can tell me that things aren't quite right is more uptime that can be provided.

-----
Gregory Shaw, IT Architect
ITCTO Group, Sun Microsystems Inc.
Re: [zfs-discuss] suggestion: directory promotion to filesystem
On Feb 21, 2007, at 12:11 PM, Matthew Ahrens wrote:
> Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:
>
> 6400399 want zfs split

Note that the current draft specification for NFSv4.1 has the capability to split a filesystem such that the NFSv4.1 client will recognize it. Then the new filesystem can be migrated to another server if needed.

Spencer
Re: [zfs-discuss] Another paper
On Wed, Feb 21, 2007 at 03:35:06PM -0700, Gregory Shaw wrote:
> What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

Seems like there are two pieces you're suggesting here:

1. Some sort of background process to proactively find errors on disks in use by ZFS. This will be accomplished by a background scrubbing option, dependent on the block-rewriting work Matt and Mark are working on. This will allow something like 'zpool set scrub=2weeks', which will tell ZFS to "scrub my data at an interval such that all data is touched over a 2-week period". This will test reading from every block and verifying checksums. Stressing write failures is a little more difficult.

2. Distinguish slow drives from normal drives and proactively mark them faulted. This shouldn't require an explicit 'zpool dft', as we should be watching the response times of the various drives and keeping this as a statistic. We want to incorporate this information to allow better allocation amongst slower and faster drives. Determining that a drive is abnormally slow is much more difficult, though it could theoretically be done if we had some basis - either historical performance for the same drive or comparison to identical drives (manufacturer/model) within the pool. While we've thought about these same issues, there is currently no active effort to keep track of these statistics or do anything with them.

These two things combined should avoid the need for an explicit fitness test.

Hope that helps,

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
[zfs-discuss] Re: ZFS failed Disk Rebuild time on x4500
Nissim Ben Haim wrote:
> I was asked by a customer considering the x4500 - how much time should it take to rebuild a failed disk under RAID-Z? This question keeps popping up because customers perceive software RAID as substantially inferior to HW RAID. I could not find someone who has really measured this under several scenarios.

It is a function of the amount of space used. As space used approaches 0, the resync becomes infinitely fast. As space used approaches 100%, it approaches the speed of the I/O subsystem. In my experience, no hardware RAID array comes close; they all throttle the resync, though some of them allow you to tune it a little bit.

The key advantage over a hardware RAID system is that ZFS knows where the data is and doesn't need to replicate unused space. A hardware RAID array doesn't know anything about the data, so it must reconstruct the entire disk. Also, the reconstruction is done in time order. See Jeff Bonwick's blog:

http://blogs.sun.com/bonwick/entry/smokin_mirrors

I've measured resync on some slow IDE disks (*not* an X4500) at an average of 20 MBytes/s. So if you have a 500 GByte drive, that would resync a 100% full file system in about 7 hours, versus 11 days for some other systems (who shall remain nameless :-)

-- richard
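For the arithmetic: 500 GBytes at 20 MBytes/s is roughly 500 x 1024 / 20 = 25,600 seconds, or a little over 7 hours.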
Re: [zfs-discuss] Another paper
On Feb 21, 2007, at 5:20 PM, Eric Schrock wrote:
> 1. Some sort of background process to proactively find errors on disks in use by ZFS. [...] This will test reading from every block and verifying checksums. Stressing write failures is a little more difficult.

I was thinking of something similar to a scrub. An ongoing process seemed too intrusive. I'd envisioned a cron job similar to a scrub (or defrag) that could be run periodically to show any differences in disk performance over time.

> 2. Distinguish slow drives from normal drives and proactively mark them faulted. [...] While we've thought about these same issues, there is currently no active effort to keep track of these statistics or do anything with them.

I thought this would be very difficult to determine, as a slow disk could be a transient problem. Me, I like tools that give me information I can work with. Fully automated systems always seem to cause more problems than they solve.

For instance, if I have a drive on a PC using a shared IDE bus, is it the disk that is slow, or the connection method? It's obviously the second, but finding that programmatically will be very difficult.

I like the idea of a dft for testing a disk in a subjective manner. One benefit of this could be an objective performance-test baseline for disks and arrays.

Btw, it does help. :-)

-----
Gregory Shaw, IT Architect
ITCTO Group, Sun Microsystems Inc.
Re: [zfs-discuss] Another paper
On 2/22/07, Gregory Shaw [EMAIL PROTECTED] wrote:
> I was thinking of something similar to a scrub. An ongoing process seemed too intrusive. I'd envisioned a cron job similar to a scrub (or defrag) that could be run periodically to show any differences in disk performance over time.
> ...
> I thought this would be very difficult to determine, as a slow disk could be a transient problem. Me, I like tools that give me information I can work with. Fully automated systems always seem to cause more problems than they solve.

If the stats are publishable, then something like Cacti or any monitoring tool should provide most admins with enough tools to spot potential issues.

Nicholas
Re: [zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Leon Koll wrote:
> Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation.

You may want to see if the folks over at [EMAIL PROTECTED] have any ideas on your NFS problem.

--matt
Re: [zfs-discuss] Exporting zvol properties to .zfs
Dale Ghent wrote:
> ... but it got me thinking about how things such as the current compression ratio for a volume could be indicated over an otherwise ZFS-agnostic NFS export. The .zfs snapdir came to mind. Perhaps ZFS could maintain a special file under there, called "compressratio" for example, and a remote client could cat it or whatever to be aware of how volume compression factors into their space usage.

Yeah, it would be cool to be able to access (read-only, at least) the zfs property settings over nfs via .zfs/props or some such. Filed RFE:

6527390 want to read zfs properties over nfs (eg via .zfs/props)

--matt
Re: [zfs-discuss] Another paper
All,

I think dtrace could be a viable option here: cron runs a dtrace script on a regular basis that times a series of reads and then provides that info to Cacti or rrdtool. It's not quite the one-size-fits-all that the OP was looking for, but if you want trends, this should get 'em.

$0.02

Regards,
TJ Easter

On 2/21/07, Nicholas Lee [EMAIL PROTECTED] wrote:
> If the stats are publishable, then something like Cacti or any monitoring tool should provide most admins with enough tools to spot potential issues.

--
"Being a humanist means trying to behave decently without expectation of rewards or punishment after you are dead." -- Kurt Vonnegut
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x31185D8E
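A rough sketch of such a collector, using the standard io provider probes (untested, and the 60-second sample interval and output destination are arbitrary):

    # average I/O service time per device over a 60-second sample, suitable for cron
    dtrace -qn '
    io:::start { start[arg0] = timestamp; }
    io:::done /start[arg0]/ {
        @svc[args[1]->dev_statname] = avg(timestamp - start[arg0]);
        start[arg0] = 0;
    }
    tick-60s { printa(@svc); exit(0); }' >> /var/tmp/disk-latency.log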
Re: [zfs-discuss] Another paper
Correct me if I'm wrong, but FMA seems like a more appropriate tool to track disk errors.

--
Just me,
Wire ...

On 2/22/07, TJ Easter [EMAIL PROTECTED] wrote:
> I think dtrace could be a viable option here: cron runs a dtrace script on a regular basis that times a series of reads and then provides that info to Cacti or rrdtool. It's not quite the one-size-fits-all that the OP was looking for, but if you want trends, this should get 'em.
[zfs-discuss] ZFS vs UFS performance Using Different Raid Configurations
Since most of our customers are predominantly UFS based, we would like to use the same configuration and compare ZFS performance, so that we can announce support for ZFS. We're planning on measuring the performance of a ZFS file system vs a UFS file system. Please look at the following scenario and let us know if this is a good performance measurement criterion.

Also, I read in the ZFS Administration Guide that "You can construct logical devices for ZFS using volumes presented by software-based volume managers, such as Solaris Volume Manager or Veritas Volume Manager (VxVM). These configurations are not recommended. ZFS might work properly on such devices, but less-than-optimal performance might be the result." So with this in mind:

ZFS vs UFS/SVM:
- UFS file systems are created using SVM.
- ZFS file systems are created directly on the disks.

Using the same disks, e.g. c0t0d0 / c0t1d0 / c0t2d0 / c0t3d0:

1) Create a stripe
2) Create a mirror
3) Create RAID-5 (raidz on the ZFS side)

and run a bunch of performance tests. We're using SWAT for measuring I/O.

-D
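For the ZFS side, the three layouts on those four disks would be created roughly like this (the pool name is arbitrary; raidz is the closest ZFS analogue to RAID-5):

    # 1) stripe (dynamic striping across plain vdevs)
    zpool create perfpool c0t0d0 c0t1d0 c0t2d0 c0t3d0
    # 2) two 2-way mirrors
    zpool create perfpool mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
    # 3) single-parity raidz
    zpool create perfpool raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0

Destroy the pool between runs with 'zpool destroy perfpool'.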