Re[2]: [zfs-discuss] Google paper on disk reliability
Hello Jesus,

Wednesday, February 21, 2007, 5:54:35 AM, you wrote:

JC> Joerg Schilling wrote:
>> What they missed to say is that you need to access the whole disk frequently enough in order to give SMART the ability to work.
JC> I thought modern disks could be instructed to do offline scanning, using any idle time available.

It was also mentioned in the paper.

--
Best regards,
Robert                 mailto:[EMAIL PROTECTED]
                       http://milek.blogspot.com
Re: [zfs-discuss] Re: Perforce on ZFS
So Jonathan, you have a concern about the on-disk space efficiency for small files (more or less sub-sector). That is a problem we can throw rust at. I am not sure this is the basis of Claude's concern, though.

On creating small files: last week I ran a small test. With ZFS I could create 4600 files _and_ sync the pool to disk with no more than 500 I/Os. I'm no FS expert, but this looks absolutely amazing to me (OK, I'm rather enthusiastic in general). Logging UFS needs 1 I/O per file (so ~10X more for my test). I don't know where other filesystems are on that metric.

I also pointed out that ZFS is not too CPU-efficient at tiny write(2) syscalls. But this inefficiency disappears around 8K writes. Here is a CPU benchmark (I/O is a non-factor):

    CHUNK   ZFS vs UFS
    1B      4X slower
    1K      2X slower
    8K      25% slower
    32K     equal
    64K     30% faster

Waiting for a more specific problem statement, I can only stick to what I said: I know of no small-file problems with ZFS. If there is one, I'd just like to see the data.

-r
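For anyone who wants to reproduce the chunk-size comparison, here is a rough sketch (the target path and the 64 MB total are arbitrary; run the same loop against a UFS mount point and compare the sys times that ptime reports):

    # CPU cost of write(2) at various chunk sizes, same total bytes per run
    total=67108864
    for bs in 1024 8192 32768 65536; do
        count=`expr $total / $bs`
        echo "chunk size $bs:"
        ptime dd if=/dev/zero of=/tank/fs/chunktest bs=$bs count=$count
        rm /tank/fs/chunktest
    done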
Re: [zfs-discuss] Samba ACLs en ZFS
Thank you for your answers, but I have another question. If I don't use any special ACLs with Samba and ZFS, and each user can only read and write his own home directory, am I affected by the incompatibility?

Thank you again.

Rod

2007/2/19, Eric Enright [EMAIL PROTECTED]:
> On 2/19/07, Rod [EMAIL PROTECTED] wrote:
> > Are Samba ACLs now supported with ZFS? I am looking for information in the release notes of the Samba 3.0.24 version, but I can't see anything about ZFS and ACLs. Does anybody know something?
>
> It's not there yet. I spent some time looking at this a few weeks ago, and last I looked there was a Sun engineer on the SFW team working on ZFS ACL support, who said he'd have something in two or three weeks. That was several weeks ago, and I haven't looked into it beyond a quick glance since.
>
> One thing I did try was loopback-mounting the filesystem via NFS and exporting /that/ with Samba, which seemed to work fine as far as getting/setting ACLs via Explorer. That is clearly not an optimal solution, however, and I decided that I could live with the real permissions being invisible.
>
> --
> Eric Enright

--
Rodrigo Lería
http://www.preparatuviaje.com
[zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
A more detailed description of the readdir test, with a conclusion at the end.

Roch asked me:
> Is this an NFS V3 or V4 test, or don't care?

I am running NFS V3, but a short test with NFS V4 showed that the problem is there as well.

Then Roch asked:
> I've run rdir on a few of my large directories. However, my large directories are not much larger than ncsize; maybe yours are. Do I understand that you hit the issue only upon the first large rdir after reboot?

After a reboot of the NFS client (see below).

Then Roch added:
> If so, it might be that we get a speedup from the part of the run in which we are initially filling the dnlc cache. That could explain the increase in sys time. But the real-time increase seems too much to be due to this. Anyway, I'm interested in the directory size rdir reports and ncsize/D from mdb -k. Also, a third pass through might yield a lead. -r

ncsize has its default value. People told me not to increase the dnlc size when running ZFS.

    # echo 'ncsize/D' | mdb -k
    ncsize:
    ncsize:         129675

Directory size? There are 160 ZFS filesystems under zpool tank1; each ZFS is 202MB, 31.5GB total, 1224000 files.

    # zpool list
    NAME      SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
    tank1     382G    31.5G   351G    8%    ONLINE   -

More detailed results:

ZFS local runs - normal behavior:
1. 2:33.406
2. 2:25.353
3. 2:27.033

NFS V3/ZFS runs - the first is OK, then the times jump up:
1. 3:14.185
2. 4:47.681
3. 4:52.213
4. 4:49.841
5. 4:53.069
6. 4:45.290

after a reboot of the NFS client:
1. 2:56.760
2. 4:43.397

after a reboot of both client and server:
1. real 3:12.841
2. real 4:50.869

after a reboot of the NFS server only:
1. 5:15.048
2. 4:54.686
3. 4:48.713

This means the problem is on the NFS client: after a reboot of the client the first run is OK, then all the rest are bad. When the server was rebooted, it didn't help and the results stayed bad.

Roch replied:
> I'd hypothesize that when the client doesn't know about a file he just gets the data, and boom. But once he's got a cached copy he needs more time to figure out whether the data is up to date. This seems to have been a tradeoff of metadata operations in favor of faster data ops (!?). Note also that SFS doesn't use the client's NFS code; it runs its own user-space client.

Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation. And the SFS problem we observed (see the first message in this thread) has nothing in common with this one. Unfortunately, the abnormal behavior of NFS/ZFS during an SFS test didn't get much attention, so I don't have any clue. Anyway, I'll update this thread when I have more information on the problem.
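If anyone wants to poke at the client-side name cache while reproducing this, the counters can be sampled before and after a run (the kstat name is the stock Solaris one; treat this as a sketch):

    # snapshot DNLC hit/miss counters before and after an rdir pass
    kstat -n dnlcstats | egrep 'hits|misses'
    # current tunable, as above
    echo 'ncsize/D' | mdb -k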
Re: [zfs-discuss] Samba ACLs en ZFS
On Wed, Feb 21, 2007 at 11:21:27AM +0100, Rodrigo Ler?a wrote:
> If I don't use any special ACL with Samba and ZFS, only each user can write and read from his home directory. I am affected with the incompatibility?

Samba runs as the requesting user during file access. Because of that, any file permissions or ACLs are respected even if Samba doesn't have support for the ACLs. The main thing that Samba support for ZFS ACLs will bring is the ability to view and set the ACLs from a Windows client, in particular through the normal Windows ACL GUI.

Ed Plese
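Until that support lands, you can still view and set the ZFS ACLs locally on the Solaris side; a minimal sketch (the user and file names here are just examples):

    # show the full ACL on a file in the share
    ls -v /export/home/rod/file.txt
    # grant another user read access via a ZFS/NFSv4 ACE
    chmod A+user:ana:read_data/read_attributes:allow /export/home/rod/file.txt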
[zfs-discuss] suggestion: directory promotion to filesystem
Not sure how technically feasible it is, but it's something I thought of while shuffling some files around my home server.

My poor understanding of ZFS internals is that the entire pool is effectively a tree structure, with nodes being either data or metadata. Given that, couldn't ZFS just change a directory node into a filesystem with little effort, allowing me to do everything ZFS does with filesystems on a subset of my filesystem? :)

Say you have some filesystems you created early on, before you had a good idea of usage. For example, I made a large share filesystem and started filling it up with photos, movies and assorted downloads. A few months later I realise it would be so much nicer to be able to snapshot my movies and photos separately for backups, instead of doing the whole share. Not hard to work around - a zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files, etc.

Corollary - zfs merge: take a filesystem and merge it into an existing filesystem.

Just a thought - any comments welcome.
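For reference, the copy-based workaround above looks roughly like this (the dataset and directory names are only examples):

    # create the new filesystem next to the old directory
    zfs create tank/share/photos_new
    # copy the data across (this is the slow part a graft would avoid)
    (cd /tank/share/photos && tar cf - .) | (cd /tank/share/photos_new && tar xf -)
    # remove the old directory and give the new filesystem its name
    rm -rf /tank/share/photos
    zfs rename tank/share/photos_new tank/share/photos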
Re: [zfs-discuss] suggestion: directory promotion to filesystem
Adrian,

Seems like a cool idea to me :-) Not sure if there is anything of this kind being thought about... It would be a good idea to file an RFE.

Regards,
Sanjeev

Adrian Saul wrote:
> Not sure how technically feasible it is, but something I thought of while shuffling some files around my home server. [...] If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.
> Corollary - zfs merge: take a filesystem and merge it into an existing filesystem. Just a thought - any comments welcome.
Re: [zfs-discuss] Samba ACLs en ZFS
Thank you very much for your answer. It is very useful for me.

Thank you.

2007/2/21, Ed Plese [EMAIL PROTECTED]:
> Samba runs as the requesting user during file access. Because of that, any file permissions or ACLs are respected even if Samba doesn't have support for the ACLs. The main thing that Samba support for ZFS ACLs will bring is the ability to view and set the ACLs from a Windows client, in particular through the normal Windows ACL GUI.
>
> Ed Plese

--
Rodrigo Lería
http://www.preparatuviaje.com
[zfs-discuss] Re: Re: How much do we really want zpool remove?
> > The ability to shrink a pool by removing devices is the only reason my enterprise is not yet using ZFS, simply because it prevents us from easily migrating storage.
>
> That logic is totally bogus AFAIC. There are so many advantages to running ZFS that denying yourself that opportunity is very short sighted - especially when there are lots of ways of working around this minor feature deficiency.

I cannot let you say that. Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks and dual-fabric connections to the hosts.

We would have migrated already if we could simply migrate data from one storage array to another (which we do more often than you might think). Currently we use (and pay for) VxVM; here is how we do a migration:

1/ Allocate disks from the new array, visible by the host.
2/ Add the disks to the diskgroup.
3/ Run vxevac to evacuate data from the old disks.
4/ Remove the old disks from the DG.

If you explain how to do that with ZFS, with no downtime and new disks of different capacities, you're my hero ;-)
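For concreteness, the VxVM sequence above is roughly the following (the diskgroup and disk names are only examples):

    # 2/ add the new array's disk to the diskgroup
    vxdg -g appdg adddisk newdisk01=c4t0d0
    # 3/ evacuate all subdisks off the old disk onto the new one
    vxevac -g appdg olddisk01 newdisk01
    # 4/ remove the old disk from the diskgroup
    vxdg -g appdg rmdisk olddisk01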
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On Wed, 21 Feb 2007, Valery Fouques wrote:
> Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.

... And presumably you've read the threads where ZFS has helped find (and repair) corruption in such setups? (But yeah, I agree the ability to shrink a pool is important.)

--
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member
President, Rite Online Inc.
Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
> I cannot let you say that. Here in my company we are very interested in ZFS, but we do not care about the RAID/mirror features, because we already have a SAN with RAID-5 disks, and dual fabric connection to the hosts.

But you understand that these underlying RAID mechanisms give absolutely no guarantee about data integrity, only that some data was found where some (possibly other) data was written? (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)

Casper
Re: [zfs-discuss] suggestion: directory promotion to filesystem
> Not sure how technically feasible it is, but something I thought of while shuffling some files around my home server. My poor understanding of ZFS internals is that the entire pool is effectively a tree structure, with nodes being either data or metadata. Given that, couldn't ZFS just change a directory node into a filesystem with little effort, allowing me to do everything ZFS does with filesystems on a subset of my filesystem :)
>
> Not hard to work around - zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.

I think there are some details in the tree that keep you from simply splitting them off immediately. Some of the issues were discussed a while back. I don't know if anyone has tried to work on it or talked about alternative solutions.

See also:
http://www.opensolaris.org/jive/thread.jspa?messageID=28262
http://bugs.opensolaris.org/view_bug.do?bug_id=6400399

--
Darren Dunham                         [EMAIL PROTECTED]
Senior Technical Consultant           TAOS    http://www.taos.com/
Got some Dr Pepper?                   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
[zfs-discuss] Replacing a drive using ZFS
We have a system with two drives in it, part UFS, part ZFS. It's a software-mirrored system with slices 0, 1 and 3 set up as small UFS slices, and slice 4 on each drive being the ZFS slice.

One of the drives is failing and we need to replace it. I just want to make sure I have the correct order of things before I do this. This is our pool:

        NAME          STATE     READ WRITE CKSUM
        mainpool      ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s4  ONLINE       0     0   243
            c0t1d0s4  ONLINE       0     0     0

1) zpool detach mainpool c0t0d0s4
2) power down the system, replace the faulty drive
3) reboot the system, set up the slices to match the current setup
4) zpool add mainpool c0t0d0s4

This will add the new drive back into the mirrored pool and sync the new slice 4 back into the mirror, correct?
Re: [zfs-discuss] Replacing a drive using ZFS
Matt Cohen wrote:
> 1) zpool detach mainpool c0t0d0s4
> 2) powerdown system, replace faulty drive
> 3) reboot system, setup slices to match the current setup
> 4) zpool add mainpool c0t0d0s4
>          ^^^
I think you want to use 'zpool attach' here to create a two-way mirror, right?

Dana
Re: [zfs-discuss] Replacing a drive using ZFS
Matt,

Generally, when a disk needs to be replaced, you replace the disk, use the zpool replace command, and you're done... This is only a little more complicated in your scenario below because the disk is shared between ZFS and UFS.

Most disks are hot-pluggable, so you generally don't need to shut down the system to replace the disk, but only you know if your disks are hot-pluggable. In addition, if the disk that is shared between UFS and ZFS contains important system files, then you might need to bring the system down.

However, you don't need to use zpool detach or zpool add if you are just replacing the disk. The steps would look like this:

1. Shut down the system (if necessary)
2. Replace the faulty disk
3. Set up the slices on the replacement disk as needed
4. Bring the system back up (if necessary)
5. Run this command:

    # zpool replace mainpool c0t0d0s4

Let us know how it goes, particularly me, since I need to know if this works as documented. :-)

Thanks,

Cindy

Matt Cohen wrote:
> We have a system with two drives in it, part UFS, part ZFS. [...]
> 1) zpool detach mainpool c0t0d0s4
> 2) powerdown system, replace faulty drive
> 3) reboot system, setup slices to match the current setup
> 4) zpool add mainpool c0t0d0s4
> This will add the new drive back into the mirrored pool and sync the new slice 4 back into the mirror, correct?
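For step 3, one common way to clone the slice layout from the surviving drive is the following (assuming both drives have the same size and geometry; adjust the device names to your system):

    # copy the VTOC from the good drive to the replacement drive
    prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2
    # then let ZFS resilver its slice
    zpool replace mainpool c0t0d0s4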
Re: [zfs-discuss] suggestion: directory promotion to filesystem
Adrian Saul wrote:
> Not hard to work around - zfs create and a mv/tar command and it is done... some time later. If there were, say, a zfs graft <directory> <newfs> command, you could just break off the directory as a new filesystem and away you go - no copying, no risking cleaning up the wrong files etc.

Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:

6400399 want zfs split

"zfs join" was also discussed, but I don't think it's especially feasible or useful.

--matt
Re: [zfs-discuss] Replacing a drive using ZFS
Matt,

Also, since you only have two drives and are using software mirroring for the UFS slices, you'll need to follow the proper procedures for the software mirroring metadata (metadb) replicas. See the pertinent docs for details.

-- richard

[EMAIL PROTECTED] wrote:
> Generally, when a disk needs to be replaced, you replace the disk, use the zpool replace command, and you're done... [...]
> 5. Run this command:
> # zpool replace mainpool c0t0d0s4
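The usual SVM replica dance is roughly this (slice 3 is only an example; check the metadb output for where your replicas actually live):

    # see which state database replicas live on the failing drive
    metadb
    # delete the replicas on the failing drive before pulling it
    metadb -d c0t0d0s3
    # after the new drive is sliced up, recreate them
    metadb -a -c 3 c0t0d0s3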
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 4:43:34 PM +0100 [EMAIL PROTECTED] wrote:
> But you understand that these underlying RAID mechanisms give absolutely no guarantee about data integrity, only that some data was found where some (possibly other) data was written? (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)

um, I thought smarter arrays did that these days. Of course it's not end-to-end, so the parity verification isn't as useful as it should be; gigo.

-frank
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Valery Fouques wrote:
> We would have migrated already if we could simply migrate data from one storage array to another (which we do more often than you might think). Currently we use (and pay for) VxVM, here is how we do a migration:

But you are describing a VxVM feature, not a file system feature.

> 1/ Allocate disks from the new array, visible by the host.
> 2/ Add the disks to the diskgroup.
> 3/ Run vxevac to evacuate data from the old disks.
> 4/ Remove the old disks from the DG.
>
> If you explain how to do that with ZFS, no downtime, and new disks with different capacities, you're my hero ;-)

    zpool replace old-disk new-disk

The caveat is that new-disk must be as big as or bigger than old-disk. This caveat is the core of the shrink problem.

-- richard
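Spelled out, the ZFS equivalent of the vxevac migration is the following (device names are placeholders, and each new LUN must be at least as large as the one it replaces):

    # replace each old LUN with a LUN from the new array; ZFS migrates the data online
    zpool replace tank c2t0d0 c4t0d0
    zpool replace tank c2t1d0 c4t1d0
    # watch each resilver finish before starting the next
    zpool status tank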
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 4:43:34 PM +0100 [EMAIL PROTECTED] wrote:
> > (RAID-5 never verifies that the parity is correct on reads; it only uses it to reconstruct data when reads fail.)
>
> um, I thought smarter arrays did that these days. Of course it's not end-to-end, so the parity verification isn't as useful as it should be; gigo.

Generating extra I/O to verify parity - is that not something that may be a problem in performance benchmarking?

For mirroring, a similar problem exists, of course. ZFS reads from the right side of the mirror and corrects the wrong side if it finds an error. RAIDs do not.

Casper
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On February 21, 2007 10:55:43 AM -0800 Richard Elling [EMAIL PROTECTED] wrote:
> Valery Fouques wrote:
> > Currently we use (and pay for) VxVM, here is how we do a migration:
>
> But you are describing a VxVM feature, not a file system feature.

But in the context of zfs, this is appropriate.

-frank
[zfs-discuss] Re: Re: Perforce on ZFS
Perforce is based upon Berkeley DB (some early version), so standard "database XXX on ZFS" techniques are relevant - for example, putting the journal file on a different disk than the table files. There are several threads about optimizing databases under ZFS.

If you need a screaming Perforce server, talk to IC Manage, Inc., who is a VAR of Perforce. They have also added the ability to do remote replication, etc., so you can have servers local to the end users in an enterprise environment.

It seems to me that the network is usually the limiting factor in Perforce transactions, though operations like 'fstat' and 'have' shouldn't be overused because they are very taxing on the tables. Later Perforce versions have reduced the amount of table and record locking that goes on, so you might find improvement just by upgrading both servers and clients (the server operations downgrade to match the version of the client).

All this said, I'd love to see experiments done with Perforce on ZFS. It would help us all tune ZFS for these kinds of applications.

Gary
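As a starting point for such experiments, one plausible (untested) layout is separate datasets for the table files and the journal; the pool and path names are only examples, and the 8K recordsize is a guess that should be checked against the server's actual Berkeley DB page size:

    # table files: match recordsize to the database page size
    zfs create tank/p4db
    zfs set recordsize=8k tank/p4db
    zfs set mountpoint=/p4/db tank/p4db
    # journal: mostly sequential appends, ideally on different spindles
    zfs create tank/p4journal
    zfs set mountpoint=/p4/journal tank/p4journal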
Re: [zfs-discuss] suggestion: directory promotion to filesystem
On Wed, Feb 21, 2007 at 10:11:43AM -0800, Matthew Ahrens wrote:
> Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:
>
> 6400399 want zfs split
>
> "zfs join" was also discussed, but I don't think it's especially feasible or useful.

'zfs join' can be hard because of inode number collisions, but it may be useful. Imagine a situation where you have the following file systems:

    /tank
    /tank/foo
    /tank/bar

and you want to move a huge amount of data from /tank/foo to /tank/bar. If you use mv/tar/dump, it will copy the entire data set. Much faster would be 'zfs join tank tank/foo; zfs join tank tank/bar', then just mv the data and 'zfs split' them back :)

--
Pawel Jakub Dawidek                       http://www.wheel.pl
[EMAIL PROTECTED]                         http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
[zfs-discuss] Another paper
Below is another paper on drive failure analysis; this one won best paper at USENIX:

http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html

What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

With this behavior in mind, I had an idea for a new feature in ZFS: if a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests:

- full disk read
- 8kb r/w iops
- 1mb r/w iops
- raw throughput

Since one disk may be different than others, I thought a comparison between two presumably similar disks would be useful. The command would be something like:

    zpool dft c1t0d0 c1t1d0

Or:

    zpool dft all

I think this would be a great feature, as only zfs can do fitness tests on live running disks behind the scenes. With the ability to compare individual disk performance, not only will you find bad disks, it's entirely possible you'll find misconfigurations (such as bad connections) as well.

And yes, I do know about SMART. SMART can pre-indicate a disk failure. However, I've run SMART on drives with bearings that were gravel and they passed SMART, even though I knew the 10k drive was running at about 3k rpm due to the bearings.

-----
Gregory Shaw, IT Architect
Phone: (303) 272-8817 (x78817)
ITCTO Group, Sun Microsystems Inc.
500 Eldorado Blvd, UBRM02-157          [EMAIL PROTECTED] (work)
Broomfield, CO 80021                   [EMAIL PROTECTED] (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
[zfs-discuss] Need performance data
Hi ZFS'ers,

We're putting together an internal ZFS performance document and could use your experiences. If you have ZFS performance data to share, please send it to me. I'm looking for good news or bad, whatever your actual experience is. Specific quantitative data is most useful. ("It seems faster than VxFS on my widgeebot project" is interesting - but not very.)

If you send any data, please be sure to be specific about the version of ZFS (such as S10 6/06 unpatched, Solaris Nevada build 56, S10 11/06 with the following patches, etc.) that you used to get those results. Also please be as specific as possible about the hardware and the application environment. Inside the ZFS team we're working hard on a number of features, and particularly on performance. More information can only help us.

Please reply directly to me, and let me know if you are willing to let me share your information - shrouded or attributed - in a summary to this alias at a later date.

Thanks,

Fred Zlotnick
[EMAIL PROTECTED]
Director, Solaris Data Technology
Re: [zfs-discuss] Another paper
Gregory Shaw wrote:
> Below is another paper on drive failure analysis; this one won best paper at USENIX:
> http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html
> What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

Yes, this is what my data shows, too. You are most likely to see an unrecoverable read which leads to a retry (slow response symptom).

> With this behavior in mind, I had an idea for a new feature in ZFS: if a disk fitness test were available to verify disk read/write and performance, future drive problems could be avoided. Some example tests:
> - full disk read
> - 8kb r/w iops
> - 1mb r/w iops
> - raw throughput

Some problems can be seen by doing a simple sequential read and comparing it to historical data. It depends on the failure mode, though.

> Since one disk may be different than others, I thought a comparison between two presumably similar disks would be useful. The command would be something like:
> zpool dft c1t0d0 c1t1d0
> Or:
> zpool dft all
> I think this would be a great feature, as only zfs can do fitness tests on live running disks behind the scenes.

I like the concept, but don't see why ZFS would be required.

> With the ability to compare individual disk performance, not only will you find bad disks, it's entirely possible you'll find misconfigurations (such as bad connections) as well.

A few years ago we looked at unusual changes in response time as a leading indicator, but I don't recall the details as to why we dropped the effort. Perhaps we should take a look again?

> And yes, I do know about SMART. SMART can pre-indicate a disk failure. However, I've run SMART on drives with bearings that were gravel and they passed SMART, even though I knew the 10k drive was running at about 3k rpm due to the bearings.

ditto.
-- richard
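A crude way to collect that historical baseline today, without any new ZFS feature, is a periodic sequential read per device from cron; the device names and the 1 GByte sample size are placeholders:

    # read the first 1 GB of each disk and log how long it takes
    for d in c1t0d0 c1t1d0; do
        echo "=== $d `date`"
        ptime dd if=/dev/rdsk/${d}s2 of=/dev/null bs=1024k count=1024
    done >> /var/tmp/disk-baseline.log 2>&1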
Re: [zfs-discuss] Another paper
On Feb 21, 2007, at 4:59 PM, Richard Elling wrote:
> Some problems can be seen by doing a simple sequential read and comparing it to historical data. It depends on the failure mode, though.

I agree. Having this feature could provide that history.

> I like the concept, but don't see why ZFS would be required.

I'm thinking of production systems. Since you can't evacuate the disk, ZFS can do read/write tests on the unused portion of the disk. I don't think that would be possible via another solution, such as SVM/UFS.

> A few years ago we looked at unusual changes in response time as a leading indicator, but I don't recall the details as to why we dropped the effort. Perhaps we should take a look again?

More information is good in my book. Anything that can tell me that things aren't quite right is more uptime that can be provided.

-----
Gregory Shaw, IT Architect
ITCTO Group, Sun Microsystems Inc.
Re: [zfs-discuss] suggestion: directory promotion to filesystem
On Feb 21, 2007, at 12:11 PM, Matthew Ahrens wrote:
> Yep, this idea was previously discussed on this list -- search for "zfs split" and see the following RFE:
>
> 6400399 want zfs split

Note that the current draft specification for NFSv4.1 has the capability to split a filesystem such that the NFSv4.1 client will recognize it. Then the new filesystem can be migrated to another server if needed.

Spencer
Re: [zfs-discuss] Another paper
On Wed, Feb 21, 2007 at 03:35:06PM -0700, Gregory Shaw wrote:
> What I found most interesting was the idea that drives don't fail outright most of the time. They can slow down operations and slowly die.

Seems like there are two pieces you're suggesting here:

1. Some sort of background process to proactively find errors on disks in use by ZFS. This will be accomplished by a background scrubbing option, dependent on the block-rewriting work Matt and Mark are working on. This will allow something like 'zpool set scrub=2weeks', which will tell ZFS to "scrub my data at an interval such that all data is touched over a 2-week period". This will test reading from every block and verifying checksums. Stressing write failures is a little more difficult.

2. Distinguish slow drives from normal drives and proactively mark them faulted. This shouldn't require an explicit 'zpool dft', as we should be watching the response times of the various drives and keeping this as a statistic. We want to incorporate this information to allow better allocation amongst slower and faster drives. Determining that a drive is abnormally slow is much more difficult, though it could theoretically be done if we had some basis - either historical performance for the same drive or comparison to identical drives (manufacturer/model) within the pool. While we've thought about these same issues, there is currently no active effort to keep track of these statistics or do anything with them.

These two things combined should avoid the need for an explicit fitness test.

Hope that helps,

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
[zfs-discuss] Re: ZFS failed Disk Rebuild time on x4500
Nissim Ben Haim wrote:
> I was asked by a customer considering the x4500 - how much time should it take to rebuild a failed disk under RAID-Z? This question keeps popping up because customers perceive software RAID as substantially inferior to HW RAID. I could not find someone who has really measured this under several scenarios.

It is a function of the amount of space used. As space used approaches 0, the resync becomes infinitely fast. As space used approaches 100%, it approaches the speed of the I/O subsystem. In my experience, no hardware RAID array comes close; they all throttle the resync, though some of them allow you to tune it a little bit.

The key advantage over a hardware RAID system is that ZFS knows where the data is and doesn't need to replicate unused space. A hardware RAID array doesn't know anything about the data, so it must reconstruct the entire disk. Also, the reconstruction is done in time order. See Jeff Bonwick's blog:

http://blogs.sun.com/bonwick/entry/smokin_mirrors

I've measured resync on some slow IDE disks (*not* an X4500) at an average of 20 MBytes/s. So if you have a 500 GByte drive, that would resync a 100% full file system in about 7 hours, versus 11 days for some other systems (who shall remain nameless :-)

-- richard
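For the arithmetic: 500 GBytes at 20 MBytes/s is roughly 500 x 1024 / 20 = 25,600 seconds, or a little over 7 hours.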
Re: [zfs-discuss] Another paper
On Feb 21, 2007, at 5:20 PM, Eric Schrock wrote:
> 1. Some sort of background process to proactively find errors on disks in use by ZFS. [...] This will test reading from every block and verifying checksums. Stressing write failures is a little more difficult.

I was thinking of something similar to a scrub. An ongoing process seemed too intrusive. I'd envisioned a cron job similar to a scrub (or defrag) that could be run periodically to show any differences in disk performance over time.

> 2. Distinguish slow drives from normal drives and proactively mark them faulted. [...] While we've thought about these same issues, there is currently no active effort to keep track of these statistics or do anything with them.

I thought this would be very difficult to determine, as a slow disk could be a transient problem. Me, I like tools that give me information I can work with. Fully automated systems always seem to cause more problems than they solve.

For instance, if I have a drive on a PC using a shared IDE bus, is it the disk that is slow, or the connection method? It's obviously the second, but finding that programmatically will be very difficult.

I like the idea of a dft for testing a disk in a subjective manner. One benefit of this could be an objective performance-test baseline for disks and arrays.

Btw, it does help. :-)

-----
Gregory Shaw, IT Architect
ITCTO Group, Sun Microsystems Inc.
Re: [zfs-discuss] Another paper
On 2/22/07, Gregory Shaw [EMAIL PROTECTED] wrote:
> I was thinking of something similar to a scrub. An ongoing process seemed too intrusive. I'd envisioned a cron job similar to a scrub (or defrag) that could be run periodically to show any differences in disk performance over time.
> ...
> I thought this would be very difficult to determine, as a slow disk could be a transient problem. Me, I like tools that give me information I can work with. Fully automated systems always seem to cause more problems than they solve.

If the stats are publishable, then something like Cacti or any monitoring tool should provide most admins with enough tools to spot potential issues.

Nicholas
Re: [zfs-discuss] Re: Re: SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
Leon Koll wrote:
> Given that the described problem is 100% an NFS client problem, there is nothing to do in the ZFS code to improve the situation.

You may want to see if the folks over at [EMAIL PROTECTED] have any ideas on your NFS problem.

--matt
Re: [zfs-discuss] Exporting zvol properties to .zfs
Dale Ghent wrote:
> ... but it got me thinking about how things such as the current compression ratio for a volume could be indicated over an otherwise ZFS-agnostic NFS export. The .zfs snapdir came to mind. Perhaps ZFS could maintain a special file under there, called "compressratio" for example, and a remote client could cat it or whatever to be aware of how volume compression factors into their space usage.

Yeah, it would be cool to be able to access (read-only, at least) the zfs property settings over nfs via .zfs/props or some such. Filed RFE:

6527390 want to read zfs properties over nfs (eg via .zfs/props)

--matt
Re: [zfs-discuss] Another paper
All,

I think dtrace could be a viable option here: cron runs a dtrace script on a regular basis that times a series of reads and then provides that info to Cacti or rrdtool. It's not quite the one-size-fits-all that the OP was looking for, but if you want trends, this should get 'em.

$0.02

Regards,
TJ Easter

On 2/21/07, Nicholas Lee [EMAIL PROTECTED] wrote:
> If the stats are publishable, then something like Cacti or any monitoring tool should provide most admins with enough tools to spot potential issues.

--
"Being a humanist means trying to behave decently without expectation of rewards or punishment after you are dead." -- Kurt Vonnegut
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x31185D8E
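A rough sketch of such a collector, using the standard io provider probes (untested, and the 60-second sample interval and output destination are arbitrary):

    # average I/O service time per device over a 60-second sample, suitable for cron
    dtrace -qn '
    io:::start { start[arg0] = timestamp; }
    io:::done /start[arg0]/ {
        @svc[args[1]->dev_statname] = avg(timestamp - start[arg0]);
        start[arg0] = 0;
    }
    tick-60s { printa(@svc); exit(0); }' >> /var/tmp/disk-latency.log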
Re: [zfs-discuss] Another paper
Correct me if I'm wrong, but FMA seems like a more appropriate tool to track disk errors.

--
Just me,
Wire ...

On 2/22/07, TJ Easter [EMAIL PROTECTED] wrote:
> I think dtrace could be a viable option here: cron runs a dtrace script on a regular basis that times a series of reads and then provides that info to Cacti or rrdtool. It's not quite the one-size-fits-all that the OP was looking for, but if you want trends, this should get 'em.
[zfs-discuss] ZFS vs UFS performance Using Different Raid Configurations
Since most of our customers are predominantly UFS based, we would like to use the same configuration and compare ZFS performance, so that we can announce support for ZFS. We're planning on measuring the performance of a ZFS file system vs a UFS file system. Please look at the following scenario and let us know if this is a good performance measurement criterion.

Also, I read in the ZFS Administration Guide that "You can construct logical devices for ZFS using volumes presented by software-based volume managers, such as Solaris Volume Manager or Veritas Volume Manager (VxVM). These configurations are not recommended. ZFS might work properly on such devices, but less-than-optimal performance might be the result." So with this in mind:

ZFS vs UFS/SVM:
- UFS file systems are created using SVM.
- ZFS file systems are created directly on the disks.

Using the same disks, e.g. c0t0d0 / c0t1d0 / c0t2d0 / c0t3d0:

1) Create a stripe
2) Create a mirror
3) Create RAID-5 (raidz on the ZFS side)

and run a bunch of performance tests. We're using SWAT for measuring I/O.

-D
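For the ZFS side, the three layouts on those four disks would be created roughly like this (the pool name is arbitrary; raidz is the closest ZFS analogue to RAID-5):

    # 1) stripe (dynamic striping across plain vdevs)
    zpool create perfpool c0t0d0 c0t1d0 c0t2d0 c0t3d0
    # 2) two 2-way mirrors
    zpool create perfpool mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
    # 3) single-parity raidz
    zpool create perfpool raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0

Destroy the pool between runs with 'zpool destroy perfpool'.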