Re: [zfs-discuss] NFS performance on ZFS vs UFS
Tomas Ögren wrote: On 24 January, 2008 - Steve Hillman sent me these 1,9K bytes: I realize that this topic has been fairly well beaten to death on this forum, but I've also read numerous comments from ZFS developers that they'd like to hear about significantly different performance numbers of ZFS vs UFS for NFS-exported filesystems, so here's one more. The server is an x4500 with 44 drives configured in a RAID10 zpool, and two drives mirrored and formatted with UFS for the boot device. It's running Solaris 10u4, patched with the Recommended Patch Set from late Dec/07. The client (if it matters) is an older V20z w/ Solaris 10 3/05. No tuning has been done on either box.

The test involved copying lots of small files (2-10k) from an NFS client to a mounted NFS volume. A simple 'cp' was done, both with 1 thread and 4 parallel threads (to different directories), and then I monitored how fast the files were accumulating on the server. ZFS: 1 thread - 25 files/second; 4 threads - 25 files/second (~6 per thread). UFS: (same server, just exported /var from the boot volume) 1 thread - 200 files/second; 4 threads - 520 files/second (~130/thread).

To get similar (lower) consistency guarantees, try disabling the ZIL.. google://zil_disable .. This should up the speed, but might cause disk corruption if the server crashes while a client is writing data.. (just like with UFS)

Disabling the ZIL does NOT cause disk corruption. It doesn't even cause ZFS to be inconsistent on disk. What it does mean is that you no longer have guaranteed synchronous write semantics - i.e. after a crash, an application might have issued a synchronous write that never made it to stable storage. BTW, there isn't really any such thing as disk corruption - there is data corruption :-) -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
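For readers tempted to try Tomas's experiment: on Solaris 10 of that era the ZIL was disabled globally via an /etc/system tunable. A sketch of the relevant fragment (note this affects every pool on the host, requires a reboot, and later ZFS releases superseded it with a per-dataset `sync` property):

```
* /etc/system fragment (Solaris 10-era tunable; disables the ZIL for all pools)
set zfs:zil_disable = 1
```

As Darren notes, this does not make the on-disk state inconsistent; it only drops the guarantee that completed synchronous writes survive a crash.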
Re: [zfs-discuss] Synchronous scrub?
Hello Eric, Wednesday, January 23, 2008, 7:21:42 PM, you wrote: ES Sorry, no such feature exists. We do generate sysevents for when ES resilvers are completed, but not scrubs. Adding those sysevents would ES be an easy change, but doing anything more complicated (such as baking ES that functionality into zpool(1M)) would be annoying. How about zpool --wait, so it doesn't exit until the requested scrub is completed? Shouldn't be that hard to implement (it would wait in user space, so it could be killed). Best regards, Robert Milkowski mailto:[EMAIL PROTECTED] http://milek.blogspot.com
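Robert's proposed `zpool --wait` can be approximated today in user space with a polling loop. A minimal sketch (the function name, polling interval, and passing the status command in as a parameter are illustrative assumptions, not an existing zpool feature; in real use the status command would be `zpool status <pool>`):

```shell
# wait_scrub: poll a status-producing command until its output no longer
# reports "scrub in progress". Killing the script only stops the waiting,
# not the scrub itself, matching the "killable, waits in user space" idea.
wait_scrub() {
  status_cmd="$1"        # e.g. "zpool status tank"
  interval="${2:-30}"    # seconds between polls
  while $status_cmd | grep -q 'scrub in progress'; do
    sleep "$interval"
  done
}
```

Typical use would be `wait_scrub "zpool status tank" 60 && echo "scrub done"`.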
Re: [zfs-discuss] NFS performance on ZFS vs UFS
Hello Darren, DJM BTW there isn't really any such thing as disk corruption there is DJM data corruption :-) Well, if you scratch it hard enough :) -- Best regards, Robert Milkowski
[zfs-discuss] Order of operations w/ checksum errors
zpool status shows a few checksum errors against 1 device in a 3-disk raidz1 array, and no read or write errors against that device. The pool is marked as degraded. Is there a difference if you clear the errors for the pool before you scrub, versus scrubbing and then clearing the errors? I'm not sure whether clearing the errors prior to a scrub could propagate any bad blocks that were previously identified as checksum errors and had since been cleared.
[zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
Hi, I have this setup: 2x SUN V440 servers with FC adapters, installed with Solaris 10u4. Both servers see one LUN on XP storage. On that LUN a ZFS filesystem is created (on server1). If I export that ZFS filesystem on server1, I can import it on server2, and vice versa. If I have the ZFS pool imported on server1 and try to import it on server2, it will fail (which is correct behavior). However, if I export the filesystem on server1, import it on server2 and reboot server1 - after reboot, server1 will import the same ZFS filesystem that is at that point mounted on server2, and I get corruption since both systems have the same ZFS FS mounted at the same time! Is there any way to avoid such behavior - as this issue only arises at server reboot?
Re: [zfs-discuss] Order of operations w/ checksum errors
Hello Kam, Friday, January 25, 2008, 9:11:24 AM, you wrote: K zpool status shows a few checksum errors against 1 device in a K raidz1 3 disk array and no read or write errors against that K device. The pool marked as degraded. Is there a difference if you K clear the errors for the pool before you scrub versus scrubbing then K clearing the errors? I'm not sure if the clearing errors prior to a K scrub will replicate out any bad blocks that were identified as K checksum errors previously that had since been cleared. It doesn't matter - a scrub won't replicate any errors. If these errors were correctable, they were corrected by ZFS at the same time it discovered them. Could you post your zpool status output? -- Best regards, Robert Milkowski
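For reference, the per-device CKSUM count Robert asks about is the last column of the config section in `zpool status` output (NAME STATE READ WRITE CKSUM). A small helper can pull it out of that text (a sketch: the function name and the device name in the usage line are assumptions):

```shell
# cksum_errors: print the CKSUM column for a given device from
# "zpool status"-style text read on stdin.
cksum_errors() {
  awk -v dev="$1" '$1 == dev { print $5 }'
}
```

Typical use would be `zpool status tank | cksum_errors c1t2d0`.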
Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
Hi, the pool wasn't exported. server1 was rebooted (with the ZFS pool on it). During reboot the pool was released, and I could import it on server2 (which I have done). However, when server1 was booting up it imported the pool and mounted the ZFS filesystems even though they were already imported and mounted on server2. As I said, what is interesting: if both servers are up, I cannot import the pool on one server if it is imported on the other. However, when a server boots up it somehow skips the check for whether the same pool is already imported on the other server, which in the end leads to the same pool being imported on both servers, and corruption.
Re: [zfs-discuss] sharenfs with over 10000 file systems
New, yes. Aware - probably not. That users would create many filesystems, given cheap filesystems, was an easy guess, but I somehow don't think anybody envisioned that users would be creating tens of thousands of filesystems. ZFS - too good for its own good :-p IMO (and given mails/posts I've seen, typically by people using or wanting to use ZFS at large universities and the like, for home directories) this is frequently driven by the need for per-user quotas. Since ZFS doesn't have per-uid quotas, they end up creating (at least) one filesystem per user. That means a share per user, and locally a mount per user, which will never scale as well as (locally) a single share of /export/home and a single mount (although there would of course be automounts to /home on demand, but they wouldn't slow down bootup). sharemgr and the like may be attempts to improve the situation, but they mitigate rather than eliminate the consequences of exploding what used to be a single large filesystem into a bunch of relatively small ones, simply based on the need to have per-user quotas with ZFS. And there are still situations where a per-uid quota would be useful, such as /var/mail (although I could see that corrupting mailboxes in some cases) or other sorts of shared directories. OTOH, the implementation could certainly vary a little. The equivalent of the quotas file should be automatically created when quotas are enabled, and invisible; and unless quotas are not only disabled but purged somehow, it should maintain per-uid usage statistics even for uids with no quotas, to eliminate the need for quotacheck (initialization of quotas might well be restricted to filesystem creation time, to eliminate the need for a cumbersome pass through existing data, at least at first; but that would probably be wanted too, since people don't always plan ahead). 
But other quota-related functionality could, IMO, remain, although the implementations might have to get smarter, and there ought to be some alternative to the method presently used with UFS of simply reading the quotas file to iterate through the available stats.
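The one-filesystem-per-user workaround described above is typically scripted. A minimal sketch (the pool/path, the quota size, and passing the zfs command in as a parameter for dry-running are all illustrative assumptions):

```shell
# make_homes: create one quota-bearing ZFS filesystem per user - the usual
# substitute for per-uid quotas. First argument is the zfs command to run
# (pass "echo zfs" for a dry run); remaining arguments are usernames.
make_homes() {
  zfs_cmd="$1"; shift
  for u in "$@"; do
    $zfs_cmd create -o quota=5G "tank/home/$u"
  done
}
```

With thousands of users this loop is exactly what produces the share-per-user and mount-per-user scaling problem the post describes.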
Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
Hello Niksa, Friday, January 25, 2008, 9:27:17 AM, you wrote: NF Hi, NF I have this setup: NF 2x SUN V440 servers with FC adapters, installed with Solaris 10u4. NF Both servers see one LUN on XP storage. NF On that LUN a ZFS filesystem is created (on server1). NF If I export that ZFS filesystem on server1, I can import it on server2, and vice versa. NF If I have imported ZFS on server1 and try to import it on NF server2, it will fail (which is correct behavior). NF However, if I export the filesystem on server1, import it on server2 NF and reboot server1 - after reboot, server1 will import the same ZFS NF filesystem that is at that point mounted on server2, and I get NF corruption since both systems have the same ZFS FS mounted at the same time! NF Is there any way to avoid such behavior - as this issue only arises at server reboot? If the pool was exported, it shouldn't have been imported again. Are you sure you actually exported it and did not just unmount it? Another possibly useful feature for you is 'zpool import -R ..' -- Best regards, Robert Milkowski
Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
Niksa Franceschi wrote: Hi, the pool wasn't exported. server1 was rebooted (with the ZFS pool on it). During reboot the pool was released, and I could import it on server2 (which I have done). However, when server1 was booting up it imported the pool and mounted the ZFS filesystems even though they were already imported and mounted on server2. As I said, what is interesting: if both servers are up, I cannot import the pool on one server if it is imported on the other. However, when a server boots up it somehow skips the check for whether the same pool is already imported on the other server, which in the end leads to the same pool being imported on both servers, and corruption. You need the fix for 6282725 "hostname/hostid should be stored in the label". It is available in the latest Nevada bits, but not yet in a Solaris 10 update. For more information please see the following link: http://blogs.sun.com/erickustarz/entry/poor_man_s_cluster_end Hth, Victor
Re: [zfs-discuss] missing files on copy
Christopher Gorski wrote: unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a file that is always missing in the new tree. Oops, I meant: unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG is always missing in the new tree.
Re: [zfs-discuss] missing files on copy
Robert Milkowski wrote: Hello Christopher, Friday, January 25, 2008, 5:37:58 AM, you wrote: CG michael schuster wrote: I assume you've assured that there's enough space in /pond ... can you try (cd /pond/photos; tar cf - *) | (cd /pond/copytestsame; tar xf -) CG I tried it, and it worked. The new tree is an exact copy of the old one. Could you run your cp as 'truss -t open -o /tmp/cp.truss cp * ' and then see if you can see all files being opened for reads, and check if they were successfully opened for writes?

I ran:
# truss -t open -o /tmp/cp.truss cp -pr * /pond/copytestsame/
Same result as with cp. The same files are missing in the new tree. unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a file that is always missing in the new tree.

# ls /pond/photos/unsorted/drive-452a/\[E\]/drive/archives/seconddisk_20nov2002/eujpg/103*
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG
/pond/photos/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG

# ls /pond/copytestsame/unsorted/drive-452a/\[E\]/drive/archives/seconddisk_20nov2002/eujpg/103*
/pond/copytestsame/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG
/pond/copytestsame/unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG

# grep eujpg /tmp/cp.truss | grep 103 | grep seconddisk
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG, O_RDONLY) = 0
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG, O_RDONLY) = 0
open64(/pond/copytestsame//unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0399_IMG.JPG, O_RDONLY) = 6
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG, O_RDONLY) = 0 
open64(unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG, O_RDONLY) = 0
open64(/pond/copytestsame//unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0400_IMG.JPG, O_RDONLY) = 6

The missing file does not seem to be in the truss output. -Chris
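A mechanical way to find every file a copy dropped, independent of cp or truss, is to diff sorted file lists from the two trees (a sketch; the function name is an assumption, and only standard find(1), sort(1) and comm(1) are used):

```shell
# diff_trees: print relative paths that exist under SRC but not under DST.
diff_trees() {
  src="$1"; dst="$2"
  ( cd "$src" && find . -type f | sort ) > /tmp/src_list.$$
  ( cd "$dst" && find . -type f | sort ) > /tmp/dst_list.$$
  comm -23 /tmp/src_list.$$ /tmp/dst_list.$$   # lines only in the first list
  rm -f /tmp/src_list.$$ /tmp/dst_list.$$
}
```

Here, `diff_trees /pond/photos /pond/copytestsame` would have listed the missing JPG directly.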
Re: [zfs-discuss] NFS performance on ZFS vs UFS
Torrey McMahon [EMAIL PROTECTED] wrote: http://www.philohome.com/hammerhead/broken-disk.jpg :-) Be careful, things like this can result in device corruption! Jörg -- EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin [EMAIL PROTECTED](uni) [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] missing files on copy
On Fri, 25 Jan 2008 15:18:36 -0500 Tiernan, Daniel [EMAIL PROTECTED] wrote: You may have hit a cp and/or shell bug due to the directory naming topology. Rather than depend on cp -r, I prefer the cpio method: find * -print | cpio -pdumv dest_path I'd try the find by itself to see if it yields the correct file list before piping into cpio... I will look into this and Jörg's suggestion when I return to the machine on Monday. -Chris
Re: [zfs-discuss] missing files on copy
You may have hit a cp and/or shell bug due to the directory naming topology. Rather than depend on cp -r, I prefer the cpio method: find * -print | cpio -pdumv dest_path I'd try the find by itself to see if it yields the correct file list before piping into cpio... -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher Gorski Sent: Friday, January 25, 2008 12:52 PM To: Robert Milkowski Cc: zfs-discuss@opensolaris.org; michael schuster Subject: Re: [zfs-discuss] missing files on copy Christopher Gorski wrote: unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a file that is always missing in the new tree. Oops, I meant: unsorted/drive-452a/[E]/drive/archives/seconddisk_20nov2002/eujpg/103-0398_IMG.JPG is always missing in the new tree.
Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
On Jan 25, 2008, at 6:06 AM, Niksa Franceschi wrote: Yes, the link explains the issue we have quite well. The only difference is that server1 can be manually rebooted, and while it's still down I can mount the ZFS pool on server2 even without the -f option, and yet server1 when booted up still mounts it at the same time. Just one question though: is there any ETA for when this patch may be available as an official Solaris 10 patch? The current ETA is an early build of s10u6. We hope to have patches available before the full update 6. If you have a support contract, feel free to escalate. eric
Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?
On Fri, Jan 25, 2008 at 12:59:18AM -0500, Kyle McDonald wrote: ... With the 256MB doing write caching, is there any further benefit to moving the ZIL to a flash or other fast NV storage? Do some tests with/without the ZIL enabled. You should see a big difference. You should see performance equivalent to ZIL-disabled when the ZIL is in RAM. I'd do a ZIL on battery-backed RAM in a heartbeat if I could find a card. I think others would as well. -- albert chin ([EMAIL PROTECTED])
Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?
Albert Chin wrote: On Fri, Jan 25, 2008 at 12:59:18AM -0500, Kyle McDonald wrote: ... With the 256MB doing write caching, is there any further benefit to moving the ZIL to a flash or other fast NV storage? Do some tests with/without the ZIL enabled. You should see a big difference. You should see performance equivalent to ZIL-disabled when the ZIL is in RAM. I'd do a ZIL on battery-backed RAM in a heartbeat if I could find a card. I think others would as well. I agree, when your disks are slow to place the changes in 'safe' storage. My question is: with the ZIL on the main disks of the pool, *and* those same disks write-cached by the battery-backed RAM on the RAID controller, aren't the ZIL writes going to be (nearly?) just as fast as they would be to a dedicated NVRAM or FLASH device? Granted, the 256MB on the RAID controller may not be enough, and it's a shame to have to share it among all the writes to the disks, not just the ZIL writes, but it should still be a huge improvement. My question is just how close it comes to a dedicated ZIL device: 90%? 50%? For that matter, considering *all* the writes that ZFS will do (in my case) will be to battery-backed cache devices, is there still a risk to disabling the ZIL altogether? -Kyle
Re: [zfs-discuss] 2 servers, 1 ZFS filesystem and corruptions
Yes, the link explains the issue we have quite well. The only difference is that server1 can be manually rebooted, and while it's still down I can mount the ZFS pool on server2 even without the -f option, and yet server1 when booted up still mounts it at the same time. Just one question though: is there any ETA for when this patch may be available as an official Solaris 10 patch?
Re: [zfs-discuss] NFS performance on ZFS vs UFS
Robert Milkowski wrote: Hello Darren, DJM BTW there isn't really any such thing as disk corruption there is DJM data corruption :-) Well, if you scratch it hard enough :) http://www.philohome.com/hammerhead/broken-disk.jpg :-)
Re: [zfs-discuss] missing files on copy
Christopher Gorski [EMAIL PROTECTED] wrote: can you try (cd /pond/photos; tar cf - *) | (cd /pond/copytestsame; tar xf -) CG I tried it, and it worked. The new tree is an exact copy of the old one. could you run your cp as 'truss -t open -o /tmp/cp.truss cp * ' and then see if you can see all files being opened for reads and check if they were successfully opened for writes? I ran: # truss -t open -o /tmp/cp.truss cp -pr * /pond/copytestsame/ Same result as with cp. The same files are missing in the new tree. unsorted/photosbackup/laptopd600/[D]/cag2b/eujpg/103-0398_IMG.JPG is a file that is always missing in the new tree. ... The missing file does not seem to be in the truss output. Do not expect to see anything useful when tracing open(2). Instead, check getdents(2), i.e. what gets called from readdir(3). I recently got a star bug report from a FreeBSD guy that turned out to be the result of a missing .. entry in a ZFS snapshot root dir. Check the source of the failing program also... I recently spent a lot of time fixing nasty bugs in the SCCS source, and it turned out that there were places where the author believed that . and .. are always returned by readdir(3) and that they are always returned first. Jörg