Re: [zfs-discuss] ZFS Pool, what happen when disk failure
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski

> On 24/04/2010 13:51, Edward Ned Harvey wrote:
>> But what you might not know: If any pool fails, the system will crash.
>
> This actually depends on the failmode property setting in your pools. The default is panic, but it also might be wait or continue - see the zpool(1M) man page for more details.
>
>> You will need to power cycle. The system won't boot up again; you'll have to
>
> The system should boot up properly even if some pools are not accessible (except rpool, of course). If that is not the case then there is a bug - last time I checked it worked perfectly fine.

This may be different in the latest OpenSolaris, but in the latest Solaris, this is what I know: If a pool fails and forces an ungraceful shutdown, then during the next bootup the pool is treated as currently in use by another system. The OS doesn't come up all the way; you have to power cycle again and go into failsafe mode. Then you can zpool import (I think requiring the -f or -F) and reboot again normally.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
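The failmode behavior Robert describes can be inspected and changed per pool. A minimal sketch (the pool name `tank` is hypothetical):

```shell
# Show the current failmode setting for the pool
zpool get failmode tank

# Make I/O to a faulted pool block until the device returns,
# rather than panicking the whole system
zpool set failmode=wait tank
```

See the zpool(1M) man page for the exact semantics of panic, wait, and continue on your release.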
Re: [zfs-discuss] ZFS RAID-Z2 degraded vs RAID-Z1
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Peter Tripp

> here, I'll swap it in for the sparse file and let it resilver. Can someone with a stronger understanding of ZFS tell me why a degraded RaidZ2 (minus one disk) is less efficient than RaidZ1? (Besides the fact that your pools are always reported as degraded.) I guess the same would apply with RaidZ2 vs RaidZ3 - 1 disk.

If a raidz2 is degraded by one disk, then the remaining volume has equivalent redundancy to a healthy raidz1. This is true. However, the double parity calculation and/or storage could possibly perform slower than a healthy raidz1. I don't know if that's just an unfounded fear, or if perhaps there's some reality behind it.

Good question. I don't know the answer.
Re: [zfs-discuss] Making ZFS better: zfshistory
From: Richard Elling [mailto:richard.ell...@gmail.com]
Sent: Saturday, April 24, 2010 10:43 AM

>> Nope. That discussion seems to be concluded now. And the netapp does not have the problem that was suspected.
>
> I do not recall reaching that conclusion. I think the definition of the problem is what you continue to miss.

The .snapshot directories do precisely what you would want them to do. Which is: The .snapshot directory belonging to a parent contains a copy of the filesystem as it looked at the time of the snapshot. But when you mv or rename a subdirectory, the .snapshot subdir of the subdirectory correctly maps, to preserve the snapshots inside that directory.

> Agree, but the path is lost.

In the original test, done by Jonathan, it seemed that way. But after two other people responded with their own repeats of that test, which concluded with no problems, Jonathan retried his, and found that he just needed to repeat the same ls command twice to get the proper result on the 2nd try. He is running a very old version of Ontap, so most likely he's just seeing the result of some bug. But others don't have that issue.

The path is not lost. The following all work just fine:
a/.snapshot/vol0_deleteme/b/c/d.txt
a/e/.snapshot/vol0_deleteme/c/d.txt
a/e/c/.snapshot/vol0_deleteme/d.txt

> It seems the NetApp snapshot is a directory-level snapshot rather than a file system snapshot. I cannot see how to merge the two, so perhaps adding such a feature to ZFS could not leverage the file system snapshot?

Again, in Jonathan's original post, that's what it seemed like. But after Adam and Tom replied showing they didn't have the same behavior, Jonathan reran his test and got a different result. Which concluded: The .snapshot directory does indeed maintain a filesystem-level snapshot of the whole filesystem, but it also provides an easy-to-access interface within any subdirectory, even after renaming a directory. It works. Although Jonathan apparently saw a problem in an old version of the OS.
Re: [zfs-discuss] Making ZFS better: zfshistory
From: Ragnar Sundblad [mailto:ra...@csc.kth.se]
Sent: Saturday, April 24, 2010 5:18 PM

> To answer the question you linked to:
> .snapshot/snapname.0/a/b/c/d.txt from the top of the filesystem
> a/.snapshot/snapname.0/b/c/d.txt
> a/e/.snapshot/snapname.0/c/d.txt
> a/e/c/.snapshot/snapname.0/d.txt
>
> I really don't understand what you mean, I think the path is there just fine, and IMHO pretty much in the way you would expect.

Precisely. The snapshot exists, with the original name, from the higher-level directory. But even after renaming, you go into the subdirectory with the new name, and you can still find the snapshots there. Very convenient.
Re: [zfs-discuss] Making ZFS better: zfshistory
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Freddie Cash

> From the sounds of it, the .snapshot directory is just a pointer to the corresponding directory in the actual snapshot tree. The snapshots are not actually saved per-directory. They just give you a handy shortcut into that same level in the snapshot directory tree.

Precisely correct.

> Something like this would be very useful in ZFS, especially if you have deep directory trees in a single ZFS filesystem, and you want to compare files/directories in multiple snapshots. For example, our backups server has

Especially what you said. But especially *especially* the following: If you have a zfs filesystem /exports, a nested filesystem /exports/home, and another one /exports/home/jbond ... If you sometimes do your work in /exports/home/jbond/important/files/and/documents ... and you sometimes do your work in /exports/mi6/007/secret/directory ... then you sometimes need to access snapshots under /exports/.zfs and sometimes under /exports/home/jbond/.zfs. It's really nice if you don't have to figure out how far up the tree to go in order to find the correct .zfs directory for what you want to find right now.
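A minimal sketch of the situation Edward describes (the pool name `tank` and snapshot name `monday` are hypothetical):

```shell
# Nested ZFS filesystems each expose their own .zfs/snapshot directory
zfs create tank/exports
zfs create tank/exports/home
zfs create tank/exports/home/jbond
zfs snapshot -r tank/exports@monday

# Snapshots of files that live directly in /exports:
ls /exports/.zfs/snapshot/monday

# But jbond's home is a separate filesystem, so its snapshots must be
# reached from that filesystem's own root, not from /exports/.zfs:
ls /exports/home/jbond/.zfs/snapshot/monday
```

This is exactly the "how far up the tree" problem: the correct .zfs directory is at the root of whichever filesystem the file lives in, not at an arbitrary ancestor.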
Re: [zfs-discuss] Making ZFS better: zfshistory
From: Richard Elling [mailto:richard.ell...@gmail.com]
Sent: Saturday, April 24, 2010 7:42 PM

> Next, mv /a/e /a/E
> ls -l a/e/.snapshot/snaptime
> ENOENT?
> ls -l a/E/.snapshot/snapname/d.txt
> this should be ENOENT because d.txt did not exist in a/E at snaptime.

Incorrect. E did exist. Inode 12345 existed, but it had a different name at the time of the snapshot. Therefore, a/e/.snapshot/snapname/c/d.txt is the file at the time of the snapshot. But these are also the same thing:
a/E/.snapshot/snapname/c/d.txt
a/E/c/.snapshot/snapname/d.txt

It would be very annoying if you could have a directory named foo which contains all the snapshots for its own history, and then mv foo bar and suddenly the snapshots all disappear. This is not the behavior. The behavior is: If you mv foo bar, then the snapshots which were previously accessible under foo are now accessible under bar. However, if you look in the snapshot of foo's parent, then you will see foo and not bar - just the way it would have looked at the time of the snapshot.
Re: [zfs-discuss] ZFS Pool, what happen when disk failure
On 25/04/2010 13:08, Edward Ned Harvey wrote:
>> The system should boot-up properly even if some pools are not accessible (except rpool of course). If it is not the case then there is a bug - last time I checked it worked perfectly fine.
>
> This may be different in the latest opensolaris, but in the latest solaris, this is what I know: If a pool fails, and forces an ungraceful shutdown, then during the next bootup, the pool is treated as currently in use by another system. The OS doesn't come up all the way; you have to power cycle again, and go into failsafe mode. Then you can zpool import I think requiring the -f or -F, and reboot again normal.

I just did a test on Solaris 10/09 - and the system came up properly, entirely on its own, with a failed pool. zpool status showed the pool as unavailable (as I removed an underlying device), which is fine.

--
Robert Milkowski
http://milek.blogspot.com
Re: [zfs-discuss] Disk/Partition replacement - do partition begin/end/offsets matter?
> One of my pools (backup pool) has a disk which I suspect may be going south. I have a replacement disk of the same size. The original pool was using one of the partitions towards the end of the disk. I want to move the partition to the beginning of the disk on the new disk. Does ZFS store/use partition start/end/offsets of partitions etc? Or is it just about:
> 1. Export the pool.
> 2. dd if=old_partition of=new_partition
> 3. Take the old disk out
> 4. Import the pool.

Better to mirror onto the new disk and then remove the old disk. Then you only copy the blocks that are used.

> I think the question in a nutshell is: Do the partition begin/end matter if the partition is the same size?

No.

--chris
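The mirror-then-detach approach can be done with zpool attach/detach; zpool replace wraps the same resilver in one step. A sketch (the pool name `backup` and the device names are hypothetical):

```shell
# Attach the new partition as a mirror of the suspect one;
# ZFS resilvers only the allocated blocks, not the whole partition
zpool attach backup c1t2d0s3 c2t0d0s0

# Once zpool status shows the resilver complete, drop the old device
zpool detach backup c1t2d0s3

# Or, equivalently, as a single step:
zpool replace backup c1t2d0s3 c2t0d0s0
```

Either way, ZFS doesn't care where on the disk the new partition starts, only that it is at least as large as the old one.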
Re: [zfs-discuss] Making ZFS better: zfshistory
On Apr 25, 2010, at 5:45 AM, Edward Ned Harvey wrote:
>> From: Richard Elling [mailto:richard.ell...@gmail.com]
>> Sent: Saturday, April 24, 2010 7:42 PM
>> Next, mv /a/e /a/E
>> ls -l a/e/.snapshot/snaptime
>> ENOENT?
>> ls -l a/E/.snapshot/snapname/d.txt
>> this should be ENOENT because d.txt did not exist in a/E at snaptime.
>
> Incorrect. E did exist. Inode 12345 existed, but it had a different name at the time of snapshot. Therefore, a/e/.snapshot/snapname/c/d.txt is the file at the time of snapshot. But these are also the same thing:
> a/E/.snapshot/snapname/c/d.txt
> a/E/c/.snapshot/snapname/d.txt

OK, I'll believe you. How about this?
mv a/E/c a/c
mv a/E a/c
mv a/c a/E
now a/E/.snapshot/snapname/c/d.txt is ENOENT, correct?

> It would be very annoying if you could have a directory named foo which contains all the snapshots for its own history, and then mv foo bar and suddenly the snapshots all disappear. This is not the behavior. The behavior is: If you mv foo bar then the snapshots which were previously accessible under foo are now accessible under bar. However, if you look in the snapshot of foo's parent, then you will see foo and not bar. Just the way it would have looked, at the time of the snapshot.

The only way I know to describe this is that the path is lost. In other words, you cannot say ../.snapshot/snapname/self is the same as self/.snapshot/snapname, thus the relationship previously described as:

Snapshots are taken. You can reach file.txt via any of the following:
/root/.snapshot/branch/leaf/file.txt
/root/branch/.snapshot/leaf/file.txt
/root/branch/leaf/.snapshot/file.txt

is not guaranteed to be correct.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
Re: [zfs-discuss] Help:Is zfs-fuse's performance is not good
I wonder if this is the right place to ask, as the Filesystem in Userspace (FUSE) implementation is a separate project. In Solaris, ZFS runs in the kernel. FUSE implementations are slow, no doubt. The same goes for other FUSE implementations, such as the one for NTFS.

Regards,
Tonmaus
Re: [zfs-discuss] ZFS Pool, what happen when disk failure
On 04/26/10 12:08 AM, Edward Ned Harvey wrote: [why do you snip attributions?]
>> On 04/26/10 01:45 AM, Robert Milkowski wrote:
>> The system should boot-up properly even if some pools are not accessible (except rpool of course). If it is not the case then there is a bug - last time I checked it worked perfectly fine.
>
> This may be different in the latest opensolaris, but in the latest solaris, this is what I know: If a pool fails, and forces an ungraceful shutdown, then during the next bootup, the pool is treated as currently in use by another system. The OS doesn't come up all the way; you have to power cycle again, and go into failsafe mode. Then you can zpool import I think requiring the -f or -F, and reboot again normal.

I think you are describing what happens if the root pool has problems. Other pools are just shown as unavailable. The system will come up, but failure to mount any filesystems in the absent pool will cause the filesystem/local service to be in the maintenance state.

--
Ian.
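If filesystem/local does drop to maintenance as Ian describes, the usual SMF commands apply. A sketch (no particular pool is assumed):

```shell
# Explain why the service is in the maintenance state
svcs -xv svc:/system/filesystem/local:default

# After fixing or removing the offending pool/mountpoint, clear it
svcadm clear svc:/system/filesystem/local:default
```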
Re: [zfs-discuss] Mac OS X clients with ZFS server
On Thu, 22 Apr 2010, Mike Mackovitch wrote:
> Oh, and the kernel.log should at least have the lockd not responding messages in it. So, I presume you meant nothing *else* interesting. I think it's time to look at the packets...

I'll double check later (my wife is currently using the laptop).

> (...and perhaps time to move this off of zfs-discuss seeing as this is really an NFS/networking issue and not a ZFS issue.)

Sounds fair enough! Let's move this to email; meanwhile, what's the packet sniffing incantation I need to use? On Solaris I'd use snoop, but I don't think Mac OS comes with that!

--
Rich Teer, Publisher
Vinylphile Magazine
www.vinylphilemag.com
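For what it's worth, Mac OS X does ship a command-line sniffer, tcpdump. A sketch (the interface `en0` and the host name `nfs-server` are examples; adjust for your setup):

```shell
# Capture NFS traffic (port 2049) to a file for later inspection;
# -s 0 grabs full packets rather than truncated headers
sudo tcpdump -i en0 -s 0 -w nfs-trace.pcap host nfs-server and port 2049
```

The resulting .pcap file can then be opened in Wireshark for decoding.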
Re: [zfs-discuss] Mac OS X clients with ZFS server
On Fri, 23 Apr 2010, Alex Blewitt wrote:
> For your information, the ZFS project lives (well, limps really) on at http://code.google.com/p/mac-zfs. You can get ZFS for Snow Leopard from there and we're working on moving forwards from the ancient pool support to something more recent. I've relatively recently merged in the onnv-gate repository (at build 72) which should make things easier to track in the future.

That's good to hear! I thought Apple yanking ZFS support from Mac OS was a really dumb idea. Do you work for Apple?

> No, the entire effort is community based. Please feel free to join up to the mailing list from the project page if you're interested in ZFS on Mac OSX.

I tried going to that URL, but got a 404 error... :-( What's the correct one, please?

--
Rich Teer, Publisher
Vinylphile Magazine
www.vinylphilemag.com
Re: [zfs-discuss] Mac OS X clients with ZFS server
The correct URL is: http://code.google.com/p/maczfs/

-Original Message-
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Rich Teer
Sent: Sunday, April 25, 2010 7:11 PM
To: Alex Blewitt
Cc: ZFS discuss
Subject: Re: [zfs-discuss] Mac OS X clients with ZFS server

> I tried going to that URL, but got a 404 error... :-( What's the correct one, please?
[zfs-discuss] Thoughts on drives for ZIL/L2ARC?
I have a few old drives here that I thought might help me a little, though not as much as a nice SSD, for those uses. I'd like to speed up NFS writes, and there have been some mentions that even a decent HDD can do this, though not to the same level a good SSD will.

The 3 drives are older LVD SCSI Cheetah drives, ST318203LW. I have 2 controllers I could use. One appears to be a RAID controller with a memory module installed, an Adaptec AAA-131U2. The memory module comes up on Google as a 2MB EDO DIMM. Not sure that's worth anything to me. :) The other controller is an Adaptec 29160. It looks to be a 64-bit PCI card, but the machine it came from is only 32-bit PCI, as is my current machine.

What say the pros here? I'm concerned that the max data rate is going to be somewhat low with them, but the seek time should be good as they are 10K RPM (I think). The only reason I thought to use one for L2ARC is for dedupe. It sounds like L2ARC helps a lot there.

This is for a home server, so all I'm really looking to do is speed things up a bit while I save and look for a decent SSD option. However, if it's a waste of time, I'd rather find out before I install them.
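For reference, adding a drive as a log or cache device is a one-liner. A sketch (the pool name `tank` and device names are hypothetical):

```shell
# Dedicated ZIL (slog) on one Cheetah; mirroring the log is safer,
# since losing an unmirrored slog on older pool versions can be painful
zpool add tank log c3t0d0
# zpool add tank log mirror c3t0d0 c3t1d0

# L2ARC cache device - no redundancy needed; if it dies,
# reads simply fall back to the main pool
zpool add tank cache c3t2d0
```

If the experiment doesn't pay off, a cache device can be removed again with `zpool remove`.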
Re: [zfs-discuss] Mac OS X clients with ZFS server
On 4/25/10 6:07 PM, Rich Teer (rich.t...@rite-group.com) wrote:
> Sounds fair enough! Let's move this to email; meanwhile, what's the packet sniffing incantation I need to use? On Solaris I'd use snoop, but I don't think Mac OS comes with that!

Use Wireshark (formerly Ethereal); works great for me. It does require X11 on your machine.

--
Dave Pooser, ACSA
Manager of Information Services
Alford Media http://www.alfordmedia.com
Re: [zfs-discuss] Mac OS X clients with ZFS server
On 4/25/10 6:11 PM, Rich Teer (rich.t...@rite-group.com) wrote:
> I tried going to that URL, but got a 404 error... :-( What's the correct one, please?

http://code.google.com/p/maczfs/

--
Dave Pooser, ACSA
Manager of Information Services
Alford Media http://www.alfordmedia.com
[zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
I'm building another 24-bay rackmount storage server, and I'm considering what drives to put in the bays. My chassis is a Supermicro SC846A, so the backplane supports SAS or SATA; my controllers are LSI3081E, again supporting SAS or SATA.

Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM drive in both SAS and SATA configurations; the SAS model offers one quarter the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and costs 10% more than its enterprise SATA twin. (They also offer a Barracuda XT SATA drive; it's roughly 20% less expensive than the Constellation drive, but rated at 60% the MTBF of the others and a predicted rate of nonrecoverable errors an order of magnitude higher.)

Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface?

--
Dave Pooser, ACSA
Manager of Information Services
Alford Media http://www.alfordmedia.com
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
Travis Tabbal wrote:
> I have a few old drives here that I thought might help me a little, though not as much as a nice SSD, for those uses. I'd like to speed up NFS writes, and there have been some mentions that even a decent HDD can do this, though not to the same level a good SSD will. The 3 drives are older LVD SCSI Cheetah drives, ST318203LW. [...] This is for a home server, so all I'm really looking to do is speed things up a bit while I save and look for a decent SSD option. However, if it's a waste of time, I'd rather find out before I install them.

I'd like to hear (or see tests of) how hard-drive-based ZIL/L2ARC can help RAIDZ performance. Examples would be large RAIDZ arrays such as:

8+ drives in a single RAIDZ1
16+ drives in a single RAIDZ2
24+ drives in a single RAIDZ3

(None of these are a series of smaller RAIDZ arrays that are striped.)

From the writings I've seen, large non-striped RAIDZ arrays tend to have poor performance that is more or less limited to the I/O capacity of a single disk. The recommendations tend to suggest using smaller RAIDZ arrays and then striping them together, whereby the RAIDZ provides redundancy and the striping provides reasonable performance. The advantage of large RAIDZ arrays is you can get better protection from drive failure (e.g. one 16-drive RAIDZ2 can lose any 2 drives vs two 8-drive RAIDZ1 striped arrays that can lose only one drive per array).

So what about using a few dedicated two- or three-way mirrored drives for ZIL and/or L2ARC, in combination with the large RAIDZ arrays? The mirrored ZIL/L2ARC would serve as a cache for the slower RAIDZ. One model for this configuration is the cloud-based ZFS test that was done here, which used local drives configured as ZIL and L2ARC to minimize the impact of cloud latency, with respectable results: http://blogs.sun.com/jkshah/entry/zfs_with_cloud_storage_and

The performance gap between local mirrored disks used for ZIL/L2ARC and a large RAIDZ is not nearly as large as the gap that was addressed in the cloud-based ZFS test. Is the gap large enough to potentially benefit from HDD-based mirrored ZIL/L2ARCs? Would SSD-based ZIL/L2ARCs be necessary to see a worthwhile performance improvement?

If this theory works out in practice, useful RAIDZ array sizes may not be as limited as they have been to date via best-practices guidelines. Admins may then be able to choose to have larger, more strongly redundant RAIDZ arrays while still keeping most of the performance of smaller striped RAIDZ arrays by using mirrored ZIL/L2ARC disks or SSDs.

-hk
[zfs-discuss] Identifying drives
I have one storage server with 24 drives, spread across three controllers and split into three RAIDz2 pools. Unfortunately, I have no idea which bay holds which drive. Fortunately, this server is used for secondary storage so I can take it offline for a bit.

My plan is to use zpool export to take each pool offline and then dd to do a sustained read off each drive in turn, and watch the blinking lights to see which drive is which. In a nutshell:

zpool export uberdisk1
zpool export uberdisk2
zpool export uberdisk3
dd if=/dev/rdsk/c9t0d0 of=/dev/null
dd if=/dev/rdsk/c9t1d0 of=/dev/null
[etc. 22 more times]
zpool import uberdisk1
zpool import uberdisk2
zpool import uberdisk3

Are there any glaring errors in my reasoning here? My thinking is I should probably identify these disks before any problems develop, in case of erratic read errors that are enough to make me replace a drive without being enough to make the hardware ID it as bad.

--
Dave Pooser, ACSA
Manager of Information Services
Alford Media http://www.alfordmedia.com
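The dd step can be looped with a bounded read so each drive's LED blinks for a predictable interval. A sketch (the c9 controller and target numbers come from the post above; the whole-disk device suffix, e.g. `p0` on x86 Solaris, may need adjusting on your system):

```shell
# Read ~3GB from each target in turn so its activity LED lights up;
# label the drive bay before moving to the next one
t=0
while [ $t -lt 8 ]; do
    echo "== c9t${t}d0 - watch the lights, then label the bay =="
    dd if=/dev/rdsk/c9t${t}d0p0 of=/dev/null bs=1024k count=3000
    t=$((t+1))
done
```

Using a large bs keeps the sequential read fast enough that the light stays visibly busy; dd's tiny default block size can make the blink hard to spot.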