Re: [zfs-discuss] Mac OS X clients with ZFS server
On 26 Apr 2010, at 06:02, Dave Pooser wrote: On 4/25/10 6:07 PM, Rich Teer rich.t...@rite-group.com wrote: Sounds fair enough! Let's move this to email; meanwhile, what's the packet sniffing incantation I need to use? On Solaris I'd use snoop, but I don't think Mac OS comes with that! Use Wireshark (formerly Ethereal); works great for me. It does require X11 on your machine. Macs come with the command-line tcpdump tool. Wireshark (recommended anyway!) can read files saved by tcpdump and snoop. Cheers, Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
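For reference, a minimal capture along those lines, assuming the Mac's primary interface is en0 and that the traffic of interest is NFS on port 2049 (adjust both to your setup); the resulting file opens directly in Wireshark:

    sudo tcpdump -i en0 -s 0 -w nfs-capture.pcap port 2049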
Re: [zfs-discuss] Identifying drives
On Mon, Apr 26, 2010 at 6:21 AM, Dave Pooser dave@alfordmedia.com wrote: I have one storage server with 24 drives, spread across three controllers and split into three RAIDz2 pools. Unfortunately, I have no idea which bay holds which drive. Fortunately, this server is used for secondary storage so I can take it offline for a bit. My plan is to use zpool export to take each pool offline and then dd to do a sustained read off each drive in turn and watch the blinking lights to see which drive is which. In a nutshell: zpool export uberdisk1 zpool export uberdisk2 zpool export uberdisk3 dd if=/dev/rdsk/c9t0d0 of=/dev/null dd if=/dev/rdsk/c9t1d0 of=/dev/null [etc. 22 more times] zpool import uberdisk1 zpool import uberdisk2 zpool import uberdisk3 Are there any glaring errors in my reasoning here? My thinking is I should probably identify these disks before any problems develop, in case of erratic read errors that are enough to make me replace the drive without being enough to make the hardware ID it as bad. There should be no need to take pools offline or anything like that. If it's just secondary storage then normal usage should be low enough to easily spot which drive you're hammering. (Personally, format-analyze-read rather than dd.) And there ought to be a consistent pattern rather than locations being random. If you can see the serial numbers on the drives then cross-referencing those with the serial numbers from the OS (eg from iostat -En) would be a good idea. (You are, I presume, using regular scrubs to catch latent errors.) -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
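As a rough sketch of the serial-number approach (the c9* device names are just examples from the original post; iostat -En prints a Serial No field for each device it knows about):

    for d in c9t0d0 c9t1d0 c9t2d0; do
        echo "== $d =="
        iostat -En $d | grep -i 'serial'
    done

Matching those serial numbers against the labels on the drive trays gives a bay-to-device map without taking the pools offline.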
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
Then perhaps you should do zpool import -R / pool *after* you attach EBS. That way Solaris won't automatically try to import the pool and your scripts will do it once disks are available. zpool import doesn't work as there was no previous export. I'm trying to solve the case where the instance terminates unexpectedly; think of someone just pulling the plug. There's no way to do the export operation before it goes down, but I still need to bring it back up, attach the EBS drives and continue as before. The start/attach/reboot/available cycle is interesting, however. I may be able to initiate a reboot after attaching the drives, but it's not optimal - there's always a chance the instance might not come back up after the reboot. And it still doesn't answer *why* the drives aren't showing any data after they're initially attached. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi Tim, thanks for sharing your dedup experience. Especially for Virtualization, having a good pool of experience will help a lot of people. So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on the same ZFS backing store, if I understand you correctly. What dedup ratios do you see for the third, fourth and fifth server installation? Also, maybe dedup is not the only way to save space. What compression rate do you get? And: Have you tried setting up a Windows System, then setting up the next one based on a ZFS clone of the first one? Hope this helps, Constantin On 04/23/10 08:13 PM, tim Kries wrote: Dedup is a key element for my purpose, because i am planning a central repository for like 150 Windows Server 2008 (R2) servers which would take a lot less storage if they dedup right. -- Sent from OpenSolaris, http://www.opensolaris.org/ Constantin Gonzalez Sun Microsystems GmbH, Germany Principal Field Technologist Blog: constantin.glez.de Tel.: +49 89/4 60 08-25 91 Twitter: @zalez Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Jürgen Kunz ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
On 26/04/2010 09:27, Phillip Oldham wrote: Then perhaps you should do zpool import -R / pool *after* you attach EBS. That way Solaris won't automatically try to import the pool and your scripts will do it once disks are available. zpool import doesn't work as there was no previous export. I'm trying to solve the case where the instance terminates unexpectedly; think of someone just pulling the plug. There's no way to do the export operation before it goes down, but I still need to bring it back up, attach the EBS drives and continue as before. The start/attach/reboot/available cycle is interesting, however. I may be able to initiate a reboot after attaching the drives, but it's not optimal - there's always a chance the instance might not come back up after the reboot. And it still doesn't answer *why* the drives aren't showing any data after they're initially attached. You don't have to do exports to use 'zpool import -R / pool' as I suggested (notice -R). If you do so, the pool won't be added to zpool.cache and therefore after a reboot (unexpected or not) you will be able to import it again (and do so with -R). That way you can easily script it so the import happens after your disks are available. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
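A minimal sketch of what such a boot-time script could look like once the EBS volumes have been attached (the pool name 'tank' and the attach step are placeholders for your own setup):

    # after the EBS device(s) are attached and visible to the OS:
    zpool import -R / tank     # -R keeps the pool out of /etc/zfs/zpool.cache
    zpool status tank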
Re: [zfs-discuss] Solaris 10 and ZFS dedupe status
On Jan 5, 2010, at 4:38 PM, Bob Friesenhahn wrote: On Mon, 4 Jan 2010, Tony Russell wrote: I am under the impression that dedupe is still only in OpenSolaris and that support for dedupe is limited or non existent. Is this true? I would like to use ZFS and the dedupe capability to store multiple virtual machine images. The problem is that this will be in a production environment and would probably call for Solaris 10 instead of OpenSolaris. Are my statements on this valid or am I off track? If dedup gets scheduled for Solaris 10 (I don't know), it would surely not be available until at least a year from now. Dedup in OpenSolaris still seems risky to use other than for experimental purposes. It has only recently become available. I've just written an entry about update 9; I think it will contain zpool version 19, so no dedup for this release, if that's correct. Regards, Henrik http://sparcv9.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss I'm pretty sure Solaris 10 update 9 will have zpool version 22 so WILL have dedup. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
- Dave Pooser dave@alfordmedia.com skrev: I'm building another 24-bay rackmount storage server, and I'm considering what drives to put in the bays. My chassis is a Supermicro SC846A, so the backplane supports SAS or SATA; my controllers are LSI3081E, again supporting SAS or SATA. Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM drive in both SAS and SATA configurations; the SAS model offers one quarter the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and costs 10% more than its enterprise SATA twin. (They also offer a Barracuda XT SATA drive; it's roughly 20% less expensive than the Constellation drive, but rated at 60% the MTBF of the others and a predicted rate of nonrecoverable errors an order of magnitude higher.) Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface? We have a similar system, a SuperMicro 24-bay server with 22x2TB (and two SSDs for the root) configured as three RAIDz2 sets with seven drives each and a spare. We chose 'desktop' drives, since they offer (more or less) the same speed, and with that redundancy the chance of pool failure is so low that I guess 'enterprise' drives wouldn't help a lot more. About SAS vs SATA, I'd guess you won't be able to see any change at all. The bottleneck is the drives, not the interface to them. roy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to delegate zfs snapshot destroy to users?
Hi, I'm trying to let zfs users create and destroy snapshots in their zfs filesystems. So rpool/vm has the permissions:

osol137 19:07 ~: zfs allow rpool/vm
Permissions on rpool/vm -
Permission sets:
    @virtual clone,create,destroy,mount,promote,readonly,receive,rename,rollback,send,share,snapshot,userprop
Create time permissions:
    @virtual
Local permissions:
    group staff create,mount

Now as regular user I do:

$ zfs create rpool/vm/vm156888
$ zfs create rpool/vm/vm156888/a
$ zfs snapshot rpool/vm/vm156888/a...@1
$ zfs destroy rpool/vm/vm156888/a...@1
cannot destroy 'rpool/vm/vm156888/a...@1': permission denied

The only way around this I found is to add the 'allow' right to the @virtual group:

sudo zfs allow -s @virtual allow rpool/vm

Now as regular user I can:

zfs allow vm156888 mount,destroy rpool/vm/vm156888/a
zfs destroy rpool/vm/vm156888/a...@1

I believe that I need to do this because the Create time permissions are used only as Local permissions on the new filesystem, while for deleting a snapshot I need them as Local+Descendent. So if a user wants to use snapshots, he has to know to grant himself mount+delete permissions first. Is this the intended way to go? Thank you -- Vlad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Making ZFS better: zfshistory
From: Richard Elling [mailto:richard.ell...@gmail.com] Sent: Sunday, April 25, 2010 2:12 PM E did exist. Inode 12345 existed, but it had a different name at the time OK, I'll believe you. How about this? mv a/E/c a/c mv a/E a/c mv a/c a/E The thing that's still confusing you is the idea that directory names or locations matter. They don't. Remember that a directory is just an inode, with text and data inside it, which stores an association of child names and child inode numbers. Suppose somedir is inode 12345. Then if you ls somedir/.snapshot/somesnap then the system is reading a version of inode 12345 in a time gone by. At that time, inode 12345 may have been referenced by its parent using the name foo instead of somedir but that won't even matter in this case because we've only instructed the system to read the contents of a past version of inode 12345. In this case, we haven't told the system to do anything even slightly related to any parent of that inode. We're not even going to know what name was associated with inode 12345 at that time. At the time of somesnap, inode 12345 had contents which indicate a.txt is inode 1000 and b.txt is inode 1050 and so on. So a.txt and b.txt will appear in the directory listing, and if you cat a.txt or b.txt, the system will fetch inode 1000 or 1050 as it appeared at the time of the snapshot. Does that help? There is no actual entity called .snapshot It's a magical thing, just like there is no actual entity called .zfs If you ls somedir or ls somezfsfilesystem you will see, that the parent inode does not contain any reference to anything called .snapshot or .zfs (Unless you turned it on for some reason.) However, if you cd .snapshot or cd .zfs then there's some magic behind the scenes that's able to handle that differently. I don't know how they do that. But I do know it's not listed in the inode like any other normal child subdirectory or file. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
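For what it's worth, the same point is easy to see on ZFS itself, where snapshots hang off the filesystem-level .zfs directory (the names below are just an example):

    zfs snapshot tank/home@snap1
    mv /tank/home/somedir /tank/home/renamed
    ls /tank/home/.zfs/snapshot/snap1/somedir   # old name, old contents, still visible

Renaming the live directory doesn't change what the snapshot shows, because the snapshot is a frozen view of the whole filesystem, not a property of any one directory.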
Re: [zfs-discuss] ZFS Pool, what happen when disk failure
From: Ian Collins [mailto:i...@ianshome.com] Sent: Sunday, April 25, 2010 5:09 PM To: Edward Ned Harvey Cc: 'Robert Milkowski'; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS Pool, what happen when disk failure On 04/26/10 12:08 AM, Edward Ned Harvey wrote: [why do you snip attributions?] Nobody snipped attributions, and even if they did, get over it. It's not always needed for any reason. On 04/26/10 01:45 AM, Robert Milkowski wrote: The system should boot-up properly even if some pools are not accessible (except rpool of course). If it is not the case then there is a bug - last time I checked it worked perfectly fine. This may be different in the latest opensolaris, but in the latest solaris, this is what I know: If a pool fails, and forces an ungraceful shutdown, then during the next bootup, the pool is treated as currently in use by another system. The OS doesn't come up all the way; you have to power cycle again, and go into failsafe mode. Then you can zpool import (I think requiring the -f or -F) and reboot again normally. I think you are describing what happens if the root pool has problems. Other pools are just shown as unavailable. The system will come up, but failure to mount any filesystems in the absent pool will cause the filesystem/local service to be in maintenance state. No. I don't know how to resolve this - I also have Solaris 10/09, and it's a somewhat regular occurrence for the system to halt and refuse to come up, because something went wrong with the external nonredundant zpool. Namely ... the power got accidentally knocked off the external device. Or the device enclosure failed. Or something like that. So I have to do as I said, power cycle, go into failsafe mode, do a zpool import and I'll see the external pool is in use by system blahblahblah. And then I zpool import it, with the -f or -F, and init 6. And then the system comes up clean. I don't know why my experience is different from Robert's. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Dave Pooser (lots of small writes/reads), how much benefit will I see from the SAS interface? In some cases, SAS outperforms SATA. I don't know what circumstances those are. I think the main reason anyone buys SAS disks is for reliability reasons. I maintain data centers for two companies, one of which uses all SAS, and the other uses mostly SATA. I have replaced many SATA disks in the last 3 years, and I have never replaced a single SAS disk. I don't know if my experience would be reflected in the published MTBF of the disks in question. Sometimes those numbers are sort of fudged, so I don't trust 'em or bother to look at them. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Travis Tabbal I have a few old drives here that I thought might help me a little, though not at much as a nice SSD, for those uses. I'd like to speed up NFS writes, and there have been some mentions that even a decent HDD can do this, though not to the same level a good SSD will. If your clients are mounting async don't bother. If the clients are mounting async, then all the writes are done asynchronously, fully accelerated, and never any data written to ZIL log. If you'd like to measure whether or not you have anything to gain ... Temporarily disable the ZIL on the server. (And remount your filesystem.) If performance doesn't improve, then you can't gain anything by using a dedicated ZIL device. If performance does improve ... then you could expect to gain about half of the difference, by using a really good SSD. Rough numbers. Very rough. It's not advisable, in most cases, to leave the ZIL disabled. It's valuable after an ungraceful shutdown. So I'd advise only disabling the ZIL while you're testing for performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
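On builds of that era the ZIL is disabled globally via a kernel tunable rather than per-dataset; a rough sketch of the test, assuming a filesystem named tank/export (test use only, never leave it this way in production):

    echo zil_disable/W0t1 | mdb -kw              # disable the ZIL
    zfs umount tank/export && zfs mount tank/export
    # ... run the NFS write workload from a client ...
    echo zil_disable/W0t0 | mdb -kw              # re-enable the ZIL
    zfs umount tank/export && zfs mount tank/export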
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Travis Tabbal Oh, one more thing. Your subject says ZIL/L2ARC and your message says I want to speed up NFS writes. ZIL (log) is used for writes. L2ARC (cache) is used for reads. I'd recommend looking at the ZFS Best Practices Guide. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk About SAS vs SATA, I'd guess you won't be able to see any change at all. The bottleneck is the drives, not the interface to them. That doesn't agree with my understanding. My understanding is that for a single disk, you're right. No disk can come near the bus speed for either SATA or SAS. But SCSI vs ATA, the SCSI is supposed to have a more efficient bus utilization when many disks are all doing things simultaneously, such as they might in a big old RAID, 48 disks, etc, like you have. 'Course none of that matters if you're serving it all over a 1Gb ether. ;-) I don't know under what circumstances SAS performance would exceed SATA's. Nor do I know by how much. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]
On 26/04/2010 11:14, Phillip Oldham wrote: You don't have to do exports as I suggested to use 'zpool -R / pool' (notice -R). I tried this after your suggestion (including the -R switch) but it failed, saying the pool I was trying to import didn't exist. which means it couldn't discover it. does 'zpool import' (no other options) list the pool? -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Expand zpool capacity
It was a little while ago, but I've found a pretty helpful video on YouTube (http://www.youtube.com/watch?v=tpzsSptzmyA) on how to completely migrate from one hard drive to another. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
If your clients are mounting async don't bother. If the clients are mounting async, then all the writes are done asynchronously, fully accelerated, and never any data written to ZIL log. I've tried async, things run well until you get to the end of the job, then the process hangs until the write is complete. This was just with tar extracting to the NFS drive. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Travis Tabbal Oh, one more thing. Your subject says ZIL/L2ARC and your message says I want to speed up NFS writes. ZIL (log) is used for writes. L2ARC (cache) is used for reads. I'd recommend looking at the ZFS Best Practices Guide. At the end of my OP I mentioned that I was interested in L2ARC for dedupe. It sounds like the DDT can get bigger than RAM and slow things to a crawl. Not that I expect a lot from using an HDD for that, but I thought it might help. I'd like to get a nice SSD or two for this stuff, but that's not in the budget right now. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Making ZFS better: zfshistory
On Apr 26, 2010, at 5:02 AM, Edward Ned Harvey wrote: From: Richard Elling [mailto:richard.ell...@gmail.com] Sent: Sunday, April 25, 2010 2:12 PM E did exist. Inode 12345 existed, but it had a different name at the time OK, I'll believe you. How about this? mv a/E/c a/c mv a/E a/c mv a/c a/E The thing that's still confusing you is the idea that directory names or locations matter. They don't. Maybe directory consistency doesn't matter for MS-DOS 1.0, but I'm pretty sure that directory consistency is useful in UNIX. Remember that a directory is just an inode, with text and data inside it, which stores an association of child names and child inode numbers. Suppose somedir is inode 12345. Then if you ls somedir/.snapshot/somesnap then the system is reading a version of inode 12345 in a time gone by. At that time, inode 12345 may have been referenced by its parent using the name foo instead of somedir but that won't even matter in this case because we've only instructed the system to read the contents of a past version of inode 12345. In this case, we haven't told the system to do anything even slightly related to any parent of that inode. We're not even going to know what name was associated with inode 12345 at that time. At the time of somesnap, inode 12345 had contents which indicate a.txt is inode 1000 and b.txt is inode 1050 and so on. So a.txt and b.txt will appear in the directory listing, and if you cat a.txt or b.txt, the system will fetch inode 1000 or 1050 as it appeared at the time of the snapshot. Does that help? I completely understand this. No magic here. There is no actual entity called .snapshot It's a magical thing, just like there is no actual entity called .zfs If you ls somedir or ls somezfsfilesystem you will see, that the parent inode does not contain any reference to anything called .snapshot or .zfs (Unless you turned it on for some reason.) Yes. And you agree that the relationship to parent directories does not matter, correct? In other words, a tool that looks at either the parent or child snapshot directories is useless. Put another way, you cannot implement something like time machine using directory-level snapshot subdirectories. However, if you cd .snapshot or cd .zfs then there's some magic behind the scenes that's able to handle that differently. I don't know how they do that. But I do know it's not listed in the inode like any other normal child subdirectory or file. 'nuff said -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Apr 25, 2010, at 10:02 PM, Dave Pooser wrote: I'm building another 24-bay rackmount storage server, and I'm considering what drives to put in the bays. My chassis is a Supermicro SC846A, so the backplane supports SAS or SATA; my controllers are LSI3081E, again supporting SAS or SATA. Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM drive in both SAS and SATA configurations; the SAS model offers one quarter the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and costs 10% more than its enterprise SATA twin. (They also offer a Barracuda XT SATA drive; it's roughly 20% less expensive than the Constellation drive, but rated at 60% the MTBF of the others and a predicted rate of nonrecoverable errors an order of magnitude higher.) Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface? For a single connection from a host to a disk, they are basically equivalent. SAS shines with multiple connections to one or more hosts. Hence, SAS is quite popular when implementing HA clusters. Note: drive differentiation is market driven, not technology driven. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
Hi, The setting was this: Fresh installation of 2008 R2 - server backup with the backup feature - move vhd to zfs - install active directory role - backup again - move vhd to same share. I am kinda confused over the change of dedup ratio from changing the record size, since it should dedup 256-bit blocks. I have to set up the opensolaris again since it died in my virtualbox (not sure why), so I can't test more server installations atm. Compression seemed to work pretty well (I used gzip-6) and I think the compress ratio was ~4, but I don't think that would work well for production systems since you would need some serious cpu-power to work with. I will set up another test in a few hours. Personally I am not sure if using clones might be a good idea for windows server 2008, all these problems with sid... -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to delegate zfs snapshot destroy to users?
Hi Vlad, The create-time permissions do not provide the correct permissions for destroying descendent datasets, such as clones. See example 9-5 in this section that describes how to use zfs allow -d option to grant permissions on descendent datasets: http://docs.sun.com/app/docs/doc/819-5461/gebxb?l=ena=view Example 9–5 Delegating Permissions at the Correct File System Level Delegating or granting the appropriate permissions will take some testing on the part of the administrator who is granting the permissions. I hope the examples help. Thanks, Cindy On 04/26/10 05:28, Vladimir Marek wrote: Hi, I'm trying to let zfs users to create and destroy snapshots in their zfs filesystems. So rpool/vm has the permissions: osol137 19:07 ~: zfs allow rpool/vm Permissions on rpool/vm - Permission sets: @virtual clone,create,destroy,mount,promote,readonly,receive,rename,rollback,send,share,snapshot,userprop Create time permissions: @virtual Local permissions: group staff create,mount now as regular user I do: $ zfs create rpool/vm/vm156888 $ zfs create rpool/vm/vm156888/a $ zfs snapshot rpool/vm/vm156888/a...@1 $ zfs destroy rpool/vm/vm156888/a...@1 cannot destroy 'rpool/vm/vm156888/a...@1': permission denied The only way around I found is to add 'allow' right to the @virtual group sudo zfs allow -s @virtual allow rpool/vm Now as regular user I can: zfs allow vm156888 mount,destroy rpool/vm/vm156888/a zfs destroy rpool/vm/vm156888/a...@1 I believe that I need to do this because the Create time permissions are used only as Local permissions on new filesystem, while for deleting snapshot I need them as Local+Descendent. So user if he wants to use snapshots, he has to know to grant himself mount+delete permissions first. Is this the intended way to go? Thank you ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Identifying drives
luxadm(1m) has a led_blink subcommand you might find useful. -- richard On Apr 25, 2010, at 10:21 PM, Dave Pooser wrote: I have one storage server with 24 drives, spread across three controllers and split into three RAIDz2 pools. Unfortunately, I have no idea which bay holds which drive. Fortunately, this server is used for secondary storage so I can take it offline for a bit. My plan is to use zpool export to take each pool offline and then dd to do a sustained read off each drive in turn and watch the blinking lights to see which drive is which. In a nutshell: zpool export uberdisk1 zpool export uberdisk2 zpool export uberdisk3 dd if=/dev/rdsk/c9t0d0 of=/dev/null dd if=/dev/rdsk/c9t1d0 of=/dev/null [etc. 22 more times] zpool import uberdisk1 zpool import uberdisk2 zpool import uberdisk3 Are there any glaring errors in my reasoning here? My thinking is I should probably identify these disks before any problems develop, in case of erratic read errors that are enough to make me replace the drive without being enough to make the hardware ID it as bad. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
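A hedged example of that subcommand (support depends on the enclosure; see luxadm(1M) for the enclosure,dev and pathname argument forms it accepts, and the device path here is just an example):

    luxadm led_blink /dev/rdsk/c9t0d0s2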
Re: [zfs-discuss] Expand zpool capacity
Yes, it is helpful in that it reviews all the steps needed to get the replacement disk labeled properly for a root pool and is identical to what we provide in the ZFS docs. The part that is not quite accurate is the reasons for having to relabel the replacement disk with the format utility. If the replacement disk had an identical slice 0 (same size or greater) with an SMI label then no need exists to relabel the disk. In this case, he could have just attached the replacement disk, installed the boot blocks, tested booting from the replacement disk, and detached the older disk. If replacement disk had an EFI label or no slice 0, or a slice 0 that is too small, then yes, you have to perform the format steps as described in this video. Thanks, Cindy On 04/26/10 08:24, Vladimir L. wrote: It's a litle while ago, but i've found a a href=http://www.youtube.com/watch?v=tpzsSptzmyA;pretty helpful video on YT/a how to completely migrate from one harddrive to another. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
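Put differently, when the replacement disk already carries an SMI label with a large-enough slice 0, the whole migration reduces to something like this sketch (device names are examples):

    zpool attach rpool c1t0d0s0 c1t1d0s0      # mirror onto the new disk
    # wait for the resilver to finish (watch zpool status rpool)
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
    # test booting from the new disk, then remove the old one:
    zpool detach rpool c1t0d0s0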
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
SAS: full duplex
SATA: half duplex
SAS: dual port
SATA: single port (some enterprise SATA has dual port)
SAS: 2 active channels - 2 concurrent writes, or 2 reads, or 1 write and 1 read
SATA: 1 active channel - 1 read or 1 write
SAS: full error detection and recovery on both read and write
SATA: error detection and recovery on write, only error detection on read

If you connect only one disk per port, it's not a big deal. If you connect multiple disks to a RAID card, or through a backplane or expander, SAS makes a big difference in reliability. If I had the money, I'd always go with SAS. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help:Is zfs-fuse's performance is not good
- Tonmaus sequoiamo...@gmx.net skrev: I wonder if this is the right place to ask, as the Filesystem in User Space implementation is a separate project. In Solaris ZFS runs in kernel. FUSE implementations are slow, no doubt. Same goes for other FUSE implementations, such as for NTFS. The classic answers from (open)solaris folks would be 'Why not run (open)solaris?' and 'why don't you just try it out yourself?' The zfs fuse project will give you most of the nice zfs stuff, but it probably won't give you the same performance. I don't think opensolaris has been compared to FUSE ZFS, but it might be interesting to see that. roy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Sun, Apr 25, 2010 at 10:02 PM, Dave Pooser dave@alfordmedia.com wrote: Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface? SAS drives are generally intended to be used in a multi-drive / RAID environment, and are delivered with TLER / CCTL / ERC enabled to prevent them from falling out of arrays when they hit a read error. SAS drives will generally have a longer warranty than desktop drives. The SMART command set in ATA-7 and the ATA-8 spec should eliminate the distinction, but until it's fully supported by manufacturers desktop drives may not degrade as gracefully in an array when hitting an error. From what I've read, both WD and Seagate desktop drives ignore the ERC command. Samsung drives are reported to work, and I'm not sure about Hitachi. So far as backplanes are concerned - You can connect the backplane with SAS and still use SATA drives. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
- Brandon High bh...@freaks.com skrev: SAS drives are generally intended to be used in a multi-drive / RAID environment, and are delivered with TLER / CCTL / ERC enabled to prevent them from falling out of arrays when they hit a read error. SAS drives will generally have a longer warranty than desktop drives. With 2TB drives priced at €150 or lower, I somehow think paying for drive lifetime is far more expensive than getting a few more drives and adding redundancy. Just my 2c. roy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris 10 and ZFS dedupe status
- Neil Simpson neil.simp...@sun.com skrev: I'm pretty sure Solaris 10 update 9 will have zpool version 22 so WILL have dedup. Interesting - from where do you have this information? roy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote: SAS shines with multiple connections to one or more hosts. Hence, SAS is quite popular when implementing HA clusters. So that would be how one builds something like the active/active controller failover in standalone RAID boxes. Is there a good resource on doing something like that with an OpenSolaris storage server? I could see that as a project I might want to attempt. -- Dave Pooser, ACSA Manager of Information Services Alford Media http://www.alfordmedia.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Mon, 26 Apr 2010, Roy Sigurd Karlsbakk wrote: SAS drives will generally have a longer warranty than desktop drives. With 2TB drives priced at €150 or lower, I somehow think paying for drive lifetime is far more expensive than getting a few more drives and add redundancy This really depends on if you are willing to pay in advance, or pay after the failure. Even with redundancy, the cost of a failure may be high due to loss of array performance and system administration time. Array performance may go into the toilet during resilvers, depending on the redundancy configuration and the type of drives used. All types of drives fail but typical SATA drives fail more often. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
I found the VHD specification here: http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc I am not sure if I understand it right, but it seems like data on disk gets packed into the vhd (no empty space), so even a slight difference at the beginning of the file will slide through and ruin the pattern for block-based dedup. As I am not an expert on file systems, it would be appreciated if someone with more expertise could take a look at this. Would be a real shame. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Help:Is zfs-fuse's performance is not good
On Mon, Apr 26, 2010 at 9:43 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote: The zfs fuse project will give you most of the nice zfs stuff, but it probably won't give you the same performance. I don't think opensolaris has been compared to FUSE ZFS, but it might be interesting to see that. AFAIK zfs-fuse hasn't been updated recently, so it's implementing a version of zfs from over a year ago. There have been numerous performance and stability improvements in that time. If you really, really want to use zfs and linux, run OpenSolaris and set up a linux xen or Virtualbox instance. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Making ZFS better: zfshistory
On 25 apr 2010, at 20.12, Richard Elling wrote: On Apr 25, 2010, at 5:45 AM, Edward Ned Harvey wrote: From: Richard Elling [mailto:richard.ell...@gmail.com] Sent: Saturday, April 24, 2010 7:42 PM Next, mv /a/e /a/E ls -l a/e/.snapshot/snaptime ENOENT? ls -l a/E/.snapshot/snapname/d.txt this should be ENOENT because d.txt did not exist in a/E at snaptime. Incorrect. E did exist. Inode 12345 existed, but it had a different name at the time of snapshot. Therefore, a/e/.snapshot/snapname/c/d.txt is the file at the time of snapshot. But these are also the same thing: a/E/.snapshot/snapname/c/d.txt a/E/c/.snapshot/snapname/d.txt OK, I'll believe you. How about this? mv a/E/c a/c mv a/E a/c mv a/c a/E now a/E/.snapshot/snapname/c/d.txt is ENOENT, correct? Sadly I can't test it myself right now, maybe someone else can, but I'd expect:

[start: we have a file: a/E/c/d.txt]
[snap1]
mv a/E/c a/c
[snap2]
mv a/E a/c
mv a/c a/E

would result in:

a/.snapshot/snap1/E/c/d.txt
a/.snapshot/snap2/E/ (empty)
a/.snapshot/snap2/c/d.txt
a/E/.snapshot/snap1/c/d.txt
a/E/.snapshot/snap2/ (empty)
a/E/ (empty)

Wouldn't that be logical, and what would be the problem? It would be very annoying if you could have a directory named foo which contains all the snapshots for its own history, and then mv foo bar and suddenly the snapshots all disappear. This is not the behavior. The behavior is: If you mv foo bar then the snapshots which were previously accessible under foo are now accessible under bar. However, if you look in the snapshot of foo's parent, then you will see foo and not bar. Just the way it would have looked, at the time of the snapshot. The only way I know to describe this is that the path is lost. In other words, you cannot say ../.snapshot/snapname/self is the same as self/.snapshot/snapname, thus the relationship previously described as: Snapshots are taken. You can access file.txt via any of the following: /root/.snapshot/branch/leaf/file.txt /root/branch/.snapshot/leaf/file.txt /root/branch/leaf/.snapshot/file.txt is not guaranteed to be correct. No, not if the hierarchy is changed between the snapshots; I think it was just a way to illustrate how the .snapshot directories work. It isn't in zfs either; if the example above were a zfs, we would have:

a/.zfs/snapshot/snap1/E/c/d.txt
a/.zfs/snapshot/snap2/c/d.txt
a/E/ (empty)

I still don't understand why the OnTap model is losing more paths than zfs. I'd be happy if you could take one more shot at explaining. /ragge ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to delegate zfs snapshot destroy to users?
Hi Cindy, The create-time permissions do not provide the correct permissions for destroying descendent datasets, such as clones. See example 9-5 in this section that describes how to use zfs allow -d option to grant permissions on descendent datasets: http://docs.sun.com/app/docs/doc/819-5461/gebxb?l=ena=view Ah I was missing the fact, that subsequent snapshots inherit access modes. So simple zfs allow -d -g staff mount,destroy rpool/vm fixed things for me. Thank you ! -- Vlad ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Making an rpool smaller?
On Fri, Apr 16, 2010 at 4:41 PM, Brandon High bh...@freaks.com wrote: When I set up my opensolaris system at home, I just grabbed a 160 GB drive that I had sitting around to use for the rpool. Just to follow up, after testing in Virtualbox, my initial plan is very close to what worked. This is what I did:

1. Shutdown the system and attach the new drives.
2. Reboot from LiveCD or USB installer.
3. Run 'format' to set up the new drive(s).
4. zpool create -f -R /mnt/rpool_new rpool_new ${NEWDRIVE_DEV}s0
5. zpool import -o ro -R /mnt/rpool_old -f rpool
6. zfs send all datasets from rpool to rpool_new
7. installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${NEWDRIVE_DEV}s0 on the ssd
8. zfs mount rpool_new/ROOT/snv_133 and delete /mnt/rpool_new/etc/zfs/zpool.cache
9. zpool export the rpool and rpool_new
10. 'zpool import -R /mnt/rpool rpool_new rpool' to rename the pool. (Not needed except to be OCD)
11. 'zpool export rpool'
12. Disconnect the original drive and boot from your new root.

After that, it just worked. I tested it again with a physical box that boots off of USB thumb drives as well. The only caveat with that is you must use 'format -e' to partition the thumb drives. Oh, and wait a LONG time, because most flash drives are really, really slow. You could also do this from a non-LiveCD environment, but the name rpool may already be in use. If you move the new drive to the original's port, you don't need to delete the zpool.cache. It would be nice if there was a boot flag you could use to ignore the zpool.cache so you don't have to boot into another environment when the device moves. Another benefit of doing the above is that you can enable compression and dedup on the rpool prior to the send, which gives you creamy compressed dedup goodness on your entire rpool. No matter how tempting, don't use gzip-9 compression. I learned the hard way that grub doesn't support it. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
This really depends on if you are willing to pay in advance, or pay after the failure. Even with redundancy, the cost of a failure may be high due to loss of array performance and system administration time. Array performance may go into the toilet during resilvers, depending on the redundancy configuration and the type of drives used. All types of drives fail but typical SATA drives fail more often. Failure ratio does not depend on interface. Enterprise grade SATA drives have the same build quality as with their SAS brothers and sisters. With RAIDz2 or -3, you're quite sure things will work fine even after a disk failure, and the performance penalty isn't that bad. Choosing SAS over SATA for a single setup must be more of a religious approach roy ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Mon, Apr 26, 2010 at 01:32:33PM -0500, Dave Pooser wrote: On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote: SAS shines with multiple connections to one or more hosts. Hence, SAS is quite popular when implementing HA clusters. So that would be how one builds something like the active/active controller failover in standalone RAID boxes. Is there a good resource on doing something like that with an OpenSolaris storage server? I could see that as a project I might want to attempt. This is interesting. I have a two-node SPARC cluster that uses a multi-initiator SCSI array for shared storage. As an application server, it need only two disks in the array. They are a ZFS mirror. This all works quite nicely under Sun Cluster. I'd like to duplicate this configuration with two small x86 servers and a small SAS array, also with only two disks. It should be easy to find a pair of 1U servers, but what's the smallest SAS array that's available? Does it need an array controller? What's needed on the servers to connect to it? -- -Gary Mills--Unix Group--Computer and Network Services- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Spare in use althought disk is healthy ?
Hello list, a pool shows some strange status:

volume: zfs01vol
 state: ONLINE
 scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38 2010
config:

        NAME           STATE     READ WRITE CKSUM
        zfs01vol       ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t8d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c3t9d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c3t12d0  ONLINE       0     0     0
              c3t21d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t13d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c3t16d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t17d0    ONLINE       0     0     0
            c3t17d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c3t20d0    ONLINE       0     0     0
        logs           ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t0d0     ONLINE       0     0     0
            c3t0d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t1d0     ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
        cache
          c0t0d0       ONLINE       0     0     0
          c0t1d0       ONLINE       0     0     0
          c0t2d0       ONLINE       0     0     0
        spares
          c2t21d0      AVAIL
          c3t21d0      INUSE     currently in use

The spare is in use, although there is no failed disk in the pool. Can anyone interpret this? Is this a bug? Thanks, Robert -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Spare in use althought disk is healthy ?
On 04/27/10 09:41 AM, Lutz Schumann wrote: Hello list, a pool shows some strange status: volume: zfs01vol state: ONLINE scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38 mirror ONLINE 0 0 0 c2t12d0ONLINE 0 0 0 spare ONLINE 0 0 0 c3t12d0 ONLINE 0 0 0 c3t21d0 ONLINE 0 0 0 spares c2t21d0 AVAIL c3t21d0 INUSE currently in use The spare is in use, altought there is no failed disk in the pool. Can anyone interpret this ? Is this a bug ? Was the drive c3t12d0 replaced or faulty at some point? You should be able to detach the spare. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk With 2TB drives priced at €150 or lower, I somehow think paying for drive lifetime is far more expensive than getting a few more drives and adding redundancy If you have a 48-disk enclosure, and you've configured 6x 8disk raid-6 or raidz2 volumes, how do you add more disks to increase redundancy? Point is: Adding disks often means adding slots, and since adding slots ain't free, it would generally translate not so much as adding slots, but as decreasing usable drive capacity... And keeping an inventory of offline spares for the sake of immediate replacement upon failure. Also, you'll only find the cheapest generic disks available at the stated price. If you have one of those disks fail 6 months from now, you will not be able to purchase that model drive again. (God forbid you should have to replace one 3 yrs from now, when the current implementation of SAS or SATA isn't even for sale anymore, and you can't even get a suitable equivalent replacement.) I hate it whenever people over-simplify and say disk is cheap. Also, if you've got all those disks in an array, and their MTBF is ... let's say 25,000 hours ... then 3 yrs later when they begin to fail, they have a tendency to all fail around the same time, which increases the probability of exceeding your designed level of redundancy. I recently bought 2x 1Tb disks for my sun server, for $650 each. This was enough to make me do the analysis: why am I buying sun branded overpriced disks? Here is the abridged version: We recently had an Apple XRAID system lose a disk. It's 3 yrs old. It uses 500G ATA-133 disks, which are not available from anywhere at any price... Except Apple was willing to sell us one for $1018. Naturally, we declined to make that purchase. We did find some disks available from various sources, which should be equivalent, but not Apple branded or certified; functional equivalents but not identical. Prices around $200 to $300. I asked around among Apple admins who had used generic disks in their Xraid systems. About 50% said they used generic disks with no problem. The other 50% were mixed between we used generic disks, seemed to work, but had strange problems like horrible performance or disks suddenly going offline and coming back online again spontaneously and we tried to use generic disks, but the system refused to even acknowledge the disk present in the system. Also, take a look in the present mailing list: many people complaining of drives with firmware that incorrectly acknowledges cache flushes before they're actually flushed. Even then, we're talking about high end Intel SSD's. And the consequence of incorrect firmware is data loss. Maybe even pool loss. The reason why we pay for overpriced disks is to get the manufacturer's seal of approval, the Apple or Sun or Dell branded firmware. The availability of mfgr warranties, the long-term supportability. It costs about 4x-5x more per disk to buy up front, but since you have to buy 2x as many generic disks (for the sake of spare inventory availability) you're only paying 2x overall, and you can rest much more assured in the stability. Even at the higher hardware price, the value of the data is presumed to be much greater than the cost of the hardware. So then it's easy to justify higher cost hardware, with the belief it'll be somehow lower data risk. Sometimes people will opt for cheaper. Sometimes people will opt for lower risk.
I just hate it when people oversimplify and say disk is cheap. That is so over simplified, it doesn't benefit anyone. end rant begin breathe ... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Spare in use althought disk is healthy ?
Hi Lutz, You can try the following commands to see what happened: 1. Someone else replaced the disk with a spare, which would be recorded in this command: # zpool history -l zfs01vol 2. If the disk had some transient outage then maybe the spare kicked in. Use the following command to see if something happened to this disk: # fmdump -eV This command might produce a lot of output, but look for c3t12d0 occurrences. 3. If the c3t12d0 disk is okay, try detaching the spare back to the spare pool like this: # zpool detach zfs01vol c3t21d0 Thanks, Cindy On 04/26/10 15:41, Lutz Schumann wrote: Hello list, a pool shows some strange status: volume: zfs01vol state: ONLINE scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38 2010 config: NAME STATE READ WRITE CKSUM zfs01vol ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t4d0 ONLINE 0 0 0 c3t4d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t5d0 ONLINE 0 0 0 c3t5d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t8d0 ONLINE 0 0 0 c3t8d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t9d0 ONLINE 0 0 0 c3t9d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t12d0ONLINE 0 0 0 spare ONLINE 0 0 0 c3t12d0 ONLINE 0 0 0 c3t21d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t13d0ONLINE 0 0 0 c3t13d0ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t16d0ONLINE 0 0 0 c3t16d0ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t17d0ONLINE 0 0 0 c3t17d0ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t20d0ONLINE 0 0 0 c3t20d0ONLINE 0 0 0 logs ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c3t0d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c2t1d0 ONLINE 0 0 0 c3t1d0 ONLINE 0 0 0 cache c0t0d0 ONLINE 0 0 0 c0t1d0 ONLINE 0 0 0 c0t2d0 ONLINE 0 0 0 spares c2t21d0 AVAIL c3t21d0 INUSE currently in use The spare is in use, altought there is no failed disk in the pool. Can anyone interpret this ? Is this a bug ? Thanks, Robert ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Disk/Partition replacement - do partition begin/end/offsets matter?
I went through with it and it worked fine. So, I could successfully move my ZFS device to the beginning of the new disk. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Data movement across filesystems within a pool
Could this be a future enhancement for ZFS? Like provide 'zfs move fs1/path1 fs2/path2', which will do the needful without really copying anything? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files
On Mon, Apr 26, 2010 at 8:51 AM, tim Kries tim.kr...@gmx.de wrote: I am kinda confused over the change of dedup ratio from changing the record size, since it should dedup 256-bit blocks. Dedup works on blocks of either recordsize or volblocksize. The checksum is made per block written, and those checksums are used to dedup the data. With a recordsize of 128k, two blocks with a one byte difference would not dedup. With an 8k recordsize, 15 out of 16 blocks would dedup. Repeat over the entire VHD. Setting the record size equal to a multiple of the VHD's internal block size and ensuring that the internal filesystem is block aligned will probably help to improve dedup ratios. So for an NTFS guest with 4k blocks, use a 4k, 8k or 16k record size and ensure that when you install in the VHD its partitions are block aligned for the recordsize you're using. VHD supports fixed size and dynamic size images. If you're using a fixed image, the space is pre-allocated. This doesn't mean you'll waste unused space on ZFS with compression, since all those zeros will take up almost no space. Your VHD file should remain block-aligned however. I'm not sure that a dynamic size image will block align if there is empty space. Using compress=zle will only compress the zeros with almost no cpu penalty. Using a COMSTAR iscsi volume is probably an even better idea, since you won't have the POSIX layer in the path, and you won't have the VHD file header throwing off your block alignment. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
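A rough sketch of both options (pool/dataset names and sizes are examples; pick a recordsize or volblocksize that matches the guest filesystem's cluster size):

    # VHD files on a filesystem with a small, aligned record size:
    zfs create -o recordsize=8k -o compression=zle -o dedup=on tank/vhd
    # or a zvol exported over COMSTAR iSCSI, skipping the VHD container entirely:
    zfs create -V 100G -o volblocksize=8k -o dedup=on tank/vm1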
Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?
On Mon, Apr 26, 2010 at 8:01 AM, Travis Tabbal tra...@tabbal.net wrote: At the end of my OP I mentioned that I was interested in L2ARC for dedupe. It sounds like the DDT can get bigger than RAM and slow things to a crawl. Not that I expect a lot from using an HDD for that, but I thought it might help. I'd like to get a nice SSD or two for this stuff, but that's not in the budget right now. A large DDT will require a lot of random reads, which isn't an ideal use case for a spinning disk. Plus, 10k disks are loud and hot. You can get a 30-40gb ssd for about $100 these days. It doesn't matter if a disk for the L2ARC obeys cache flushing, etc. Regardless of whether the host is shutdown cleanly or not, the L2ARC starts cold. It doesn't matter if the data is corrupted, because a failed checksum will cause the pool to go back to the data disks. As far as using 10k disks for a slog, it depends on what kind of drives are in your pool and how it's laid out. If you have a wide raidz stripe on slow disks, just about anything will help. If you've got striped mirrors on fast disks, then it probably won't help much, especially for what sounds like a server with a small number of clients. I've got an OCZ Vertex 30gb drive with a 1GB stripe used for the slog and the rest used for the L2ARC, which for ~ $100 has been a nice boost to nfs writes. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
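As a minimal sketch of that kind of split (the pool name, device name and slice layout are examples; the slices would be created with format(1M) beforehand):

    zpool add tank log c4t0d0s0      # small slice, roughly 1 GB, for the slog
    zpool add tank cache c4t0d0s1    # the rest of the SSD as L2ARC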
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On 26/04/10 03:02 PM, Dave Pooser wrote: I'm building another 24-bay rackmount storage server, and I'm considering what drives to put in the bays. My chassis is a Supermicro SC846A, so the backplane supports SAS or SATA; my controllers are LSI3081E, again supporting SAS or SATA. Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM drive in both SAS and SATA configurations; the SAS model offers one quarter the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and costs 10% more than its enterprise SATA twin. (They also offer a Barracuda XT SATA drive; it's roughly 20% less expensive than the Constellation drive, but rated at 60% the MTBF of the others and a predicted rate of nonrecoverable errors an order of magnitude higher.) Assuming I'm going to be using three 8-drive RAIDz2 configurations, and further assuming this server will be used for backing up home directories (lots of small writes/reads), how much benefit will I see from the SAS interface? I would expect to see the SAS drives have built-in support for multipathing, with no extra hardware required. Also, hear yourself chanting but SAS is more ENTERPRISEY over and over again :-) I don't know of any other specific difference between Enterprise SATA and SAS drives. James C. McPherson -- Senior Software Engineer, Solaris Oracle http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] rpool on ssd. endurance question.
Hello. If anybody has been using an SSD for rpool for more than half a year, can you post SMART information about the HostWrites attribute? I want to see how SSDs wear when used as system disks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Mon, Apr 26, 2010 at 10:02:42AM -0700, Chris Du wrote: SAS: full duplex SATA: half duplex SAS: dual port SATA: single port (some enterprise SATA has dual port) SAS: 2 active channel - 2 concurrent write, or 2 read, or 1 write and 1 read SATA: 1 active channel - 1 read or 1 write SAS: Full error detection and recovery on both read and write SATA: error detection and recovery on write, only error detection on read SAS: Full SCSI TCQ SATA: Lame ATA NCQ -- Dan. pgpfPAxGyNIbj.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] rpool on ssd. endurance question.
On 04/26/10 11:54 PM, Yuri Vorobyev wrote: Hello. If anybody uses SSD for rpool more than half-year, can you post SMART information about HostWrites attribute? I want to see how SSD wear for system disk purposes. I'd be happy to; exactly what commands shall I run? Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
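One hedged way to pull that information, assuming smartmontools is installed (the device path here is an example, the -d option varies by controller, and the attribute name varies by vendor, e.g. Host_Writes or Total_LBAs_Written):

    smartctl -A -d sat,12 /dev/rdsk/c5t0d0s0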