Re: [zfs-discuss] zfs send/receive as backup - reliability?
On 20/01/2010 15:45, David Dyer-Bennet wrote: On Wed, January 20, 2010 09:23, Robert Milkowski wrote: Now you rsync all the data from your clients to a dedicated filesystem per client, then create a snapshot. Is there an rsync out there that can reliably replicate all file characteristics between two ZFS/Solaris systems? I haven't found one. The ZFS ACLs seem to be beyond all of them, in particular. No, it doesn't support ZFS ACLs - fortunately that is not an issue for us. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive as backup - reliability?
On 20/01/2010 19:20, Ian Collins wrote: Julian Regel wrote: It is actually not that easy. Compare the cost of 2x x4540 with 1TB disks to an equivalent solution on LTO. Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare + 2x OS disks. The four raidz2 groups form a single pool. This would provide well over 30TB of logical storage per box. Now you rsync all the data from your clients to a dedicated filesystem per client, then create a snapshot. All snapshots are replicated to a 2nd x4540, so even if you were to lose an entire box or its data for some reason you would still have a spare copy. Now compare that to the cost of a library, LTO drives, tapes, software + licenses, support costs, ... See more details at http://milek.blogspot.com/2009/12/my-presentation-at-losug.html I've just read your presentation Robert. Interesting stuff. I've also just done a pen and paper exercise to see how much 30TB of tape would cost as a comparison to your disk based solution. Using list prices from Sun's website (and who pays list..?), an SL48 with 2 x LTO3 drives would cost £14000. I couldn't see a price on an LTO4-equipped SL48 despite the Sun website saying it's a supported option. Each LTO3 has a native capacity of 300GB and the SL48 can hold up to 48 tapes in the library (14.4TB native per library). To match the 30TB in your solution, we'd need two libraries totalling £28000. You would also need 100 LTO3 tapes to provide 30TB of native storage. I recently bought a pack of 20 tapes for £340, so five packs would be £1700. So you could provision a tape backup for just under £30000 (~$49000). In comparison, the cost of one X4540 with ~36TB usable storage is UK list price £30900. I've not factored in backup software since you could use an open source solution such as Amanda or Bacula. A more apples to apples comparison would be to compare the storage only. Both removable drive and tape options require a server with FC or SCSI ports, so that can be excluded from the comparison. I think one should actually compare whole solutions - including servers, FC infrastructure, tape drives, robots, software costs, rack space, ... Servers like the x4540 are ideal for a zfs+rsync backup solution - very compact, good $/GB ratio, enough CPU power for their capacity, easy to scale horizontally, and neither too small nor too big. And thanks to their compactness they are very easy to administer. Depending on the environment, one could always deploy them in pairs - one in one datacenter and the second in another datacenter, with ZFS send based replication of all backups (snapshots). Or one may replicate (cross-replicate) only selected clients if needed. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
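A minimal sketch of the rsync + snapshot + replication cycle described above, for one client (hostnames, pool and filesystem names are placeholders, and rsync options and error handling are simplified):

  # on the first x4540: one filesystem per client, sync then snapshot
  rsync -aH --delete client1:/data/ /backup/client1/
  zfs snapshot backup/client1@20100121

  # replicate the new snapshot incrementally to the second x4540
  zfs send -i backup/client1@20100120 backup/client1@20100121 | \
      ssh x4540-2 zfs receive -F backup/client1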
Re: [zfs-discuss] zfs send/receive as backup - reliability?
Robert Milkowski wrote: On 20/01/2010 19:20, Ian Collins wrote: Julian Regel wrote: It is actually not that easy. Compare the cost of 2x x4540 with 1TB disks to an equivalent solution on LTO. Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare + 2x OS disks. The four raidz2 groups form a single pool. This would provide well over 30TB of logical storage per box. Now you rsync all the data from your clients to a dedicated filesystem per client, then create a snapshot. All snapshots are replicated to a 2nd x4540, so even if you were to lose an entire box or its data for some reason you would still have a spare copy. Now compare that to the cost of a library, LTO drives, tapes, software + licenses, support costs, ... See more details at http://milek.blogspot.com/2009/12/my-presentation-at-losug.html I've just read your presentation Robert. Interesting stuff. I've also just done a pen and paper exercise to see how much 30TB of tape would cost as a comparison to your disk based solution. Using list prices from Sun's website (and who pays list..?), an SL48 with 2 x LTO3 drives would cost £14000. I couldn't see a price on an LTO4-equipped SL48 despite the Sun website saying it's a supported option. Each LTO3 has a native capacity of 300GB and the SL48 can hold up to 48 tapes in the library (14.4TB native per library). To match the 30TB in your solution, we'd need two libraries totalling £28000. You would also need 100 LTO3 tapes to provide 30TB of native storage. I recently bought a pack of 20 tapes for £340, so five packs would be £1700. So you could provision a tape backup for just under £30000 (~$49000). In comparison, the cost of one X4540 with ~36TB usable storage is UK list price £30900. I've not factored in backup software since you could use an open source solution such as Amanda or Bacula. A more apples to apples comparison would be to compare the storage only. Both removable drive and tape options require a server with FC or SCSI ports, so that can be excluded from the comparison. I think one should actually compare whole solutions - including servers, FC infrastructure, tape drives, robots, software costs, rack space, ... Servers like the x4540 are ideal for a zfs+rsync backup solution - very compact, good $/GB ratio, enough CPU power for their capacity, easy to scale horizontally, and neither too small nor too big. And thanks to their compactness they are very easy to administer. Until you try to pick one up and put it in a fire safe! Depending on the environment, one could always deploy them in pairs - one in one datacenter and the second in another datacenter, with ZFS send based replication of all backups (snapshots). Or one may replicate (cross-replicate) only selected clients if needed. Yes, I agree. That's how my client's systems are configured (pairs). We also have another with an attached tape library. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror c0t2d0 c1t2d0 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror c0t5d0 c1t5d0 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0 mirror c7t0d0 c7t4d0 This looks good. But you probably want to stick a spare in there, and add a SSD disk specified by log ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
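If you do want the spare and the log SSD suggested here, each is one more command against the same pool (the c5t0d0 / c5t4d0 names below are placeholders for whichever spare disk and SSD are free in your system):

  zpool add testpool spare c5t0d0
  zpool add testpool log c5t4d0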
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
Zfs does not strictly support RAID 1+0. However, your sample command will create a pool based on mirror vdevs which is written to in a load-shared fashion (not striped). This type of pool is ideal for [...] Although it's not technically striped according to the RAID definition of striping, it does achieve the same performance result (actually better), so people will generally refer to this as striping anyway. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
On Thursday 21 January 2010 10:29:16 Edward Ned Harvey wrote: zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 mirror c0t2d0 c1t2d0 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0 mirror c0t5d0 c1t5d0 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0 mirror c7t0d0 c7t4d0 This looks good. But you probably want to stick a spare in there, and add a SSD disk specified by log May I jump in here and ask how people are using SSDs reliably in an x4500? So far we have had very little success with X25-E drives and a 3.5-to-2.5 inch converter. So far two systems have shown pretty bad instabilities with that. Anyone with a success here? Cheers Carste ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
zpool create testpool disk1 disk2 disk3 In the traditional sense of RAID, this would create a concatenated data set. The size of the data set is the size of disk1 + disk2 + disk3. However, since this is ZFS, it's not constrained to linearly assigning virtual disk blocks to physical disk blocks ... ZFS will happily write a single large file to all 3 disks simultaneously and just keep track of where all the blocks landed. As a result, you get performance which is 3x a single disk for large files (like striping) but the performance for small files has not been harmed (as it is in striping)... As an added bonus, unlike striping, you can still just add more disks to your zpool, and expand your volume on the fly. The filesystem will dynamically adjust to accommodate more space and more devices, and will intelligently optimize for performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
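As an illustration of that last point, growing the pool later is just another add - a sketch using the same placeholder disk names as above:

  zpool create testpool disk1 disk2 disk3
  zpool add testpool disk4     # capacity grows immediately; new writes spread across all four disks
  zpool list testpool          # shows the enlarged pool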
Re: [zfs-discuss] zfs send/receive as backup - reliability?
Robert Milkowski wrote: I think one should actually compare whole solutions - including servers, FC infrastructure, tape drives, robots, software costs, rack space, ... Servers like the x4540 are ideal for a zfs+rsync backup solution - very compact, good $/GB ratio, enough CPU power for their capacity, easy to scale horizontally, and neither too small nor too big. And thanks to their compactness they are very easy to administer. Depending on the environment, one could always deploy them in pairs - one in one datacenter and the second in another datacenter, with ZFS send based replication of all backups (snapshots). Or one may replicate (cross-replicate) only selected clients if needed. Something else that often sells the 4500/4540 relates to internal company politics. Often, inside a company, storage has to be provisioned from the company's storage group using very expensive SAN based storage - indeed so expensive, by the time the storage group has added its overhead onto the already expensive SAN, that whole projects become unviable. Instead, teams find they can order 4500/4540's which slip under the radar as servers (or even PCs), and they now have affordable storage for their projects, which makes them viable once more. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
Can ASM match ZFS for checksum and self healing? The reason I ask is that the x45x0 uses inexpensive (less reliable) SATA drives. Even the J4xxx paper you cite uses SAS for production data (only using SATA for Oracle Flash, although I gave my concerns about that too). The thing is, ZFS and the x45x0 seem made for each other. The latter only makes sense to me with all the goodness and assurance added by the former. Phil On 21 Jan 2010, at 02:58, John hort...@gmail.com wrote: Have you looked at using Oracle ASM instead of or with ZFS? Recent Sun docs concerning the F5100 seem to recommend a hybrid of both. If you don't go that route, generally you should separate redo logs from actual data so they don't compete for I/O, since a redo switch lagging hangs the database. If you use archive logs, separate those onto yet another pool. Realistically, it takes lots of analysis with different configurations. Every workload and database is different. A decent overview of configuring JBOD-type storage for databases is here, though it doesn't use ASM... https://www.sun.com/offers/docs/j4000_oracle_db.pdf It's a couple years old and that might contribute to the lack of an ASM mention. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive as backup - reliability?
On 21/01/2010 09:07, Ian Collins wrote: Robert Milkowski wrote: On 20/01/2010 19:20, Ian Collins wrote: Julian Regel wrote: It is actually not that easy. Compare the cost of 2x x4540 with 1TB disks to an equivalent solution on LTO. Each x4540 could be configured as: 4x 11 disks in raidz-2 + 2x hot spare + 2x OS disks. The four raidz2 groups form a single pool. This would provide well over 30TB of logical storage per box. Now you rsync all the data from your clients to a dedicated filesystem per client, then create a snapshot. All snapshots are replicated to a 2nd x4540, so even if you were to lose an entire box or its data for some reason you would still have a spare copy. Now compare that to the cost of a library, LTO drives, tapes, software + licenses, support costs, ... See more details at http://milek.blogspot.com/2009/12/my-presentation-at-losug.html I've just read your presentation Robert. Interesting stuff. I've also just done a pen and paper exercise to see how much 30TB of tape would cost as a comparison to your disk based solution. Using list prices from Sun's website (and who pays list..?), an SL48 with 2 x LTO3 drives would cost £14000. I couldn't see a price on an LTO4-equipped SL48 despite the Sun website saying it's a supported option. Each LTO3 has a native capacity of 300GB and the SL48 can hold up to 48 tapes in the library (14.4TB native per library). To match the 30TB in your solution, we'd need two libraries totalling £28000. You would also need 100 LTO3 tapes to provide 30TB of native storage. I recently bought a pack of 20 tapes for £340, so five packs would be £1700. So you could provision a tape backup for just under £30000 (~$49000). In comparison, the cost of one X4540 with ~36TB usable storage is UK list price £30900. I've not factored in backup software since you could use an open source solution such as Amanda or Bacula. A more apples to apples comparison would be to compare the storage only. Both removable drive and tape options require a server with FC or SCSI ports, so that can be excluded from the comparison. I think one should actually compare whole solutions - including servers, FC infrastructure, tape drives, robots, software costs, rack space, ... Servers like the x4540 are ideal for a zfs+rsync backup solution - very compact, good $/GB ratio, enough CPU power for their capacity, easy to scale horizontally, and neither too small nor too big. And thanks to their compactness they are very easy to administer. Until you try to pick one up and put it in a fire safe! Then you back up to tape from the x4540 whatever data you need. In the case of enterprise products you save on licensing here, as you need only one client license per x4540 but can in fact back up data from many clients which are stored there. :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
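For the tape step, one simple approach is to stream a snapshot straight to the drive - a rough sketch, assuming a local tape at /dev/rmt/0n and example dataset names, and keeping in mind that a zfs send stream can only be restored with zfs receive:

  zfs send backup/client1@20100121 | dd of=/dev/rmt/0n bs=1048576
  # and to restore:
  dd if=/dev/rmt/0n bs=1048576 | zfs receive backup/client1-restore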
[zfs-discuss] zones and other filesystems
I'm pretty new to OpenSolaris. I come from FreeBSD. Naturally, after using FreeBSD for a while I've been big on the use of FreeBSD jails, so I just had to try zones. I've figured out how to get zones running but now I'm stuck and need help. Is there anything like nullfs in OpenSolaris... or maybe there is a more Solaris way of doing what I need to do. Basically, what I'd like to do is give a specific zone access to 2 ZFS filesystems which are available to the global zone. My new zones are in: /export/home/zone1 /export/home/zone2 What I'd like to do is give them access to: /tank/nas/Video /tank/nas/JeffB I'm sure I overlooked something hugely easy and important... thanks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive as backup - reliability?
Until you try to pick one up and put it in a fire safe! Then you back up to tape from the x4540 whatever data you need. In the case of enterprise products you save on licensing here, as you need only one client license per x4540 but can in fact back up data from many clients which are stored there. Which brings us full circle... What do you then use to back up to tape, bearing in mind that the Sun-provided tools all have significant limitations? I guess you need to use a third party tool and watch carefully that they provide complete backups. JR ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
On 21 Jan 2010 at 12:33, Thomas Burgess wrote: I'm pretty new to OpenSolaris. I come from FreeBSD. Naturally, after using FreeBSD for a while I've been big on the use of FreeBSD jails, so I just had to try zones. I've figured out how to get zones running but now I'm stuck and need help. Is there anything like nullfs in OpenSolaris... or maybe there is a more Solaris way of doing what I need to do. Basically, what I'd like to do is give a specific zone access to 2 ZFS filesystems which are available to the global zone. My new zones are in: /export/home/zone1 /export/home/zone2 The path of the root of your zone is not important for that feature. What I'd like to do is give them access to: /tank/nas/Video /tank/nas/JeffB With zonecfg, you can add a configuration like this one to your zone:

add fs
set dir=/some/path/Video
set special=/tank/nas/Video
set type=lofs
end
add fs
set dir=/some/path/JeffB
set special=/tank/nas/JeffB
set type=lofs
end

Your filesystems will appear in /some/path/Video and /some/path/JeffB in your zone, and still be accessible in the global zone. http://docs.sun.com/app/docs/doc/817-1592/z.conf.start-29?a=view This option doesn't let you manage the filesystems from the zone, though. You must use add dataset in that case. http://docs.sun.com/app/docs/doc/819-5461/gbbst?a=view Gaëtan -- Gaëtan Lehmann Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France) tel: +33 1 34 65 29 66  fax: 01 34 65 29 09 http://voxel.jouy.inra.fr http://www.itk.org http://www.mandriva.org http://www.bepo.fr PGP.sig Description: This is a PGP digital signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
The path of the root of your zone is not important for that feature. Ok, cool. With zonecfg, you can add a configuration like this one to your zone:

add fs
set dir=/some/path/Video
set special=/tank/nas/Video
set type=lofs
end
add fs
set dir=/some/path/JeffB
set special=/tank/nas/JeffB
set type=lofs
end

Thanks, I thought I read that this wouldn't work unless it was a legacy mount. So I'll be able to access the filesystem from both the global zone and my new zone? Your filesystems will appear in /some/path/Video and /some/path/JeffB in your zone, and still be accessible in the global zone. http://docs.sun.com/app/docs/doc/817-1592/z.conf.start-29?a=view Guess that answers that question =) Thanks, I'll try that. This option doesn't let you manage the filesystems from the zone, though. You must use add dataset in that case. Actually, this is GOOD, I don't WANT the zone to have the ability to change anything, just the ability to create new files. Thanks for the help. http://docs.sun.com/app/docs/doc/819-5461/gbbst?a=view Gaëtan -- Gaëtan Lehmann Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France) tel: +33 1 34 65 29 66  fax: 01 34 65 29 09 http://voxel.jouy.inra.fr http://www.itk.org http://www.mandriva.org http://www.bepo.fr ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
No. But, that's where the hybrid solution comes in. ASM would be used for the database files and ZFS for the redo/archive logs and undo. Corrupt blocks in the datafiles would be repaired with data from redo during a recovery, and ZFS should give you assurance that the redo didn't get corrupted. Sun's docs on the F5100 point to this as the best solution for performance and recoverability/reliability. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
Now I'm stuck again... sorry to clog the tubes with my noobishness. I can't seem to create users inside the zone... I'm sure it's due to ZFS privileges somewhere but I'm not exactly sure how to fix it... I don't mind if I need to manage the ZFS filesystem outside of the zone, I'm just not sure WHERE I'm supposed to do it. When I try to create a home dir I get this: mkdir: Failed to make directory wonslung; Operation not applicable When I try to do it via useradd I get this: UX: useradd: ERROR: Unable to create the home directory: Operation not applicable. And when I try to enter the zone home dir from the global zone I get this, even as root: bash: cd: home: Not owner Have I seriously screwed up or did I again miss something vital? Thanks again. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
On 21 Jan 2010 at 14:14, Thomas Burgess wrote: Now I'm stuck again... sorry to clog the tubes with my noobishness. I can't seem to create users inside the zone... I'm sure it's due to ZFS privileges somewhere but I'm not exactly sure how to fix it... I don't mind if I need to manage the ZFS filesystem outside of the zone, I'm just not sure WHERE I'm supposed to do it. When I try to create a home dir I get this: mkdir: Failed to make directory wonslung; Operation not applicable When I try to do it via useradd I get this: UX: useradd: ERROR: Unable to create the home directory: Operation not applicable. And when I try to enter the zone home dir from the global zone I get this, even as root: bash: cd: home: Not owner Have I seriously screwed up or did I again miss something vital? Maybe it's because of the automounter. If you don't need that feature, try to disable it in your zone with svcadm disable autofs Gaëtan -- Gaëtan Lehmann Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France) tel: +33 1 34 65 29 66  fax: 01 34 65 29 09 http://voxel.jouy.inra.fr http://www.itk.org http://www.mandriva.org http://www.bepo.fr PGP.sig Description: This is a PGP digital signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
Hrm... that seemed to work... I'm so new to Solaris... it's SO different... What exactly did I just disable? Does that mount NFS shares or something? Why should that prevent me from creating home directories? Thanks 2010/1/21 Gaëtan Lehmann gaetan.lehm...@jouy.inra.fr On 21 Jan 2010 at 14:14, Thomas Burgess wrote: Now I'm stuck again... sorry to clog the tubes with my noobishness. I can't seem to create users inside the zone... I'm sure it's due to ZFS privileges somewhere but I'm not exactly sure how to fix it... I don't mind if I need to manage the ZFS filesystem outside of the zone, I'm just not sure WHERE I'm supposed to do it. When I try to create a home dir I get this: mkdir: Failed to make directory wonslung; Operation not applicable When I try to do it via useradd I get this: UX: useradd: ERROR: Unable to create the home directory: Operation not applicable. And when I try to enter the zone home dir from the global zone I get this, even as root: bash: cd: home: Not owner Have I seriously screwed up or did I again miss something vital? Maybe it's because of the automounter. If you don't need that feature, try to disable it in your zone with svcadm disable autofs Gaëtan -- Gaëtan Lehmann Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France) tel: +33 1 34 65 29 66  fax: 01 34 65 29 09 http://voxel.jouy.inra.fr http://www.itk.org http://www.mandriva.org http://www.bepo.fr ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
Thomas, If you're trying to make user home directories on your local machine in /home, you have to watch out because the initial Solaris config assumes that you're in an enterprise environment and the convention is to have a filer somewhere that serves everyone's home directories which, with the default automount config, get mounted onto your machine's /home. Personally, when setting up a standalone box, I don't put home directories in /home just to avoid clobbering enterprise unix conventions (an alternative that keeps the automounter running is sketched below). Gaëtan gave you the quick solution of just shutting off the automounter, which allows you to avoid addressing the problem this time around. --jake Thomas Burgess wrote: Hrm... that seemed to work... I'm so new to Solaris... it's SO different... What exactly did I just disable? Does that mount NFS shares or something? Why should that prevent me from creating home directories? Thanks 2010/1/21 Gaëtan Lehmann gaetan.lehm...@jouy.inra.fr On 21 Jan 2010 at 14:14, Thomas Burgess wrote: Now I'm stuck again... sorry to clog the tubes with my noobishness. I can't seem to create users inside the zone... I'm sure it's due to ZFS privileges somewhere but I'm not exactly sure how to fix it... I don't mind if I need to manage the ZFS filesystem outside of the zone, I'm just not sure WHERE I'm supposed to do it. When I try to create a home dir I get this: mkdir: Failed to make directory wonslung; Operation not applicable When I try to do it via useradd I get this: UX: useradd: ERROR: Unable to create the home directory: Operation not applicable. And when I try to enter the zone home dir from the global zone I get this, even as root: bash: cd: home: Not owner Have I seriously screwed up or did I again miss something vital? Maybe it's because of the automounter. If you don't need that feature, try to disable it in your zone with svcadm disable autofs Gaëtan -- Gaëtan Lehmann Biologie du Développement et de la Reproduction INRA de Jouy-en-Josas (France) tel: +33 1 34 65 29 66  fax: 01 34 65 29 09 http://voxel.jouy.inra.fr http://www.itk.org http://www.mandriva.org http://www.bepo.fr ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
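If you do want local accounts under /home without turning the automounter off, the conventional alternative is to point the auto_home map at the local filesystem - a sketch, assuming home directories live under /export/home (the jdoe entry is just an example user):

  # /etc/auto_home
  jdoe    localhost:/export/home/jdoe
  # or, to map every user the same way:
  *       localhost:/export/home/&

  # then restart the automounter so it rereads the map
  svcadm restart autofs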
Re: [zfs-discuss] zones and other filesystems
Ahh, On Thu, Jan 21, 2010 at 8:55 AM, Jacob Ritorto jacob.rito...@gmail.com wrote: Thomas, If you're trying to make user home directories on your local machine in /home, you have to watch out because the initial Solaris config assumes that you're in an enterprise environment and the convention is to have a filer somewhere that serves everyone's home directories which, with the default automount config, get mounted onto your machine's /home. Personally, when setting up a standalone box, I don't put home directories in /home just to avoid clobbering enterprise unix conventions. Gaëtan gave you the quick solution of just shutting off the automounter, which allows you to avoid addressing the problem this time around. --jake Yes, I just realized this... I feel quite silly now. I'm not used to the whole /home vs /export/home difference, and when you add zones to the mix it's quite confusing. I'm just playing around with this zone... to learn, but in the next REAL zone I'll probably: mount the home directories from the base system (this machine itself IS a file server, and the zone I intend to config will be an FTP server and possibly a BitTorrent client), or create a couple of standalone users which AREN'T in /home. This makes a lot more sense now... I also forgot to set a default router in my zone so I can't even connect to the internet right now. When I edit it with zonecfg can I just do: add net set defrouter=192.168.1.1 end Thanks again ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
add net set defrouter=192.168.1.1 end Thanks again I must be doing something wrong... I can access the zone on my network but I can't for the life of me get the zone to access the internet. I'm googling like crazy but maybe someone here knows what I'm doing wrong. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
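For a shared-IP zone, the net resource usually needs the zone's address and physical interface as well as the default router, so a defrouter-only fragment won't give the zone a route on its own. A sketch of adding the router to the net resource the zone already has (select it by whatever address the zone already uses; the address, interface and router below are only examples):

  # in the global zone
  zonecfg -z zone1
  select net address=192.168.1.50
  set defrouter=192.168.1.1
  end
  commit
  exit
  zoneadm -z zone1 reboot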
[zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller ?
Does anyone know if the current OpenSolaris mpt driver supports the recent LSI SAS2008 controller? This controller/ASIC is used in the next generation SAS-2 6Gbps PCIe cards from LSI and SuperMicro etc, e.g.: 1. SuperMicro AOC-USAS2-L8e and the AOC-USAS2-L8i 2. LSI SAS 9211-8i Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zones and other filesystems
Thomas Burgess wrote: I'm not used to the whole /home vs /export/home difference and when you add zones to the mix it's quite confusing. I'm just playing around with this zone.to learn but in the next REAL zone i'll probably: mount the home directories from the base system (this machine itself IS a file server, and the zone i intend to config will be a ftp server and possible a bit torrent client) or create a couple stand alone users which AREN't in /home This makes a lot more sense nowI also forgot to set a default router in my zone so i can't even connect to the internet right now.. When i edit it with zonecfg can i just do: add net set defrouter=192.168.1.1** end OK, so if you're the filer too, the automount system still works for you the same as it does for all other machines using automount - it'll nfs mount to itself, etc. Check out and follow the convention if you're so inclined. Then of course, it helps to become a nis or ldap expert too, which is a bit much to chew on if you're just here to check out zones, so your simplification above is fine, as is Gaëtan's original recommendation... At least until your network grows to the point that you start to notice the home dir chaos and can't hit nfs shares at will.. Then you have to go back and undo all your automount breakage. And yes, your zonecfg tweak should do the trick. But you don't have to take my word for it -- the experts hang out in zones-discuss ;) http://mail.opensolaris.org/mailman/listinfo/zones-discuss ttyl jake ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Degraded Zpool
Hi list, I have a serious issue with my zpool. My zpool consists of 4 devices which are assembled into 2 mirrors. One of these mirrors got degraded because of too many errors on each device of the mirror. Yes, both devices of the mirror got degraded. According to Murphy's law I don't have a current backup either (I have a backup which was made several months ago, and some backups spread across several disks). Neither of these backups is great, so I want to access my data on the zpool so I can make a backup and replace the OpenSolaris server. As the two faulted devices are connected to different controllers, I assume that the problem is located on the server, not on the hard disks/controllers. One of the faulted hard disks was replaced some weeks ago due to CRC errors, so I assume the server is bad, not the disks/cables/controllers. My state is as follows:

        NAME         STATE     READ WRITE CKSUM
        performance  DEGRADED     0     0     8
          mirror     DEGRADED     0     0    16
            c1t1d0   DEGRADED     0     0    23  too many errors
            c2d0     DEGRADED     0     0    24  too many errors
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     7
            c3d0     ONLINE       0     0     7

The disks (at least the c1t1/2 ones; I don't see the c2/3 ones via cfgadm) are online as far as I can see via cfgadm. Is there a possibility to force the two degraded devices online, so I can fully access the zpool and do a backup? I wanted to ask first, before doing anything stupid and losing the whole pool. Any suggestions are very welcome. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
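One non-destructive first step (not a fix for whatever is causing the errors, but it may confirm the pool is readable enough for a backup): DEGRADED devices, unlike FAULTED ones, are still in use, so you can clear the error counters and let a scrub re-verify everything - a sketch using the pool name from the status output above:

  zpool clear performance
  zpool scrub performance
  zpool status -v performance    # watch whether the checksum errors come back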
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
On Thu, 21 Jan 2010, Edward Ned Harvey wrote: Although it's not technically striped according to the RAID definition of striping, it does achieve the same performance result (actually better) so people will generally refer to this as striping anyway. People will say a lot of things, but that does not make them right. At some point, using the wrong terminology becomes foolish and counterproductive. Striping and load-share seem quite different to me. The difference is immediately apparent when watching the drive activity LEDs. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Dedup memory overhead
Hi all, I'm going to be trying out some tests using b130 for dedup on a server with about 1,7TB of usable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. From what I gather, the dedup hash keys are held in ARC and L2ARC and as such are in competition for the available memory. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead, e.g. an average block size of 4K = a hash key for every 4K stored, and a hash occupies 256 bits. An associated question is then how does the ARC handle competition between hash keys and regular ARC functions? Based on these estimations, I think that I should be able to calculate the following:

1,7 TB
1740,8 GB
1782579,2 MB
1825361100,8 KB
4 KB average block size
456340275,2 blocks
256 bits hash key size
1,16823E+11 bits hash key overhead
14602888806,4 bytes hash key overhead
14260633,6 KB hash key overhead
13926,4 MB hash key overhead
13,6 GB hash key overhead

Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what is the current distribution of different block sizes. I'm currently playing around with zdb with mixed success on extracting this kind of data. That's also a worst case scenario since it's counting really small blocks and using 100% of available storage - highly unlikely.

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl  iblk   dblk   dsize  lsize  %full  type
         0    7   16K    16K   57.0K    64K  77.34  DMU dnode
         1    1   16K     1K   1.50K     1K 100.00  ZFS master node
         2    1   16K    512   1.50K    512 100.00  ZFS delete queue
         3    2   16K    16K   18.0K    32K 100.00  ZFS directory
         4    3   16K   128K    408M   408M 100.00  ZFS plain file
         5    1   16K    16K   3.00K    16K 100.00  FUID table
         6    1   16K     4K   4.50K     4K 100.00  ZFS plain file
         7    1   16K  6.50K   6.50K  6.50K 100.00  ZFS plain file
         8    3   16K   128K    952M   952M 100.00  ZFS plain file
         9    3   16K   128K    912M   912M 100.00  ZFS plain file
        10    3   16K   128K    695M   695M 100.00  ZFS plain file
        11    3   16K   128K    914M   914M 100.00  ZFS plain file

Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation. Interestingly, all of my zvols seem to use fixed size blocks - that is, there is no variation in the block sizes - they're all the size defined on creation with no dynamic block sizes being used. I previously thought that the -b option set the maximum size, rather than fixing all blocks.
Learned something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl  iblk   dblk   dsize  lsize  %full  type
         0    7   16K    16K   21.0K    16K   6.25  DMU dnode
         1    1   16K    64K       0    64K   0.00  zvol object
         2    1   16K    512   1.50K    512 100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl  iblk   dblk   dsize  lsize  %full  type
         0    7   16K    16K   21.0K    16K   6.25  DMU dnode
         1    5   16K     8K    240G   250G  97.33  zvol object
         2    1   16K    512   1.50K    512 100.00  zvol prop

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
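For a quick back-of-the-envelope number, the same arithmetic can be scripted - a rough sketch; ENTRY_BYTES=32 matches the bare 256-bit hash used in the estimate above, but a real DDT entry carries quite a bit more than the hash, so treat the result as a lower bound:

  #!/bin/ksh
  POOL_BYTES=1869169767219   # ~1,7 TB of usable space
  AVG_BLOCK=4096             # assumed average block size
  ENTRY_BYTES=32             # bare 256-bit hash; a full DDT entry is several times larger
  echo "$POOL_BYTES / $AVG_BLOCK * $ENTRY_BYTES" | bc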
Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HT
That looks promising. As the main thing here is that OpenSolaris supports the LSI SAS2008 controller, I have created a new post to ask for confirmation of driver support -- see here: http://opensolaris.org/jive/thread.jspa?threadID=122156&tstart=0 Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS filesystem lock after running auto-replicate.ksh - how to clear?
Hi, I found this script for replicating zfs data: http://www.infrageeks.com/groups/infrageeks/wiki/8fb35/zfs_autoreplicate_script.html - I am testing it out in the lab with b129. It error-ed out the first run with some syntax error about the send component (recursive needed?) But I have not been able to run it again - it says the destination filesystem is locked: g...@lab-zfs-01:~ 10:50am 3 # ./auto-replicate.ksh data1/vms data1 lab-zfs-02 Destination filesystem data1/vms exists Filesystem locked, quitting: data1/vms g...@lab-zfs-01:~ 10:50am 4 How do I clear the lock - I have not been able to find documentation on this... thanks! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
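I haven't used that particular script, but replication scripts of this kind typically record their in-progress flag as a ZFS user property on the destination filesystem. If that's the case here, something like this should show it and let you reset it (the property name below is purely hypothetical - use whatever name actually appears in the zfs get output):

  zfs get -s local all data1/vms              # list locally-set (user) properties
  zfs inherit com.example:locked data1/vms    # clear the lock property the script left behind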
Re: [zfs-discuss] zfs send/receive as backup - reliability?
On Jan 21, 2010, at 3:55 AM, Julian Regel wrote: Until you try to pick one up and put it in a fire safe! Then you back up to tape from the x4540 whatever data you need. In the case of enterprise products you save on licensing here, as you need only one client license per x4540 but can in fact back up data from many clients which are stored there. Which brings us full circle... What do you then use to back up to tape, bearing in mind that the Sun-provided tools all have significant limitations? Poor choice of words. Sun resells NetBackup and (IIRC) that which was formerly called NetWorker. Thus, Sun does provide enterprise backup solutions. If I may put on my MBA hat, the competition is not ufsdump. ufsdump has nearly zero market penetration and no prospects for improving its market share. Making another ufsdump will also gain no market share. The market leaders are the likes of EMC, IBM, and Symantec with their heterogeneous backup support. If Sun wanted to provide a better solution that might gain market share against the others, then it would also need to be heterogeneous. So I think it would be hard to make a business case for a whole new backup solution. A less costly and less risky approach is to work with the market leaders to better integrate with dataset replication. Caveat: this may already be available, I haven't looked recently. I guess you need to use a third party tool and watch carefully that they provide complete backups. This is a good idea anyway. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote: On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote: Though the ARC case, PSARC/2007/618 is unpublished, I gather from googling and the source that L2ARC devices are considered auxiliary, in the same category as spares. If so, then it is perfectly reasonable to expect that it gets picked up regardless of the GUID. This also implies that it is shareable between pools until assigned. Brief testing confirms this behaviour. I learn something new every day :-) So, I suspect Lutz sees a race when both pools are imported onto one node. This still makes me nervous though... Yes. What if device reconfiguration renumbers my controllers, will l2arc suddenly start trashing a data disk? The same problem used to be a risk for swap, but less so now that we swap to named zvol. This will not happen unless the labels are rewritten on your data disk, and if that occurs, all bets are off. There's work afoot to make l2arc persistent across reboot, which implies some organised storage structure on the device. Fixing this shouldn't wait for that. Upon further review, the ruling on the field is confirmed ;-) The L2ARC is shared amongst pools just like the ARC. What is important is that at least one pool has a cache vdev. I suppose one could make the case that a new command is needed in addition to zpool and zfs (!) to manage such devices. But perhaps we can live with the oddity for a while? As such, for Lutz's configuration, I am now less nervous. If I understand correctly, you could add the cache vdev to rpool and forget about how it works with the shared pools. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
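In practice that is a one-liner, assuming your build allows cache devices on the root pool (the device name is only an example):

  zpool add rpool cache c2t5d0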
Re: [zfs-discuss] Dedup memory overhead
On Jan 21, 2010, at 8:04 AM, erik.ableson wrote: Hi all, I'm going to be trying out some tests using b130 for dedup on a server with about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. From what I gather, the dedup hash keys are held in ARC and L2ARC and as such are in competition for the available memory. ... and written to disk, of course. For ARC sizing, more is always better. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead. eg - averaged block size of 4K = a hash key for every 4k stored and a hash occupies 256 bits. An associated question is then how does the ARC handle competition between hash keys and regular ARC functions? AFAIK, there is no special treatment given to the DDT. The DDT is stored like other metadata and (currently) not easily accounted for. Also the DDT keys are 320 bits. The key itself includes the logical and physical block size and compression. The DDT entry is even larger. I think it is better to think of the ARC as caching the uncompressed DDT blocks which were written to disk. The number of these will be data dependent. zdb -S poolname will give you an idea of the number of blocks and how well dedup will work on your data, but that means you already have the data in a pool. -- richard Based on these estimations, I think that I should be able to calculate the following: 1,7 TB 1740,8GB 1782579,2 MB 1825361100,8 KB 4 average block size 456340275,2 blocks 256 hash key size-bits 1,16823E+11 hash key overhead - bits 1460206,4 hash key size-bytes 14260633,6hash key size-KB 13926,4 hash key size-MB 13,6 hash key overhead-GB Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what is the current distribution of different block sizes. I'm currently playing around with zdb with mixed success on extracting this kind of data. That's also a worst case scenario since it's counting really small blocks and using 100% of available storage - highly unlikely. # zdb -ddbb siovale/iphone Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0 Object lvl iblk dblk dsize lsize %full type 0716K16K 57.0K64K 77.34 DMU dnode 1116K 1K 1.50K 1K 100.00 ZFS master node 2116K512 1.50K512 100.00 ZFS delete queue 3216K16K 18.0K32K 100.00 ZFS directory 4316K 128K 408M 408M 100.00 ZFS plain file 5116K16K 3.00K16K 100.00 FUID table 6116K 4K 4.50K 4K 100.00 ZFS plain file 7116K 6.50K 6.50K 6.50K 100.00 ZFS plain file 8316K 128K 952M 952M 100.00 ZFS plain file 9316K 128K 912M 912M 100.00 ZFS plain file 10316K 128K 695M 695M 100.00 ZFS plain file 11316K 128K 914M 914M 100.00 ZFS plain file Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation. Interestingly, all of my zvols seem to use fixed size blocks - that is, there is no variation in the block sizes - they're all the size defined on creation with no dynamic block sizes being used. 
I previously thought that the -b option set the maximum size, rather than fixing all blocks. Learned something today :-) # zdb -ddbb siovale/testvol Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects Object lvl iblk dblk dsize lsize %full type 0716K16K 21.0K16K6.25 DMU dnode 1116K64K 064K0.00 zvol object 2116K512 1.50K512 100.00 zvol prop # zdb -ddbb siovale/tm-media Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0 Object lvl iblk dblk dsize lsize %full type 0716K16K 21.0K16K6.25 DMU dnode 1516K 8K 240G 250G 97.33 zvol object 2116K512 1.50K512 100.00 zvol prop ___
Re: [zfs-discuss] zfs send/receive as backup - reliability?
Julian Regel wrote: Until you try to pick one up and put it in a fire safe! Then you backup to tape from x4540 whatever data you need. In case of enterprise products you save on licensing here as you need a one client license per x4540 but in fact can backup data from many clients which are there. Which brings up full circle... What do you then use to backup to tape bearing in mind that the Sun-provided tools all have significant limitations? In addition to Richard's comments, I doubt many medium to large businesses would use ufsdump/restore as their backup solution. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller ?
On 22/01/10 12:28 AM, Simon Breden wrote: Does anyone know if the current OpenSolaris mpt driver supports the recent LSI SAS2008 controller? This controller/ASIC is used in the next generation SAS-2 6Gbps PCIe cards from LSI and SuperMicro etc, e.g.: 1. SuperMicro AOC-USAS2-L8e and the AOC-USAS2-L8i 2. LSI SAS 9211-8i No, the 2nd generation non-RAID LSI SAS controllers make use of the mpt_sas(7d). Second generation RAID LSI SAS controllers use mr_sas(7d). Code for both of these drivers is Open and you can find it on src.opensolaris.org. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] 2gig file limit on ZFS?
Hi Folks, Situation, 64 bit Open Solaris on AMD. 2009-6 111b - I can't successfully update the OS. I've got three external 1.5 Tb drives in a raidz pool connected via USB. Hooked on to an IDE channel is a 750gig hard drive that I'm copying the data off. It is an ext3 drive from an Ubuntu server. Copying is being done on the machine using the cp command as root. So far, two files have failed... /mirror2/applications/Microsoft/Operating Systems/Virtual PC/vm/XP-SP2/XP-SP2 Hard Disk.vhd: File too large /mirror2/applications/virtualboximages/xp/xp.tar.bz2: File too large The files are... -rwxr-x--- 1 adminapplications 4177570654 Nov 4 08:02 xp.tar.bz2 -rwxr-x--- 1 adminapplications 2582259712 Feb 14 2007 XP-SP2 Hard Disk.vhd The system is a home server and contains files of all types and sizes. Any ideas please? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
CC'ed to ext3-disc...@opensolaris.org because this is an ext3 on Solaris issue. ZFS has no problem with large files, but the older ext3 did. See also the ext3 project page and documentation, especially http://hub.opensolaris.org/bin/view/Project+ext3/Project_status -- richard On Jan 21, 2010, at 11:58 AM, Michelle Knight wrote: Hi Folks, Situation, 64 bit Open Solaris on AMD. 2009-6 111b - I can't successfully update the OS. I've got three external 1.5 Tb drives in a raidz pool connected via USB. Hooked on to an IDE channel is a 750gig hard drive that I'm copying the data off. It is an ext3 drive from an Ubuntu server. Copying is being done on the machine using the cp command as root. So far, two files have failed... /mirror2/applications/Microsoft/Operating Systems/Virtual PC/vm/XP-SP2/XP-SP2 Hard Disk.vhd: File too large /mirror2/applications/virtualboximages/xp/xp.tar.bz2: File too large The files are... -rwxr-x--- 1 adminapplications 4177570654 Nov 4 08:02 xp.tar.bz2 -rwxr-x--- 1 adminapplications 2582259712 Feb 14 2007 XP-SP2 Hard Disk.vhd The system is a home server and contains files of all types and sizes. Any ideas please? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
Thanks a lot for the info James. For the benefit of myself and others then: 1. mpt_sas driver is used for the SuperMicro AOC-USAS2-L8e 2. mr_sas driver is used for the SuperMicro AOC-USAS2-L8i and LSI SAS 9211-8i And how does the maturity/robustness of the mpt_sas mr_sas drivers compare to the mpt driver which I'm currently using for my LSI 1068-based AOC-USAS-L8i card? (in the default IT mode) It might be hard to answer that one, but I thought I'd ask anyway, as it would make choosing new kit for OpenSolaris + ZFS a bit easier. Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup memory overhead
On Thu, Jan 21, 2010 at 10:00 PM, Richard Elling richard.ell...@gmail.com wrote: On Jan 21, 2010, at 8:04 AM, erik.ableson wrote: Hi all, I'm going to be trying out some tests using b130 for dedup on a server with about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. From what I gather, the dedup hash keys are held in ARC and L2ARC and as such are in competition for the available memory. ... and written to disk, of course. For ARC sizing, more is always better. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead. eg - averaged block size of 4K = a hash key for every 4k stored and a hash occupies 256 bits. An associated question is then how does the ARC handle competition between hash keys and regular ARC functions? AFAIK, there is no special treatment given to the DDT. The DDT is stored like other metadata and (currently) not easily accounted for. Also the DDT keys are 320 bits. The key itself includes the logical and physical block size and compression. The DDT entry is even larger. Looking at dedupe code, I noticed that on-disk DDT entries are compressed less efficiently than possible: key is not compressed at all (I'd expect roughly 2:1 compression ration with sha256 data), while other entry data is currently passed through zle compressor only (I'd expect this one to be less efficient than off-the-shelf compressors, feel free to correct me if I'm wrong). Is this v1, going to be improved in the future? Further, with huge dedupe memory footprint and heavy performance impact when DDT entries need to be read from disk, it might be worthwhile to consider compression of in-core ddt entries (specifically for DDTs or, more generally, making ARC/L2ARC compression-aware). Has this been considered? Regards, Andrey I think it is better to think of the ARC as caching the uncompressed DDT blocks which were written to disk. The number of these will be data dependent. zdb -S poolname will give you an idea of the number of blocks and how well dedup will work on your data, but that means you already have the data in a pool. -- richard Based on these estimations, I think that I should be able to calculate the following: 1,7 TB 1740,8 GB 1782579,2 MB 1825361100,8 KB 4 average block size 456340275,2 blocks 256 hash key size-bits 1,16823E+11 hash key overhead - bits 1460206,4 hash key size-bytes 14260633,6 hash key size-KB 13926,4 hash key size-MB 13,6 hash key overhead-GB Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what is the current distribution of different block sizes. I'm currently playing around with zdb with mixed success on extracting this kind of data. That's also a worst case scenario since it's counting really small blocks and using 100% of available storage - highly unlikely. 
# zdb -ddbb siovale/iphone Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0 Object lvl iblk dblk dsize lsize %full type 0 7 16K 16K 57.0K 64K 77.34 DMU dnode 1 1 16K 1K 1.50K 1K 100.00 ZFS master node 2 1 16K 512 1.50K 512 100.00 ZFS delete queue 3 2 16K 16K 18.0K 32K 100.00 ZFS directory 4 3 16K 128K 408M 408M 100.00 ZFS plain file 5 1 16K 16K 3.00K 16K 100.00 FUID table 6 1 16K 4K 4.50K 4K 100.00 ZFS plain file 7 1 16K 6.50K 6.50K 6.50K 100.00 ZFS plain file 8 3 16K 128K 952M 952M 100.00 ZFS plain file 9 3 16K 128K 912M 912M 100.00 ZFS plain file 10 3 16K 128K 695M 695M 100.00 ZFS plain file 11 3 16K 128K 914M 914M 100.00 ZFS plain file Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation. Interestingly, all of my zvols seem to use fixed size blocks - that is, there is no variation in the block sizes - they're all the size defined on creation with no dynamic block sizes being used. I previously thought that the -b option set the maximum size, rather than fixing all blocks. Learned something today :-) # zdb -ddbb
Re: [zfs-discuss] 2gig file limit on ZFS?
Apologies for not explaining myself correctly; I'm copying from ext3 onto ZFS - it appears to my amateur eyes that it is ZFS that is having the problem. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
On 22/01/10 06:14 AM, Simon Breden wrote: Thanks a lot for the info James. For the benefit of myself and others then: 1. mpt_sas driver is used for the SuperMicro AOC-USAS2-L8e 2. mr_sas driver is used for the SuperMicro AOC-USAS2-L8i and LSI SAS 9211-8i Correct. I only know the internal chip code names, not what the actual shipping products are called :| And how does the maturity/robustness of the mpt_sas mr_sas drivers compare to the mpt driver which I'm currently using for my LSI 1068-based AOC-USAS-L8i card? (in the default IT mode) It might be hard to answer that one, but I thought I'd ask anyway, as it would make choosing new kit for OpenSolaris + ZFS a bit easier. I really don't have any specs re maturity or robustness, sorry, but I can tell you that (a) these two drivers were joint development efforts between Sun and LSI, (b) the requirements list that we had for the drivers is extensive, [note that MPxIO is on by default with mpt_sas] (c) we went through an insane amount of testing (and with some very rigorous tools) at every stage of the cycle before integration, and (d) we're confident that you'll find these drivers and chips to be up to the task. If you do come across problems, please bring it up in storage-discuss or zfs-discuss, and if necessary file a bug on bugs.opensolaris.org solaris/driver/mpt-sas, and solaris/driver/mr_sas are the two subcats that you'll need in that case. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On Thu, Jan 21, 2010 at 09:36:06AM -0800, Richard Elling wrote: On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote: On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote: Though the ARC case, PSARC/2007/618 is unpublished, I gather from googling and the source that L2ARC devices are considered auxiliary, in the same category as spares. If so, then it is perfectly reasonable to expect that it gets picked up regardless of the GUID. This also implies that it is shareable between pools until assigned. Brief testing confirms this behaviour. I learn something new every day :-) So, I suspect Lutz sees a race when both pools are imported onto one node. This still makes me nervous though... Yes. What if device reconfiguration renumbers my controllers, will l2arc suddenly start trashing a data disk? The same problem used to be a risk for swap, but less so now that we swap to named zvol. This will not happen unless the labels are rewritten on your data disk, and if that occurs, all bets are off. It occurred to me later yesterday, while offline, that the pool in question might have autoreplace=on set. If that were true, it would explain why a disk in the same controller slot was overwritten and used. Lutz, is the pool autoreplace property on? If so, god help us all is no longer quite so necessary. There's work afoot to make l2arc persistent across reboot, which implies some organised storage structure on the device. Fixing this shouldn't wait for that. Upon further review, the ruling on the field is confirmed ;-) The L2ARC is shared amongst pools just like the ARC. What is important is that at least one pool has a cache vdev. Wait, huh? That's a totally separate issue from what I understood from the discussion. What I was worried about was that disk Y, that happened to have the same cLtMdN address as disk X on another node, was overwritten and trashed on import to become l2arc. Maybe I missed some other detail in the thread and reached the wrong conclusion? As such, for Lutz's configuration, I am now less nervous. If I understand correctly, you could add the cache vdev to rpool and forget about how it works with the shared pools. The fact that l2arc devices could be caching data from any pool in the system is .. a whole different set of (mostly performance) wrinkles. For example, if I have a pool of very slow disks (usb or remote iscsi), and a pool of faster disks, and l2arc for the slow pool on the same faster disks, it's pointless having the faster pool using l2arc on the same disks or even the same type of disks. I'd need to set the secondarycache properties of one pool according to the configuration of another. I suppose one could make the case that a new command is needed in addition to zpool and zfs (!) to manage such devices. But perhaps we can live with the oddity for a while? This part, I expect, will be resolved or clarified as part of the l2arc persistence work, since then their attachment to specific pools will need to be clear and explicit. Perhaps the answer is that the cache devices become their own pool (since they're going to need filesystem-like structured storage anyway). The actual cache could be a zvol (or new object type) within that pool, and then (if necessary) an association is made between normal pools and the cache (especially if I have multiple of them). No new top-level commands needed. -- Dan. pgp0MK26F4Jvy.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
Fair enough. So where do you think my problem lies? Do you think it could be a limitation of the driver I loaded to read the ext3 partition? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
Correct. I only know the internal chip code names, not what the actual shipping products are called :| Now 'knew' ;-) It's reassuring to hear your points a thru d regarding the development/test cycle. I could always use the 'try before you buy' approach: others try it, and if it works, I buy it ;-) Thanks a lot. Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
Michelle Knight wrote: Fair enough. So where do you think my problem lies? Do you think it could be a limitation of the driver I loaded to read the ext3 partition? Without knowing exactly what commands you typed and exactly what error messages they produced, and which directories/files are on which types of file systems, we're limited to guessing. -- Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup memory overhead
On Thu, Jan 21, 2010 at 05:04:51PM +0100, erik.ableson wrote: What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. We'd all appreciate better visibility of this. This requires: - time and observation and experience, and - better observability tools and (probably) data exposed for them So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. I think that's a wrong-goal for optimisation. For performance (rather than space) issues, I look at dedup as simply increasing the size of the working set, with a goal of reducing the amount of IO (avoided duplicate writes) in return. If saving one large async write costs several small sync reads, you fall off a very steep performance cliff, especially for IOPS-limited seeking media. However, it doesn't matter whether those reads are for DDT entries or other filesystem metadata necessary to complete the write. Nor does it even matter if those reads are data reads, for other processes that have been pushed out of ARC because of the larger working set. So I think it's right that arc doesn't treat DDT entries specially. The trouble is that the hash function produces (we can assume) random hits across the DDT, so the working set depends on the amount of data and the rate of potentially dedupable writes as well as the actual dedup hit ratio. A high rate of writes also means a large amount of data in ARC waiting to be written at the same time. This makes analysis very hard (and pushes you very fast towards that very steep cliff, as we've all seen). Separately, what might help is something like dedup=opportunistic that would keep the working set smaller: - dedup the block IFF the DDT entry is already in (l2)arc - otherwise, just write another copy - maybe some future async dedup cleaner, using bp-rewrite, to tidy up later. I'm not sure what, in this scheme, would ever bring DDT entries into cache, though. Reads for previously dedup'd data? I also think a threshold on the size of blocks to try deduping would help. If I only dedup blocks (say) 64k and larger, i might well get most of the space benefit for much less overhead. -- Dan. pgpfZ1iTPb0nB.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
The error messages are in the original post. They are... /mirror2/applications/Microsoft/Operating Systems/Virtual PC/vm/XP-SP2/XP-SP2 Hard Disk.vhd: File too large /mirror2/applications/virtualboximages/xp/xp.tar.bz2: File too large The system installed to read the EXT3 system is here - http://blogs.sun.com/pradhap/entry/mount_ntfs_ext2_ext3_in The ZFS partition is on /mirror The EXT3 partition is on /mirror2 The command to start the copy is... cp -R /mirror2/* . ...while being CD'd to /mirror and logged in as root. Anything else I can get that would help this? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
On Thu, Jan 21, 2010 at 01:55:53PM -0800, Michelle Knight wrote: The error messages are in the original post. They are... /mirror2/applications/Microsoft/Operating Systems/Virtual PC/vm/XP-SP2/XP-SP2 Hard Disk.vhd: File too large /mirror2/applications/virtualboximages/xp/xp.tar.bz2: File too large The system installed to read the EXT3 system is here - http://blogs.sun.com/pradhap/entry/mount_ntfs_ext2_ext3_in The ZFS partition is on /mirror The EXT3 partition is on /mirror2 Which is the path in the error filename. You're having trouble reading the file off ext3 - you can verify this by trying something like cat'ing the file to /dev/null. The command to start the copy is... cp -R /mirror2/* . ...while being CD'd to /mirror and logged in as root. Anything else I can get that would help this? Best would be to plug the ext3 disk into something that can read it fully, and copy over the network. Linux, NetBSD, maybe a newer opensolaris. Note that this could be running in a VM on the same box, if necessary. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
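Concretely, a way to test that theory before moving any disks is to read one of the failing files straight off the ext3 mount, taking ZFS out of the picture entirely (path taken from the error messages above):

# ls -l "/mirror2/applications/virtualboximages/xp/xp.tar.bz2"
# cat "/mirror2/applications/virtualboximages/xp/xp.tar.bz2" > /dev/null

If ls under-reports the size, or the cat stops with an error somewhere around the 2GB mark, the ext3 driver is the limiting factor; if it reads cleanly, suspicion moves back to the copy into ZFS.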
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
We tried the new LSI controllers in our configuration, trying to replace Areca 1680 controllers. The tests were done on 2009.06. Unlike the mpt driver, which was rock solid (but obviously does not support the new chips), the mr_sas was a complete disaster. (We got ours from the LSI website.) Timeouts, missing drives, errors in /var/adm/messages. The driver may have stabilized since then, but I wouldn't use it in production yet. 2010/03 - maybe, but not 2009.06. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
| On Thu, Jan 21, 2010 at 01:55:53PM -0800, Michelle Knight wrote: Anything else I can get that would help this? split(1)? :-) -- bda cyberpunk is dead. long live cyberpunk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] need a few suggestions for a poor man's ZIL/SLOG device
PS: For data that you want to mostly archive, consider using the Amazon Web Services (AWS) S3 service. Right now there is no charge to push data into the cloud, and it's $0.15/gigabyte to keep it there. Do a quick (back of the napkin) calculation on what storage you can get for $30/month and factor in bandwidth costs (to pull the data when/if you need it). My napkin calculations tell me that I cannot compete with AWS S3 for up to 100GB of storage available 7x24. Even the electric utility bill would be more than AWS charges - especially when you consider UPS and air conditioning. And that's not including any hardware (capital equipment) costs! See: http://aws.amazon.com/s3/ When going the Amazon route, you always need to take into account retrieval time/bandwidth cost. If you were to store 100GB on Amazon - how fast can you get your data back, and how much would bandwidth cost you to retrieve it in a timely manner? It is all a matter of requirements, of course. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Zpool is a bit Pessimistic at failures
Hello, Has anyone else noticed that zpool is kind of negative when reporting back from some error conditions? Like:

cannot import 'zpool01': I/O error
Destroy and re-create the pool from a backup source.

or even worse:

cannot import 'rpool': pool already exists
Destroy and re-create the pool from a backup source.

The first one I got when doing some failure testing on my new storage node: I pulled several disks from a raidz2 to simulate loss of connectivity, and lastly pulled a third one which, as expected, made the pool unusable; I later exported the pool. But when I reconnected one of the first two drives and tried an import, I got this message. The pool was fine once I reconnected the last disk to fail, so the message seems a bit pessimistic. The second one I got when importing an old rpool with altroot but forgetting to specify a new name for the pool; the solution of just giving the pool a new name was much better than recreating the pool and restoring from backup. I think this could scare new users, or even make them do terrible things, when the errors could actually be fixed. I think I'll file a bug, agree? Henrik http://sparcv9.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
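For the record, the fix for the second case is a one-liner - zpool import accepts a new pool name on import, so something along these lines (the altroot path and new name are only examples):

# zpool import -R /a rpool rpool-old

which is rather friendlier than destroying and re-creating the pool from a backup source.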
Re: [zfs-discuss] Dedup memory overhead
On Fri, Jan 22, 2010 at 08:55:16AM +1100, Daniel Carosone wrote: For performance (rather than space) issues, I look at dedup as simply increasing the size of the working set, with a goal of reducing the amount of IO (avoided duplicate writes) in return. I should add and avoided future duplicate reads in those parentheses as well. A CVS checkout, with identical CVS/Root files in every directory, is a great example. Every one of those files is read on cvs update. Developers often have multiple checkouts (different branches) from the same server. Good performance gains can be had by avoiding potentially many thousands of extra reads and cache entries, whether with dedup or simply by hardlinking them all together. I've hit the 64k limit on hardlinks to the one file more than once with this, on bsd FFS. It's not a great example for my suggestion of a threshold lower blocksize for dedup, however :-/ -- Dan. pgpleAwmVO8zb.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] need a few suggestions for a poor man's ZIL/SLOG device
On Thu, Jan 21, 2010 at 02:11:31PM -0800, Moshe Vainer wrote: PS: For data that you want to mostly archive, consider using Amazon Web Services (AWS) S3 service. Right now there is no charge to push data into the cloud and its $0.15/gigabyte to keep it there. Do a quick (back of the napkin) calculation on what storage you can get for $30/month and factor in bandwidth costs (to pull the data when/if you need it). My napkin calculations tell me that I cannot compete with AWS S3 for up to 100Gb of storage available 7x24. Even the electric utility bill would be more than AWS charges - especially when you consider UPS and air conditioning. And thats not including any hardware (capital equipment) costs! see: http://aws.amazon.com/s3/ When going the amazon route, you always need to take into account retrieval time/bandwidth cost. If you were to store 100GB on Amazon - how fast can you get your data back, or how much would bandwidth cost you to retrieve it in a timely manner. It is all a matter of requirements of course. Don't forget asymmetric upload/download bandwidth. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
+1 I agree 100% I have a website whose ZFS Home File Server articles are read around 1 million times a year, and so far I have recommended Western Digital drives wholeheartedly, as I have found them to work flawlessly within my RAID system using ZFS. With this recent action by Western Digital of disabling the ability to time-limit the error reporting period, thus effectively forcing consumer RAID users to buy their RAID-version drives at 50%-100% price premium, I have decided not to use Western Digital drives any longer, and have explained why here: http://breden.org.uk/2009/05/01/home-fileserver-a-year-in-zfs/ (look in the Drives section) Like yourself, I too am searching for consumer-priced drives where it's still possible to set the error reporting period. I'm also looking at the Samsung models at the moment -- either the HD154UI 1.5TB drive or the HD203WI 2TB drives... and if it's possible to set the error reporting time then these will be my next purchase. They have quite good user ratings at newegg.com... If WD lose money over this, they might rethink their strategy. Until then, bye bye WD. Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
[Richard makes a hobby of confusing Dan :-)] more below.. On Jan 21, 2010, at 1:13 PM, Daniel Carosone wrote: On Thu, Jan 21, 2010 at 09:36:06AM -0800, Richard Elling wrote: On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote: On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote: Though the ARC case, PSARC/2007/618 is unpublished, I gather from googling and the source that L2ARC devices are considered auxiliary, in the same category as spares. If so, then it is perfectly reasonable to expect that it gets picked up regardless of the GUID. This also implies that it is shareable between pools until assigned. Brief testing confirms this behaviour. I learn something new every day :-) So, I suspect Lutz sees a race when both pools are imported onto one node. This still makes me nervous though... Yes. What if device reconfiguration renumbers my controllers, will l2arc suddenly start trashing a data disk? The same problem used to be a risk for swap, but less so now that we swap to named zvol. This will not happen unless the labels are rewritten on your data disk, and if that occurs, all bets are off. It occurred to me later yesterday, while offline, that the pool in question might have autoreplace=on set. If that were true, it would explain why a disk in the same controller slot was overwritten and used. Lutz, is the pool autoreplace property on? If so, god help us all is no longer quite so necessary. I think this is a different issue. But since the label in a cache device does not associate it with a pool, it is possible that any pool which expects a cache will find it. This seems to be as designed. There's work afoot to make l2arc persistent across reboot, which implies some organised storage structure on the device. Fixing this shouldn't wait for that. Upon further review, the ruling on the field is confirmed ;-) The L2ARC is shared amongst pools just like the ARC. What is important is that at least one pool has a cache vdev. Wait, huh? That's a totally separate issue from what I understood from the discussion. What I was worried about was that disk Y, that happened to have the same cLtMdN address as disk X on another node, was overwritten and trashed on import to become l2arc. Maybe I missed some other detail in the thread and reached the wrong conclusion? As such, for Lutz's configuration, I am now less nervous. If I understand correctly, you could add the cache vdev to rpool and forget about how it works with the shared pools. The fact that l2arc devices could be caching data from any pool in the system is .. a whole different set of (mostly performance) wrinkles. For example, if I have a pool of very slow disks (usb or remote iscsi), and a pool of faster disks, and l2arc for the slow pool on the same faster disks, it's pointless having the faster pool using l2arc on the same disks or even the same type of disks. I'd need to set the secondarycache properties of one pool according to the configuration of another. Don't use slow devices for L2ARC. Secondarycache is a dataset property, not a pool property. You can definitely manage the primary and secondary cache policies for each dataset. I suppose one could make the case that a new command is needed in addition to zpool and zfs (!) to manage such devices. But perhaps we can live with the oddity for a while? This part, I expect, will be resolved or clarified as part of the l2arc persistence work, since then their attachment to specific pools will need to be clear and explicit. 
Since the ARC is shared amongst all pools, it makes sense to share L2ARC amongst all pools. Perhaps the answer is that the cache devices become their own pool (since they're going to need filesystem-like structured storage anyway). The actual cache could be a zvol (or new object type) within that pool, and then (if necessary) an association is made between normal pools and the cache (especially if I have multiple of them). No new top-level commands needed. I propose a best practice of adding the cache device to rpool and be happy. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
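For completeness, a minimal sketch of that best practice - the device name is only a placeholder for whatever SSD is spare on your system:

# zpool add rpool cache c2t5d0
# zpool status rpool
# zpool iostat -v rpool 5

zpool status should show the device under a cache heading, and zpool iostat -v lets you watch it warm up as reads start hitting it.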
Re: [zfs-discuss] 2gig file limit on ZFS?
On Thu, Jan 21, 2010 at 02:54:21PM -0800, Richard Elling wrote: + support file systems larger than 2GiB, include 32-bit UIDs and GIDs File systems, but what about individual files within? -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
Ouch. Was that on the original 2009.06 vanilla install, or a later updated build? Hopefully a lot of the original bugs have been fixed by now, or soon will be. Has anyone got any from the trenches experience of using the mpt_sas driver? Any comments? Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zpool is a bit Pessimistic at failures
On Thu, Jan 21, 2010 at 11:14:33PM +0100, Henrik Johansson wrote: I think this could scare or even make new users do terrible things, even if the errors could be fixed. I think I'll file a bug, agree? Yes, very much so. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 2gig file limit on ZFS?
On Jan 21, 2010, at 6:47 PM, Daniel Carosone d...@geek.com.au wrote: On Thu, Jan 21, 2010 at 02:54:21PM -0800, Richard Elling wrote: + support file systems larger than 2GiB, include 32-bit UIDs and GIDs file systems, but what about individual files within? I think the original author meant files bigger than 2GiB and file systems bigger than 2TiB. I don't know why that wasn't built in from the start - it's been out for a long, long time now, between 5 and 10 years if I had to guess. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On Thu, Jan 21, 2010 at 03:33:28PM -0800, Richard Elling wrote: [Richard makes a hobby of confusing Dan :-)] Heh. Lutz, is the pool autoreplace property on? If so, god help us all is no longer quite so necessary. I think this is a different issue. I agree. For me, it was the main issue, and I still want clarity on it. However, at this point I'll go back to the start of the thread and look at what was actually reported again in more detail. But since the label in a cache device does not associate it with a pool, it is possible that any pool which expects a cache will find it. This seems to be as designed. Hm. My recollection was that node b's disk in that controller slot was totally unlabelled, but perhaps I'm misremembering.. as above. For example, if I have a pool of very slow disks (usb or remote iscsi), and a pool of faster disks, and l2arc for the slow pool on the same faster disks, it's pointless having the faster pool using l2arc on the same disks or even the same type of disks. I'd need to set the secondarycache properties of one pool according to the configuration of another. Don't use slow devices for L2ARC. Slow is entirely relative, as we discussed here just recently. They just need to be faster than the pool devices I want to cache. The wrinkle here is that it's now clear they should be faster than the devices in all other pools as well (or I need to take special measures). Faster is better regardless, and suitable l2arc ssd's are cheap enough now. It's mostly academic that, previously, faster/local hard disks were fast enough, since now you can have both. Secondarycache is a dataset property, not a pool property. You can definitely manage the primary and secondary cache policies for each dataset. Yeah, properties of the root fs and of the pool are easily conflated. such devices. But perhaps we can live with the oddity for a while? This part, I expect, will be resolved or clarified as part of the l2arc persistence work, since then their attachment to specific pools will need to be clear and explicit. Since the ARC is shared amongst all pools, it makes sense to share L2ARC amongst all pools. Of course it does - apart from the wrinkles we now know we need to watch out for. Perhaps the answer is that the cache devices become their own pool (since they're going to need filesystem-like structured storage anyway). The actual cache could be a zvol (or new object type) within that pool, and then (if necessary) an association is made between normal pools and the cache (especially if I have multiple of them). No new top-level commands needed. I propose a best practice of adding the cache device to rpool and be happy. It is *still* not that simple. Forget my slow disks caching an even slower pool (which is still fast enough for my needs, thanks to the cache and zil). Consider a server config thus: - two MLC SSDs (x25-M, OCZ Vertex, whatever) - SSDs partitioned in two, mirrored rpool 2x l2arc - a bunch of disks for a data pool This is a likely/common configuration, commodity systems being limited mostly by number of sata ports. I'd even go so far as to propose it as another best practice, for those circumstances. Now, why would I waste l2arc space, bandwidth, and wear cycles to cache rpool to the same ssd's that would be read on a miss anyway? So, there's at least one more step required for happiness: # zfs set secondarycache=none rpool (plus relying on property inheritance through the rest of rpool) -- Dan. 
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
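A quick way to check the inheritance Dan is relying on, once the property is set (dataset names will of course differ per system):

# zfs set secondarycache=none rpool
# zfs get -r secondarycache rpool

Everything under rpool should report the value as inherited from rpool, while datasets in the data pool keep the default of all.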
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
And I agree as well. WD was about to get upwards of $500-$700 of my money, and is now getting zero over this issue alone moving me to look harder for other drives. I'm sure a WD rep would tell us about how there are extra unseen goodies in the RE line. Maybe. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
Thanks! Yep, I was about to buy six or so WD15EADS or WD15EARS drives, but it looks like I will not be ordering them now. The bad news is that, after looking at the Samsungs, it seems they too have no way of changing the error reporting time in the 'desktop' drives. I hope I'm wrong though. I refuse to pay silly money for 'raid editions' of these drives. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
Vanilla 2009.06, mr_sas drivers from LSI website. To answer your other question - the mpt driver is very solid on 2009.06 -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
On Thu, Jan 21, 2010 at 7:37 PM, Moshe Vainer mvai...@doyenz.com wrote: Vanilla 2009.06, mr_sas drivers from LSI website. To answer your other question - the mpt driver is very solid on 2009.06 Are you sure those are the open source drivers he's referring to? LSI has a habit of releasing their own drivers with similar names. It sounds to me like that's what you were using. On that front, exactly where did you find the driver? They have nothing listed on the downloads page: http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html?locale=ENremote=1 -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On Jan 21, 2010, at 4:32 PM, Daniel Carosone wrote: I propose a best practice of adding the cache device to rpool and be happy. It is *still* not that simple. Forget my slow disks caching an even slower pool (which is still fast enough for my needs, thanks to the cache and zil). Consider a server config thus: - two MLC SSDs (x25-M, OCZ Vertex, whatever) - SSDs partitioned in two, mirrored rpool 2x l2arc - a bunch of disks for a data pool This is a likely/common configuration, commodity systems being limited mostly by number of sata ports. I'd even go so far as to propose it as another best practice, for those circumstances. Now, why would I waste l2arc space, bandwidth, and wear cycles to cache rpool to the same ssd's that would be read on a miss anyway? So, there's at least one more step required for happiness: # zfs set secondarycache=none rpool (plus relying on property inheritance through the rest of rpool) I agree with this, except for the fact that the most common installers (LiveCD, Nexenta, etc.) use the whole disk for rpool[1]. So the likely and common configuration today is moving towards one whole root disk. That could change in the future. [1] Solaris 10? well... since installation hard anyway, might as well do this. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Does OpenSolaris mpt driver support LSI 2008 controller
On Thu, Jan 21, 2010 at 8:05 PM, Moshe Vainer mvai...@doyenz.com wrote: http://lsi.com/storage_home/products_home/internal_raid/megaraid_sas/6gb_s_value_line/sas9260-8i/index.html 2009.06 didn't have the drivers integrated, so those aren't the open source ones. As i said, it is possible that 2010.03 will resolve this. But we do not put development releases in production. You should probably make that clear from the start then. You just bashed the opensource drivers based on your experience with something completely different. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
On Thu, Jan 21, 2010 at 05:52:57PM -0800, Richard Elling wrote: I agree with this, except for the fact that the most common installers (LiveCD, Nexenta, etc.) use the whole disk for rpool[1]. Er, no. You certainly get the option of using the whole disk or making partitions, at least with the opensolaris livecd. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs zvol available space vs used space vs reserved space
Hello all, I have a small issue with zfs. I created a 1TB volume.

# zfs get all tank/test01
NAME         PROPERTY              VALUE                  SOURCE
tank/test01  type                  volume                 -
tank/test01  creation              Thu Jan 21 15:05 2010  -
tank/test01  used                  1T                     -
tank/test01  available             2.26T                  -
tank/test01  referenced            79.4G                  -
tank/test01  compressratio         1.00x                  -
tank/test01  reservation           none                   default
tank/test01  volsize               1T                     -
tank/test01  volblocksize          8K                     -
tank/test01  checksum              on                     default
tank/test01  compression           off                    default
tank/test01  readonly              off                    default
tank/test01  shareiscsi            off                    default
tank/test01  copies                1                      default
tank/test01  refreservation        1T                     local
tank/test01  primarycache          all                    default
tank/test01  secondarycache        all                    default
tank/test01  usedbysnapshots       0                      -
tank/test01  usedbydataset         79.4G                  -
tank/test01  usedbychildren        0                      -
tank/test01  usedbyrefreservation  945G                   -

What bugs me is the available: 2.26T. Any ideas on why that is? Thanks, Younes -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs zvol available space vs used space vs reserved space
On Thu, Jan 21, 2010 at 07:33:47PM -0800, Younes wrote: Hello all, I have a small issue with zfs. I create a volume 1TB.

# zfs get all tank/test01
NAME         PROPERTY              VALUE  SOURCE
tank/test01  used                  1T     -
tank/test01  available             2.26T  -
tank/test01  referenced            79.4G  -
tank/test01  reservation           none   default
tank/test01  refreservation        1T     local
tank/test01  usedbydataset         79.4G  -
tank/test01  usedbychildren        0      -
tank/test01  usedbyrefreservation  945G   -

I've trimmed the less relevant properties. What bugs me is the available: 2.26T. Any ideas on why that is? That's the available space in the rest of the pool. This includes space that could be used (i.e., available for) potential snapshots of the volume (which would show in usedbychildren), since the volume size is a refreservation, not a reservation. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
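One way to see this for yourself is to put the dataset numbers next to the pool's, and - if the up-front 1T refreservation isn't wanted - create the zvol sparse instead (tank/sparse01 is just an example name):

# zfs get used,available,refreservation,usedbyrefreservation tank/test01
# zpool list tank
# zfs create -s -V 1T tank/sparse01

A sparse (-s) volume carries no refreservation, so its used only grows as blocks are actually written - at the price of possibly running the pool out of space underneath the volume later.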
[zfs-discuss] question about which build to install
I installed b130 on my server, and I'm being hit by this bug: http://defect.opensolaris.org/bz/show_bug.cgi?id=13540 where I can't log into gnome. I've been trying to deal with it, hoping that a workaround would show up... if there IS a workaround, I'd love to have it... if not, I'm wondering: is there another version I can downgrade to? I'm pretty new to opensolaris and I've tried to google to find this answer but I can't find it. My zpool is version 22. Thanks for any help. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS loses configuration
I have just installed EON .599 on a machine with a 6 disk raidz2 configuration. I run updimg after creating a zpool. When I reboot, and attempt to run 'zpool list' it returns 'no pools configured'. I've checked /etc/zfs/zpool.cache, and it appears to have configuration information about the disks in place. If I run zpool import, it loads properly, but for whatever reason with EON, it's not saving the configuration. Any ideas where I should start looking? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
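Not an EON expert, but a sketch of what I would check (pool name is just an example): confirm the pool really is recorded in the cache file on the running system, then make sure the boot image is rebuilt afterwards so that copy of /etc/zfs/zpool.cache is the one that comes back at boot.

# zpool import tank
# zdb -C tank
# ls -l /etc/zfs/zpool.cache

If zdb -C shows the pool but it still vanishes after a reboot, the updimg step is probably running before the cache file is updated, or producing an image that doesn't include it - the exact updimg invocation and image path depend on your EON install.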
Re: [zfs-discuss] Dedup memory overhead
On Thu, Jan 21, 2010 at 2:51 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: Looking at dedupe code, I noticed that on-disk DDT entries are compressed less efficiently than possible: key is not compressed at all (I'd expect roughly 2:1 compression ratio with sha256 data), A cryptographic hash such as sha256 should not be compressible. A trivial example shows this to be the case:

$ for i in {1..10000} ; do echo $i | openssl dgst -sha256 -binary ; done > /tmp/sha256
$ gzip -c sha256 > sha256.gz
$ compress -c sha256 > sha256.Z
$ bzip2 -c sha256 > sha256.bz2
$ ls -go sha256*
-rw-r--r-- 1 320000 Jan 22 04:13 sha256
-rw-r--r-- 1 428411 Jan 22 04:14 sha256.Z
-rw-r--r-- 1 321846 Jan 22 04:14 sha256.bz2
-rw-r--r-- 1 320068 Jan 22 04:14 sha256.gz

-- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] question about which build to install
Wrong list. But anyhow, I was able to install b128 and then upgrade to b130. I had to relink some OpenGL files to get Compiz to work, but apart from that it looks OK. /peter On 2010-01-22 11.03, Thomas Burgess wrote: I installed b130 on my server, and i'm being hit by this bug: http://defect.opensolaris.org/bz/show_bug.cgi?id=13540 Where i can't log into gnome. I've bee trying to deal with it hoping that i a workaround would show up.. if there IS a workaround, i'd love to have it...if not, i'm wondering: is there another version i can downgrade to? I'm pretty new to opensolaris and i've tried to google to find this answer but i can't find it. my zpool is version 22. thanks for any help. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
Did you buy the SSDs directly from Sun? I've heard there could possibly be firmware that's vendor specific for the X25-E. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration
Hi On Friday 22 January 2010 07:04:06 Brad wrote: Did you buy the SSDs directly from Sun? I've heard there could possibly be firmware that's vendor specific for the X25-E. No. So far I've heard that they are not readily available as certification procedures are still underway (apart from this the 8850 firmware should be ok, but that's just what I've heard). C ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss