Re: [zfs-discuss] Pool vdev imbalance
Ian Collins wrote:
> I was running zpool iostat on a pool comprising a stripe of raidz2
> vdevs that appears to be writing slowly, and I noticed a considerable
> imbalance of both free space and write operations. The pool is
> currently feeding a tape backup while receiving a large filesystem.
> Is this imbalance normal? I would expect a more even distribution, as
> the pool configuration hasn't been changed since creation.

The second and third ones are pretty much full, with the others having
well over 10 times more free space, so I wouldn't expect many writes
to the full ones. Have the others ever been in a degraded state? That
might explain why the fill level has become unbalanced.

> The system is running Solaris 10 update 7.
>
>                  capacity     operations    bandwidth
> pool           used  avail   read  write   read  write
> -----------  -----  -----  -----  -----  -----  -----
> tank         15.9T  2.19T     87    119  2.34M  1.88M
>   raidz2     2.90T   740G     24     27   762K  95.5K
>     c0t1d0       -      -     14     13   273K  18.6K
>     c1t1d0       -      -     15     13   263K  18.3K
>     c4t2d0       -      -     17     14   288K  18.2K
>     spare        -      -     17     20   104K  17.2K
>       c5t2d0     -      -     16     13   277K  17.6K
>       c7t5d0     -      -      0     14      0  17.6K
>     c6t3d0       -      -     15     12   242K  18.7K
>     c7t3d0       -      -     15     12   242K  17.6K
>     c6t4d0       -      -     16     12   272K  18.1K
>     c1t0d0       -      -     15     13   275K  16.8K
>   raidz2     3.59T  37.8G     20      0   546K      0
>     c0t2d0       -      -     11      0   184K    361
>     c1t3d0       -      -     10      0   182K    361
>     c4t5d0       -      -     14      0   237K    361
>     c5t5d0       -      -     13      0   220K    361
>     c6t6d0       -      -     12      0   155K    361
>     c7t6d0       -      -     11      0   149K    361
>     c7t4d0       -      -     14      0   219K    361
>     c4t0d0       -      -     14      0   213K    361
>   raidz2     3.58T  44.1G     27      0  1.01M      0
>     c0t5d0       -      -     16      0   290K    361
>     c1t6d0       -      -     15      0   301K    361
>     c4t7d0       -      -     20      0   375K    361
>     c5t1d0       -      -     19      0   374K    361
>     c6t7d0       -      -     17      0   285K    361
>     c7t7d0       -      -     15      0   253K    361
>     c0t0d0       -      -     18      0   328K    361
>     c6t0d0       -      -     18      0   348K    361
>   raidz2     3.05T   587G      7     47  24.9K  1.07M
>     c0t4d0       -      -      3     21   254K   187K
>     c1t2d0       -      -      3     22   254K   187K
>     c4t3d0       -      -      5     22   350K   187K
>     c5t3d0       -      -      5     21   350K   186K
>     c6t2d0       -      -      4     22   265K   187K
>     c7t1d0       -      -      4     21   271K   187K
>     c6t1d0       -      -      5     22   345K   186K
>     c4t1d0       -      -      5     24   333K   184K
>   raidz2     2.81T   835G      8     45  30.9K   733K
>     c0t3d0       -      -      5     16   339K   126K
>     c1t5d0       -      -      5     16   333K   126K
>     c4t6d0       -      -      6     16   441K   127K
>     c5t6d0       -      -      6     17   435K   126K
>     c6t5d0       -      -      4     18   294K   126K
>     c7t2d0       -      -      4     18   282K   124K
>     c0t6d0       -      -      7     19   446K   124K
>     c5t7d0       -      -      7     21   452K   122K
> -----------  -----  -----  -----  -----  -----  -----

--
Andrew
Re: [zfs-discuss] device mixed-up while trying to import.
Hi,

Thanks for the reply. I can arrange the lost SSD, but I already
formatted it. Second, even though the external HDD is, for instance,
/dev/rdsk/c16t0d0, when I try to debug using zdb it shows me another
"path":

    path='/dev/dsk/c11t0d0s0'
    devid='id1,s...@tst31500341as2ger66y7/a'
    phys_path='/p...@0,0/pci8086,2...@1e/pci1458,1...@6/u...@00203702003490ab/d...@0,0:a'

Is there a way to fix it?

Regards
[zfs-discuss] Clear vdev information from disk
Hello list,

It is damn difficult to destroy ZFS labels :) I am trying to remove
the vdev labels of disks that were used in a pool before. According to
http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf
I created a script that removes the first 512 KB and the last 512 KB
of each disk; however, I always miss the last labels:

    LABEL 0: failed to unpack label 0
    LABEL 1: failed to unpack label 1
    LABEL 2: version=14 name='pool1' state=1 txg=26
    LABEL 3: version=14 name='pool1' state=1

How can I calculate or determine the location of the superblocks
described in the document above? Has that changed over the versions of
ZFS? (I know that vdevs no longer need to be exactly the same size in
later OpenSolaris releases, so maybe something has changed.) Any hints
appreciated.

P.S. Clearing the whole disk is troublesome, because these are a bunch
of 1 TB disks and deletion should be fast.
[zfs-discuss] slow zfs scrub?
hi all

I have a server running snv_131 and the scrub is very slow. I have a
cron job that starts it every week, and it has now been running for a
while, and it's very, very slow:

    scrub: scrub in progress for 40h41m, 12.56% done, 283h14m to go

The configuration is listed below: three raidz2 groups with seven 2TB
drives each. The root fs is on a pair of X25-M (gen 1) SSDs, and
another set of similar SSDs is used for ZIL and L2ARC (mirrored for
the ZIL and striped for the L2ARC). Is this correct behaviour?
According to zpool status, OpenSolaris is going to take something like
14 days to scrub the dpool...

roy

        NAME         STATE     READ WRITE CKSUM
        dpool        ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            c7t2d0   ONLINE       0     0     0
            c7t3d0   ONLINE       0     0     0
            c7t4d0   ONLINE       0     0     0
            c7t5d0   ONLINE       0     0     0
            c7t6d0   ONLINE       0     0     0
            c7t7d0   ONLINE       0     0     0
            c8t0d0   ONLINE       0     0     0
          raidz2-1   ONLINE       0     0     0
            c8t1d0   ONLINE       0     0     0
            c8t2d0   ONLINE       0     0     0
            c8t3d0   ONLINE       0     0     0
            c8t4d0   ONLINE       0     0     0
            c8t5d0   ONLINE       0     0     0
            c8t6d0   ONLINE       0     0     0
            c8t7d0   ONLINE       0     0     0
          raidz2-2   ONLINE       0     0     0
            c9t0d0   ONLINE       0     0     0
            c9t1d0   ONLINE       0     0     0
            c9t2d0   ONLINE       0     0     0
            c9t3d0   ONLINE       0     0     0
            c9t4d0   ONLINE       0     0     0
            c9t5d0   ONLINE       0     0     0
            c9t6d0   ONLINE       0     0     0
        logs
          mirror-3   ONLINE       0     0     0
            c10d1s0  ONLINE       0     0     0
            c11d0s0  ONLINE       0     0     0
        cache
          c10d1s1    ONLINE       0     0     0
          c11d0s1    ONLINE       0     0     0
        spares
          c9t7d0     AVAIL

Kind regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Who is using ZFS ACL's in production?
Paul B. Henson <hen...@acm.org> writes:
> On Fri, 26 Feb 2010, David Dyer-Bennet wrote:
>> I think of using ACLs to extend extra access beyond what the
>> permission bits grant. Are you talking about using them to prevent
>> things that the permission bits appear to grant? Because so long as
>> they're only granting extended access, losing them can't expose
>> anything.
>
> Consider the example of creating a file in a directory which has an
> inheritable ACL for new files:

why are you doing this? it's inherently insecure to rely on ACLs to
restrict access. do as David says and use ACLs to *grant* access. if
needed, set the permissions on the file to 000 and use umask 777.

> drwx--s--x+  2 henson  csupomona  4 Feb 27 09:21 .
>          owner@:rwxpdDaARWcC--:-di---:allow
>          owner@:rwxpdDaARWcC--:------:allow
>          group@:--x---a-R-c---:-di---:allow
>          group@:--x---a-R-c---:------:allow
>       everyone@:--x---a-R-c---:-di---:allow
>       everyone@:--x---a-R-c---:------:allow
>          owner@:rwxpdDaARWcC--:f-i---:allow
>          group@:--------------:f-i---:allow
>       everyone@:--------------:f-i---:allow
>
> When the ACL is respected, then regardless of the requested creation
> mode or the umask, new files will have the following ACL:
>
> -rw-------+  1 henson  csupomona  0 Feb 27 09:26 foo
>     owner@:rw-pdDaARWcC--:------:allow
>     group@:--------------:------:allow
>  everyone@:--------------:------:allow
>
> Now, let's say a legacy application used a requested creation mode of
> 0644, and the current umask was 022, and the application calculated
> the resultant mode and explicitly set it with chmod(0644):

why is umask 022 when you want 077? *that's* your problem.

--
Kjetil T. Homme
Redpill Linpro AS - Changing the game
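A minimal sketch of that grant-only approach, using the Solaris chmod
ACE syntax (the user, path, and permission set here are hypothetical,
not from Paul's setup):

    # Keep the permission bits restrictive and grant access through an
    # inheritable allow ACE for one user:
    chmod A+user:webapp:read_data/write_data/execute:file_inherit/dir_inherit:allow /export/share
    ls -V /export/share    # verify the new ACE and its inheritance flags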
Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)
Speaking of long boot times, I've heard that IBM POWER servers boot in
90 minutes or more.
Re: [zfs-discuss] Clear vdev information from disk
On Feb 28, 2010, at 5:05 AM, Lutz Schumann wrote:
> Hello list, it is damn difficult to destroy ZFS labels :)

Some people seem to have a knack of doing it accidentally :-)

> I am trying to remove the vdev labels of disks that were used in a
> pool before. [...] I created a script that removes the first 512 KB
> and the last 512 KB of each disk; however, I always miss the last
> labels. How can I calculate or determine the location of the
> superblocks described in the document above? Has that changed over
> the versions of ZFS?

It has not changed. The labels are aligned to 256KB boundaries, so
your script needs to find the correct end. You can also measure this
directly using something like iosnoop when running zdb -l.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
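A minimal sketch of such a clearing script, following the 256 KB
alignment described above (the device name and size are placeholders
to substitute for your disk; this assumes a whole-disk slice on a raw
device):

    #!/bin/ksh
    # Clear all four vdev labels: L0/L1 live in the first 512 KB,
    # L2/L3 in the 512 KB below the 256 KB-aligned end of the device.
    DEV=/dev/rdsk/c1t0d0s0        # placeholder: disk to wipe
    SIZE=1000204886016            # placeholder: device size in bytes

    # Labels 0 and 1 (offsets 0 and 256 KB):
    dd if=/dev/zero of=$DEV bs=1024 count=512

    # Labels 2 and 3: align the end down to 256 KB, then back off 512 KB.
    END=$(( (SIZE / 262144) * 262144 ))
    dd if=/dev/zero of=$DEV bs=1024 seek=$(( (END - 524288) / 1024 )) count=512

Afterwards, "zdb -l" on the device should fail to unpack all four
labels.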
Re: [zfs-discuss] Pool vdev imbalance
Andrew Gabriel wrote:
> Ian Collins wrote:
>> I was running zpool iostat on a pool comprising a stripe of raidz2
>> vdevs that appears to be writing slowly, and I noticed a
>> considerable imbalance of both free space and write operations. Is
>> this imbalance normal? I would expect a more even distribution, as
>> the pool configuration hasn't been changed since creation.
>
> The second and third ones are pretty much full, with the others
> having well over 10 times more free space, so I wouldn't expect many
> writes to the full ones. Have the others ever been in a degraded
> state? That might explain why the fill level has become unbalanced.

We had to swap a drive in the second one, and I've seen hot spares
kick in, as in the first one here. These have always been as a result
of phantom errors from that wretched Marvell driver (the box is an
X4500). Nothing has been degraded for long, and I stop all copies to
the box when scrubbing or resilvering is in progress (receives still
restart scrubs in update 7).

--
Ian.
Re: [zfs-discuss] Clear vdev information from disk
On Sun, Feb 28, 2010 at 1:12 PM, Richard Elling
<richard.ell...@gmail.com> wrote:
> On Feb 28, 2010, at 5:05 AM, Lutz Schumann wrote:
>> Hello list, it is damn difficult to destroy ZFS labels :)
>
> Some people seem to have a knack of doing it accidentally :-)
>
> [...] It has not changed. The labels are aligned to 256KB boundaries,
> so your script needs to find the correct end. You can also measure
> this directly using something like iosnoop when running zdb -l.

Perhaps this has already been suggested, but it would seem to me to
make a lot more sense to have some sort of "zfs label clear" type
command to quickly and easily clear labels...

--Tim
Re: [zfs-discuss] device mixed-up while trying to import.
On Sun, Feb 28, 2010 at 2:06 PM, Yariv Graf <ya...@walla.net.il> wrote:
> Hi, thanks for the reply. I can arrange the lost SSD, but I already
> formatted it. Second, even though the external HDD is, for instance,
> /dev/rdsk/c16t0d0, when I try to debug using zdb it shows me another
> "path". Is there a way to fix it?

Yariv,

In short, you need not. That "other" path won't fool ZFS, since it
ultimately trusts the device GUIDs. You can reshuffle the disks in
your pool and put them back in random order, and ZFS will find its
way. On a successful import, all the recorded paths are updated. So,
back to square one: you need to import your pool first.

--
Regards,
Cyril
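For illustration, the stale path is harmless to a scan-based import;
pointing the scan at the directory holding the current device nodes is
enough (pool name as Yariv gives it later in the thread):

    # Scan /dev/dsk for devices whose labels carry the pool's GUIDs;
    # the paths recorded in the labels are ignored in favour of what
    # the scan finds.
    zpool import -d /dev/dsk HD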
[zfs-discuss] ZFS compression and deduplication on root pool on SSD
I am running my root pool on a 60 GB SLC SSD (OCZ Agility EX). At
present, my rpool/ROOT has no compression and no deduplication. I was
wondering whether it would be a good idea, from a performance and data
integrity standpoint, to use one, the other, or both on the root pool.
My current problem is that I'm starting to run out of space on the
SSD, and based on a send|receive I did to a backup server, I should be
able to compress by about a factor of 1.5x. If I enable both on the
rpool filesystem, then clone the boot environment, that should enable
it on the new BE (which would be a child of rpool/ROOT), right?

Also, I don't have the numbers to prove this, but it seems to me that
the actual size of rpool/ROOT has grown substantially since I did a
clean install of build 129a (I'm now at build 133). Without
compression, that was around 24 GB, but things seem to have
accumulated by an extra 11 GB or so. Or am I imagining things? Is
there a way to get rid of all of the legacy stuff that's in there? I
already deleted the old snapshots and boot environments that were
taking up much space.

Thanks!
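One quick way to see where the extra ~11 GB sits before enabling
anything -- a sketch, assuming a build recent enough to support the
"space" column set (b133 should be):

    # Break each dataset's usage down into usedbydataset,
    # usedbysnapshots, usedbychildren and usedbyrefreservation:
    zfs list -r -o space rpool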
Re: [zfs-discuss] slow zfs scrub?
On Sat, 27 Feb 2010, Roy Sigurd Karlsbakk wrote:
> I have a server running snv_131 and the scrub is very slow. I have a
> cron job for starting it every week and now it's been running for a
> while, and it's very, very slow.

Have you checked the output of 'iostat -xe' to see if there are
unusually slow (or overloaded) disks or increasing error counts? Is
the CPU load unusually high?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
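For example (standard Solaris iostat flags; watch asvc_t and the error
columns while the scrub runs):

    # 5-second samples: extended statistics (-x), error counters (-e),
    # descriptive cXtYdZ device names (-n):
    iostat -xen 5

A single slow or erroring disk will often drag a whole scrub down.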
Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD
On Sun, 28 Feb 2010, valrh...@gmail.com wrote:
> ...backup server, I should be able to compress by about a factor of
> 1.5x. If I enable both on the rpool filesystem, then clone the boot
> environment, that should enable it on the new BE (which would be a
> child of rpool/ROOT), right?

If by 'clone' you are talking about zfs's clone, I don't think that
this will immediately save you any space, since a zfs clone is done by
block references to existing blocks. You would need to actually copy
(or re-write) the files in order for them to be compressed.

Using lzjb compression sounds like a good idea. I doubt that GRUB
supports gzip compression, so take care that you use a compression
algorithm that GRUB understands, or your system won't boot.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
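A minimal sketch of that distinction (dataset and snapshot names are
hypothetical, and making the copy a bootable BE takes extra
beadm/bootadm steps not shown here):

    # Only blocks written from now on get compressed:
    zfs set compression=lzjb rpool/ROOT

    # Existing data shrinks only if its blocks are rewritten, e.g. via
    # a send/receive into a child that inherits compression=lzjb:
    zfs snapshot rpool/ROOT/myBE@precompress
    zfs send rpool/ROOT/myBE@precompress | zfs receive rpool/ROOT/myBE-lzjb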
[zfs-discuss] What's the advantage of using multiple filesystems in a pool
Hi guys, on my home server I have a variety of directories under a
single pool/filesystem, cloud. Things like:

cloud/movies - 4TB
cloud/music - 100Gig
cloud/winbackups - 1TB
cloud/data - 1TB

etc. After doing some reading, I see recommendations to have separate
filesystems to improve performance... but I'm not sure how, as it's
the same pool? Can someone help me understand if/why I should use
separate filesystems for these?

ta.
[zfs-discuss] zpool import as unavailable when mpxio disabled
I have a host with Solaris 10 Update 8 linked to an STK6540 array (the
host type set to traffic manager). The host has 4 paths: 2 linked to
controller A and 2 linked to controller B. When I disabled MPxIO and
rebooted the host, I checked the zpool status and the testpool had
been imported as unavailable. The reason is that zpool still tries to
import the pool using the MPxIO device files, which are no longer
available; if I export this pool and re-import it, the pool imports
with the correct device files.

But I did the same test with a Solaris 10 U8 host linked to an IBM
ESS800, and with MPxIO disabled that pool imported correctly with the
correct device files. This scenario confuses me; does anybody have
experience with it? Any reply is very appreciated.

Thanks,
Ming
Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD
On 02/28/10 15:58, valrh...@gmail.com wrote:
> Also, I don't have the numbers to prove this, but it seems to me that
> the actual size of rpool/ROOT has grown substantially since I did a
> clean install of build 129a (I'm now at build 133).

One common source for this is slowly accumulating files under
/var/pkg/download. Clean out /var/pkg/download and delete all but the
most recent boot environment to recover space (you need to do this to
get the space back because the blocks are referenced by the snapshots
used by each clone as its base version).

To avoid this in the future, set PKG_CACHEDIR in your environment to
point at a filesystem which isn't cloned by beadm -- something outside
rpool/ROOT, for instance. On several systems which have two pools
(root and data) I've relocated it to the data pool; it doesn't have to
be part of the root pool. This has significantly slimmed down my root
filesystem on systems which are chasing the dev branch of opensolaris.

> At present, my rpool/ROOT has no compression, and no deduplication. I
> was wondering whether it would be a good idea, from a performance and
> data integrity standpoint, to use one, the other, or both, on the
> root pool.

I've used the combination of copies=2 and compression=on on rpool/ROOT
for a while and have been happy with the result. On one system I
recently moved to an ssd root, I also turned on dedup and it seems to
be doing just fine:

    NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
    r2      37G  14.7G  22.3G  39%  1.31x  ONLINE  -

(the relatively high dedup ratio is because I have one live upgrade BE
with nevada build 130, and a beadm BE with opensolaris build 130,
which are mostly the same)

 - Bill
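A sketch of that relocation, assuming a second pool named "data" and
an old BE name that are both hypothetical:

    # Put the IPS download cache outside the cloned root filesystems:
    zfs create -o mountpoint=/data/pkgcache data/pkgcache
    export PKG_CACHEDIR=/data/pkgcache    # persist this in your profile

    # Reclaim the space already used; the blocks stay referenced until
    # the older boot environments snapshotting them are destroyed:
    rm -rf /var/pkg/download/*
    beadm destroy -F oldBE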
Re: [zfs-discuss] What's the advantage of using multiple filesystems in a pool
tomwaters wrote:
> Hi guys, on my home server I have a variety of directories under a
> single pool/filesystem, cloud. [...] Can someone help me understand
> if/why I should use separate filesystems for these?

Obviously, having different filesystems gives you the ability to set
different values for attributes, which may substantially improve
performance or storage space depending on the data in that filesystem.
As an example from the above, I would consider turning compression on
for your cloud/winbackups and possibly for cloud/data, but definitely
not for either cloud/movies (assuming mpeg4 or similar files) or
cloud/music.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
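A sketch using the dataset names from the original post, assuming each
directory becomes its own filesystem under the pool:

    # One filesystem per use case, each with its own attributes:
    zfs create -o compression=on  cloud/winbackups
    zfs create -o compression=on  cloud/data
    zfs create -o compression=off cloud/movies   # compressed media gains nothing
    zfs create -o compression=off cloud/music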
[zfs-discuss] suggested ssd for zil
If anyone has specific SSD drives they would recommend for ZIL use,
would you mind a quick response to the list? My understanding is I
need to look for:

1) Respect for cache flush commands (which is my real question; the
   answer to this isn't very obvious in most cases)
2) Fast on small writes

It seems even the smallest sizes should be sufficient. This is for a
home NAS where most write work is for iSCSI volumes hosting backups
for OS X Time Machine. There is also a small amount of MySQL (InnoDB)
shared via NFS. From what I can gather, workable options would be:

- Stec, which are in the 7000 series and extremely expensive
- Mtron Pro 7500 16GB SLC, which seem to respect the cache flush but
  aren't particularly fast doing it:
  http://opensolaris.org/jive/thread.jspa?messageID=459872&tstart=0
- Intel X-25E with the cache turned off, which seems to be like the
  Mtron
- Seagate's marketing page for their new SSD implies it has a
  capacitor to protect data in cache, like I believe the Stec does.
  But I don't think they are available at retail yet. "Power loss data
  protection to ensure against data loss upon power failure"
  http://www.seagate.com/www/en-us/products/servers/pulsar/pulsar/

And what won't work are:

- Intel X-25M
- Most/all of the consumer drives priced beneath the X-25M

all because they use capacitors to get write speed w/o respecting
cache flush requests.

Is there anything that is safe to use as a ZIL, faster than the Mtron
but more appropriate for home than a Stec? Maybe the answer is to wait
on Seagate, but I thought maybe someone has other ideas.

Thanks,
Ware
Re: [zfs-discuss] suggested ssd for zil
On Feb 28, 2010, at 11:51 PM, rwali...@washdcmail.com wrote:
> And what won't work are:
> - Intel X-25M
> - Most/all of the consumer drives priced beneath the X-25M
> all because they use capacitors to get write speed w/o respecting
> cache flush requests.

Sorry, meant to say they use cache to get write speed w/o respecting
cache flush requests.

--Ware
Re: [zfs-discuss] suggested ssd for zil
> Is there anything that is safe to use as a ZIL, faster than the Mtron
> but more appropriate for home than a Stec?

ACARD ANS-9010, as mentioned several times here recently (also sold as
hyperdrive5).

--
Dan.
Re: [zfs-discuss] ZFS compression and deduplication on root pool on SSD
On Sun, Feb 28, 2010 at 07:36:30PM -0800, Bill Sommerfeld wrote:
> To avoid this in the future, set PKG_CACHEDIR in your environment to
> point at a filesystem which isn't cloned by beadm -- something
> outside rpool/ROOT, for instance.

+1 - I've just used a dataset mounted at /var/pkg/download. I don't
know when this knob appeared; I only heard about it recently and
haven't yet bothered to rearrange stuff accordingly.

> On one system I recently moved to an ssd root, I also turned on dedup
> and it seems to be doing just fine:

I have had compression=on,dedup=on on several rpools for some time,
including a little netbook with a 7.5G slow-as ssd.

--
Dan.
Re: [zfs-discuss] suggested ssd for zil
On Mar 1, 2010, at 12:05 AM, Daniel Carosone wrote:
>> Is there anything that is safe to use as a ZIL, faster than the
>> Mtron but more appropriate for home than a Stec?
>
> ACARD ANS-9010, as mentioned several times here recently (also sold
> as hyperdrive5)

You are right, I saw that in a recent thread. In my case I don't have
a spare bay for it. I'm similarly constrained on some of the PCI
solutions that have either battery backup or external power. But this
seems like a good solution if someone has the space.

Thanks,
Ware
Re: [zfs-discuss] suggested ssd for zil
rwali...@washdcmail.com wrote:
> Sorry, meant to say they use cache to get write speed w/o respecting
> cache flush requests.

Actually, the bigger strike against the X-25M and similar MLC-based
SSDs is their relatively poor small random write performance.

I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
cache, but take a hunk of flash to use as scratch space instead, which
means that they'll be OK for ZIL use. OCZ's Vertex 2 EX and Vertex 2
both use that controller, but they'll not be available for another
month or so, in all likelihood.

http://www.techspot.com/review/242-ocz-vertex2-pro-ssd/

Also, it looks like the Vertex Limited Edition is SandForce-based,
too:

http://www.legitreviews.com/article/1222/2/

Though, according to the article, without the capacitor, you still
might lose some data stored in the SandForce controller's internal
buffer.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Re: [zfs-discuss] device mixed-up while trying to import.
Hi Cyril,

Thanks for the response. In simple words, this is what was done:

1. zpool import HD (external HDD [single drive])
2. zpool add HD log c0t4d0 (SSD drive)
3. Played with it a bit.
4. zpool export HD
5. Reinstalled OpenSolaris on the SSD drive (the ex-slog above).

Is there any chance to recover the HD zpool? I can use the SSD drive
as slog for recovery if needed.

Many thanks,
Yariv
Re: [zfs-discuss] sizing for L2ARC and dedup...
On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
> I'm finally at the point of adding an SSD to my system, so I can get
> reasonable dedup performance. The question here goes to sizing of the
> SSD for use as an L2ARC device. Noodling around, I found Richard's
> old posting on ARC/L2ARC memory requirements, which is mighty helpful
> in making sure I don't overdo the L2ARC side.
> (http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34677.html)

I don't know of an easy way to see the number of blocks, which is what
you need to complete a capacity plan. OTOH, it doesn't hurt to have an
L2ARC, just beware of wasting space if you have a small RAM machine.

> What I haven't found is a reasonable way to determine how big I'll
> need an L2ARC to fit all the relevant data for dedup. I've seen
> several postings back in Jan about this, and there wasn't much help,
> as was acknowledged at the time. What I'm after is exactly what needs
> to be stored extra for the DDT? I'm looking at the 200-byte header in
> ARC per L2ARC entry, and assuming that is for all relevant info
> stored in the L2ARC, whether it's actual data or metadata. My
> question is this: the metadata for a slab (record) takes up how much
> space? With dedup turned on, I'm assuming that this metadata is
> larger than with it off (or is it the same now for both)? There has
> to be some way to do a back-of-the-envelope calc that says
> (X) pool size => (Y) min L2ARC size => (Z) min ARC size

If you know the number of blocks and the size distribution, you can
calculate this. In other words, it isn't very easy to do in advance
unless you have a fixed-size workload (eg a database that doesn't grow
:-) For example, if you have a 10 GB database with 8KB blocks, then
you can calculate how much RAM would be required to hold the headers
for a 10 GB L2ARC device:

    headers = 10 GB / 8 KB
    RAM needed ~ 200 bytes * headers

For media, you can reasonably expect 128KB blocks. The DDT size can be
measured with "zdb -D poolname", but you can expect that to grow over
time, too.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
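Worked through, that example comes to about 250 MB of RAM; a quick
check in any POSIX shell:

    # headers = 10 GB / 8 KB = 1,310,720 blocks; ~200 bytes of ARC each
    echo $(( 10 * 1024 * 1024 * 1024 / 8192 * 200 ))   # 262144000 bytes, ~250 MB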
Re: [zfs-discuss] sizing for L2ARC and dedup...
Richard Elling wrote:
> On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
>> [...] The question here goes to sizing of the SSD for use as an
>> L2ARC device.
>
> I don't know of an easy way to see the number of blocks, which is
> what you need to complete a capacity plan. OTOH, it doesn't hurt to
> have an L2ARC, just beware of wasting space if you have a small RAM
> machine.

I haven't found a good way, either. And I've looked. ;-)

> If you know the number of blocks and the size distribution, you can
> calculate this. [...] For example, if you have a 10 GB database with
> 8KB blocks, then you can calculate how much RAM would be required to
> hold the headers for a 10 GB L2ARC device:
>
>     headers = 10 GB / 8 KB
>     RAM needed ~ 200 bytes * headers
>
> For media, you can reasonably expect 128KB blocks. The DDT size can
> be measured with "zdb -D poolname", but you can expect that to grow
> over time, too.

That's good, but I'd like a way to pre-calculate my potential DDT size
(which, I'm assuming, will sit in the L2ARC, right?). Once again, I'm
assuming that each DDT entry corresponds to a record (slab), so to be
exact, I would need to know the number of slabs (which doesn't
currently seem possible). I'd be satisfied with a guesstimate based on
my expected average block size. But what I need to know is how big a
DDT entry is for each record. I'm trying to parse the code, and I
don't have it in a sufficiently intelligent IDE right now to find all
the cross-references. I've got as far as this (in ddt.h):

    struct ddt_entry {
            ddt_key_t       dde_key;
            ddt_phys_t      dde_phys[DDT_PHYS_TYPES];
            zio_t           *dde_lead_zio[DDT_PHYS_TYPES];
            void            *dde_repair_data;
            enum ddt_type   dde_type;
            enum ddt_class  dde_class;
            uint8_t         dde_loading;
            uint8_t         dde_loaded;
            kcondvar_t      dde_cv;
            avl_node_t      dde_node;
    };

Any idea what these structure sizes actually are?

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
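One way to get the sizes from a live system rather than hand-counting
struct fields -- a sketch, assuming kernel CTF data is available for
these types (type names as in the source):

    # Ask the kernel debugger for the in-core structure sizes:
    echo '::sizeof ddt_entry_t' | mdb -k
    echo '::sizeof arc_buf_hdr_t' | mdb -k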