Re: [zfs-discuss] ZFS Performance compared to UFS VxFS - offtopic
On Tue, Aug 22, 2006 at 06:15:08AM -0700, Tony Galway wrote:
> A question (well, let's make it 3 really): Is vdbench a useful tool when
> testing file system performance of a ZFS file system? Secondly, is ZFS
> write performance really much worse than UFS or VxFS? And third, what is
> a good benchmarking tool to test ZFS vs UFS vs VxFS?

Are vdbench and SWAT going to be released to the public?

przemol
[zfs-discuss] 3510 - some new tests
Hello zfs-discuss,

Server is a v440, Solaris 10U2 + patches. Each test was repeated at least two
times and two results are posted. Server connected with a dual-ported FC card
with MPxIO, using FC-AL (DAS).

1. 3510, RAID-10 using 24 disks from two enclosures, random optimization,
   32KB stripe width, write-back, one LUN

1.1 filebench/varmail for 60s

  a. ZFS on top of LUN, atime=off
     IO Summary: 490054 ops 8101.6 ops/s, (1246/1247 r/w) 39.9mb/s, 291us cpu/op, 6.1ms latency
     IO Summary: 492274 ops 8139.6 ops/s, (1252/1252 r/w) 40.1mb/s, 303us cpu/op, 6.1ms latency

  b. ZFS on top of LUN, atime=off, WRITE CACHE OFF (write-thru)
     IO Summary: 281048 ops 4647.0 ops/s, (715/715 r/w) 22.8mb/s, 298us cpu/op, 10.7ms latency
     IO Summary: 282200 ops 4665.3 ops/s, (718/718 r/w) 23.0mb/s, 298us cpu/op, 10.6ms latency

  c. UFS on top of LUN, noatime, maxcontig set to 48
     IO Summary: 383262 ops 6337.1 ops/s, (975/975 r/w) 31.2mb/s, 566us cpu/op, 7.9ms latency
     IO Summary: 381706 ops 6310.4 ops/s, (971/971 r/w) 31.1mb/s, 560us cpu/op, 7.9ms latency

  d. UFS on top of LUN, noatime, maxcontig set to 48, WRITE CACHE OFF (write-thru)
     IO Summary: 148825 ops 2460.0 ops/s, (378/379 r/w) 12.1mb/s, 772us cpu/op, 20.9ms latency
     IO Summary: 151152 ops 2498.4 ops/s, (384/385 r/w) 12.4mb/s, 758us cpu/op, 20.5ms latency

2. 3510, 2x (4x RAID-0 (3 disks)), 32KB stripe width, random optimization,
   write-back. Four R0 groups are in one enclosure and assigned to the primary
   controller; the other four R0 groups are in the other enclosure and assigned
   to the secondary controller. Then RAID-10 is created by mirroring groups
   between the controllers. 24 disks total, as in #1.

2.1 filebench/varmail for 60s

  a. ZFS RAID-10, atime=off
     IO Summary: 379284 ops 6273.4 ops/s, (965/965 r/w) 30.9mb/s, 314us cpu/op, 8.0ms latency
     IO Summary: 383917 ops 6346.9 ops/s, (976/977 r/w) 31.4mb/s, 316us cpu/op, 7.8ms latency

  b. ZFS RAID-10, atime=off, WRITE CACHE OFF (write-thru)
     IO Summary: 275490 ops 4549.9 ops/s, (700/700 r/w) 22.3mb/s, 327us cpu/op, 11.0ms latency
     IO Summary: 276027 ops 4567.8 ops/s, (703/703 r/w) 22.5mb/s, 319us cpu/op, 11.0ms latency

--
Best regards,
Robert                 mailto:[EMAIL PROTECTED]
                       http://milek.blogspot.com
[zfs-discuss] zpool import: snv_33 to S10 6/06
Hi,

[EMAIL PROTECTED] cat /etc/release
                    Solaris Nevada snv_33 X86
         Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                  Use is subject to license terms.
                     Assembled 06 February 2006

I have ZFS running well on this box. Now I want to upgrade to the Solaris 10
6/06 release.

Question: will the 6/06 release recognize the ZFS pools created by snv_33? I
seem to recall something about needing to be at a certain release level for
6/06 to be able to import without problems. I searched the archives but I
can't find where I read that anymore.

TIA,
James
Re: [zfs-discuss] Porting ZFS file system to FreeBSD.
Ricardo Correia wrote:
> Wow, congratulations, nice work! I'm the one porting ZFS to FUSE, and seeing
> you making such progress so fast is very, very encouraging :)

I'd like to throw a "me too" into the pile of thank-you messages! I spent part
of the weekend expanding and manipulating a set of LVM volumes on a pair of
RHEL4-ish Linux servers... and I kept grumbling to myself, "If this were ZFS,
I could be done by now!" Not only that, but I could have matched the
configuration to the needs of the users more closely. [0]

I look forward to ZFS on both Linux and FreeBSD. It will be a powerful
addition to both platforms!

Thanks,
-Luke

[0] Changing a production server from an RHEL4 clone to Solaris isn't
something that I'm likely to just do in a couple of hours over the weekend on
a cross-platform domain where I'm just assisting. If I were the sysadmin
there, though, it would be practical.
Re: [zfs-discuss] Re: Issue with zfs snapshot replication from version2 to version3 pool.
I've filed a bug for the problem Tim mentions below:

  6463140 zfs recv with a snapshot name that has 2 @@ in a row succeeds

This is most likely due to the order in which we call zfs_validate_name in
the zfs recv code, which would explain why other snapshot commands like
'zfs snapshot' will fail out and refuse to create a snapshot with 2 @@ in a
row. I'll look into it and update the bug further.

Noel

On Aug 22, 2006, at 11:45 AM, Shane Milton wrote:

> Just updating the discussion with some email chains. After more digging,
> this does not appear to be a version 2 or version 3 replication issue. I
> believe it to be an invalidly named snapshot that causes the zpool and zfs
> commands to core.
>
> Tim mentioned it may be similar to bug 6450219. I agree it seems similar to
> 6450219, but I'm not so sure it's the same as the related bug 6446512. At
> least the description of "...mistakenly trying to copy a file or
> directory..." does not appear to apply in this case. However, I'm still
> testing things, so it very well may produce the same error.
>
> -Shane
>
> --
> To: Tim Foster, Eric Schrock
> Date: Aug 22, 2006 10:37 AM
> Subject: Re: [zfs-discuss] Issue with zfs snapshot replication from
> version2 to version3 pool.
>
> Looks like the problem is that 'zfs receive' will accept invalid snapshot
> names (in this case, two @ signs). This causes most other zfs and zpool
> commands that look up the snapshot object type to core dump.
>
> Reproduced on an x64 Build 44 system with the following command:
>
>   zfs send t0/[EMAIL PROTECTED] | zfs recv t1/fs0@@snashot_in
>
>   [EMAIL PROTECTED]:/var/tmp/] $ zfs list -r t1
>   internal error: Invalid argument
>   Abort(coredump)
>
> dtrace output:
>
>   1  51980  zfs_ioc_objset_stats:entry   t1
>   1  51981  zfs_ioc_objset_stats:return  0
>   1  51980  zfs_ioc_objset_stats:entry   t1/fs0
>   1  51981  zfs_ioc_objset_stats:return  0
>   1  51980  zfs_ioc_objset_stats:entry   t1/fs0
>   1  51981  zfs_ioc_objset_stats:return  0
>   1  51980  zfs_ioc_objset_stats:entry   t1/fs0@@snashot_in
>   1  51981  zfs_ioc_objset_stats:return  22
>
> This may need to be filed as a bug against zfs recv.
>
> Thank you for your time,
> -Shane
>
> From: Tim Foster
> To: shane milton
> Cc: Eric Schrock
> Date: Aug 22, 2006 10:56 AM
> Subject: Re: [zfs-discuss] Issue with zfs snapshot replication from
> version2 to version3 pool.
>
> Hi Shane,
>
> On Tue, 2006-08-22 at 10:37 -0400, shane milton wrote:
> > Looks like the problem is that 'zfs receive' will accept invalid snapshot
> > names (in this case, two @ signs). This causes most other zfs and zpool
> > commands that look up the snapshot object type to core dump.
>
> Thanks for that! I believe this is the same as
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6450219
> (but I'm open to corrections :-)
>
> cheers,
> tim
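[Editor's note: for readers following along, here is a purely hypothetical
sketch of the kind of name check being discussed. It is not the actual
zfs_validate_name() from libzfs; it only illustrates the rule the receive
path would need to apply before creating the dataset: at most one '@', with
non-empty components on both sides.]

/*
 * Hypothetical illustration only -- not the real libzfs zfs_validate_name().
 * A name like "t1/fs0@@snapshot" should be rejected before the receive
 * path ever tries to create the dataset.  Returns 1 if valid, 0 if not.
 */
#include <string.h>

static int
valid_snapshot_name(const char *name)
{
    const char *at = strchr(name, '@');

    if (at == NULL || at == name)       /* no '@', or empty filesystem part */
        return 0;
    if (at[1] == '\0')                  /* empty snapshot component */
        return 0;
    if (strchr(at + 1, '@') != NULL)    /* a second '@', e.g. "fs0@@snap" */
        return 0;
    return 1;
}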
Re: [zfs-discuss] ZFS Performance compared to UFS VxFS
Tony Galway wrote:
> A question (well, let's make it 3 really): Is vdbench a useful tool when
> testing file system performance of a ZFS file system?

Not really. vdbench simply reads and writes from the allocated file.
Filesystem tests do things like create files, read files, delete files, move
files, create directories, remove directories with their contents, etc. You
will also see different results based on the inner workings of the filesystem
itself. [For Sun folks: we've been bashing this one around on the vdbench
list for a few weeks.]

> Third - what is a good benchmarking tool to test ZFS vs UFS vs VxFS?

I've been recommending filebench
(http://www.solarisinternals.com/si/tools/filebench/index.php), though
depending on the app your customer runs, it might be easier to just run the
app for a bit.
Re: [zfs-discuss] Question on Zones and memory usage (65120349)
Hi all,

Customer has more questions. I'm resending:

<snip>
I guess that since the zones we are working with are running/acting as Oracle
10 database servers, the 100% memory usage shown by prstat is not accurate.
Also, from the text below it seems that rcapd is not the way to go to
segregate memory in zones, and that we should wait for LDOMs, which we cannot
do. Also, I read the following about FSS:

  Q: Can I use the Solaris 10 FSS (Fair Share Scheduler) with Oracle in a
     Solaris Container?
  A: There are currently (June 2006) two distinct concerns regarding the use
     of FSS in a Container when running Oracle databases: In testing, Oracle
     processes use internal methods to prioritize themselves to improve
     efficiency. It is possible that these methods might not work well in
     conjunction with the Solaris FSS. Although there are no known problems
     with non-RAC configurations, Sun and Oracle are testing this type of
     configuration to discover any negative interactions. This testing should
     be completed soon.

I'm still not sure what to do to pin a certain amount of memory to my
production Oracle server zone.
<snip>

Jeff Victor wrote on 08/12/06 13:48:

> Mike Gerdts wrote:
> > On 8/11/06, Irma Garcia [EMAIL PROTECTED] wrote:
> > > ZONEID NPROC  SIZE   RSS MEMORY     TIME   CPU ZONE
> > >     15   188  169G  163G   100%  0:46:00   48% fmtest
> > >      0    54  708M  175M   0.1%  2:23:40  0.1% global
> > >     12    27  112M   51M   0.0%  0:02:48  0.0% fmprod
> > >      4    27  281M   66M   0.0%  0:14:13  0.0% fmstage
> > >
> > > Questions: Does the 100% memory usage mean that the fmtest zone is
> > > using all the memory? How come when I run the top command I see a
> > > different result for memory usage?
> >
> > The %mem column is the sum of the %mem that each process uses.
> > Unfortunately, that value seems to include the pages that are shared
> > between many processes (e.g. database files, libc, etc.) without dividing
> > by the number of processes that have that memory mapped. In other words,
> > if you have 50 database processes that have used mmap() on the same 1 GB
> > database, prstat will think that 50 GB of RAM is used when only 1 GB is
> > really used.
>
> Good observation, Mike. FYI, this is bug 4754856
> (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4754856).
> Irma, are the apps in fmtest using a lot of shared memory?
>
> > I *think* that rcapd suffers from the same problem that prstat does and
> > may cause undesirable behavior. Because of the way that it works, I fully
> > expect that if rcapd begins to force pages out, the paging activity for
> > the piggy workload will cause severe performance degradation for
> > everything on the machine. My personal opinion (not backed by extensive
> > testing) is that rcapd is more likely to do more harm than good.
>
> It is plausible, though not always practical, to measure the amount of
> shared pages for a particular zone during normal use, and factor that into
> the limits you specify to rcapd. It *is* easier to use rcapd safely with
> applications that do not use much shared memory.
>
> > Bug the folks that are working on memory sets and swap sets to get this
> > code out sooner than later.
>
> We are working very hard on those two feature sets. We have made a great
> deal of progress, especially on memory sets, which is the higher priority
> of the two. However, memory sets turned out to be more challenging than
> first expected.
>
> > If running on sun4v, consider LDOMs when they are available (November?).
> > LDOMs will avoid the problems described above, at the cost of some
> > flexibility in resource efficiency - the same cost paid by all
> > consolidation solutions that use multiple OS instances.
>
> For example, less RAM is used by sparse-root zones because multiple
> instances of a program (e.g. /bin/ls) share common memory pages. LDOMs
> (and other multi-OS-instance solutions) cannot do that.
>
> --
> Jeff VICTOR            Sun Microsystems           jeff.victor @ sun.com
> OS Ambassador          Sr. Technical Specialist
> Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq

--
Irma Garcia
Technical Support Engineer
Phone: 303-272-6420
[EMAIL PROTECTED]
Submit/View/Update Cases at: http://www.sun.com/service/online
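[Editor's note: to see Mike's mmap() point above outside of a database, here
is a small stand-alone sketch (my own illustration, not from the thread).
Every child maps the same file MAP_SHARED and touches every page, so each
process's RSS grows by the full size of the mapping even though the physical
pages exist only once; summing per-process RSS, as the zone summary
effectively does, counts that memory once per process.]

/*
 * Illustration of shared mappings being counted once per process.
 * Each child maps the same file MAP_SHARED and touches every page;
 * prstat / "ps -o rss" will show the full mapping in every child's RSS,
 * although the pages exist in physical memory only once.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NCHILD 4
#define MAPLEN (64UL * 1024 * 1024)   /* 64 MB is enough to see the effect */

int main(void)
{
    int i;
    int fd = open("/tmp/shared.dat", O_RDWR | O_CREAT, 0600);

    if (fd < 0 || ftruncate(fd, MAPLEN) != 0) {
        perror("setup");
        return 1;
    }

    for (i = 0; i < NCHILD; i++) {
        if (fork() == 0) {
            char *p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); _exit(1); }
            memset(p, 0xab, MAPLEN);   /* touch every page */
            printf("child %ld mapped and touched %lu MB\n",
                   (long)getpid(), MAPLEN >> 20);
            sleep(60);                 /* leave time to inspect RSS */
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}

While the children sleep, summing their RSS values reports roughly
NCHILD x 64 MB, even though only 64 MB of the file is actually resident.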
[zfs-discuss] Tape backup
I understand Legato doesn't work with ZFS yet. I looked through the email
archives; cpio and tar were mentioned. What is my best option if I want to
dump approx 40G to tape?

-Karen
Re: [zfs-discuss] Tape backup
Karen Chau wrote:
> I understand Legato doesn't work with ZFS yet. I looked through the email
> archives; cpio and tar were mentioned. What is my best option if I want to
> dump approx 40G to tape?

Am I correct in saying that the issue was not getting the files to tape, but
properly storing complex permissions and information about the filesystems?

My read of the thread was that if you use classical Unix permissions (or
don't mind manually resetting ACLs), and don't mind recreating all of the
volumes manually, any traditional backup solution (like tar) will work fine.
After all, you can stat and read the files on a ZFS volume!

-Luke
Re: [zfs-discuss] zpool import: snv_33 to S10 6/06
On Wed, Aug 23, 2006 at 09:57:04AM -0400, James Foronda wrote:
> Hi,
>
> [EMAIL PROTECTED] cat /etc/release
>                     Solaris Nevada snv_33 X86
>          Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
>                   Use is subject to license terms.
>                      Assembled 06 February 2006
>
> I have ZFS running well on this box. Now I want to upgrade to the Solaris
> 10 6/06 release. Question: will the 6/06 release recognize the ZFS pools
> created by snv_33? I seem to recall something about needing to be at a
> certain release level for 6/06 to be able to import without problems. I
> searched the archives but I can't find where I read that anymore.

Yes, new releases of Solaris can seamlessly access any ZFS pools created with
Solaris Nevada or Solaris 10 (but not pools from before ZFS was integrated
into Solaris, in October 2005).

However, once you upgrade to build 35 or later (including S10 6/06), do not
downgrade back to build 34 or earlier, per the following message:

  Summary: If you use ZFS, do not downgrade from build 35 or later to build
  34 or earlier.

  This putback (into Solaris Nevada build 35) introduced a
  backwards-compatible change to the ZFS on-disk format. Old pools will be
  seamlessly accessed by the new code; you do not need to do anything
  special. However, do *not* downgrade from build 35 or later to build 34 or
  earlier. If you do so, some of your data may be inaccessible with the old
  code, and attempts to access this data will result in an assertion failure
  in zap.c.

  We have fixed the version-checking code so that if a similar change needs
  to be made in the future, the old code will fail gracefully with an
  informative error message.

After upgrading, you should consider running 'zpool upgrade' to enable the
latest features of ZFS, including ditto blocks, hot spares, and double-parity
RAID-Z.

--matt
Re: [zfs-discuss] Tape backup
Luke Scharf wrote:
> Karen Chau wrote:
> > I understand Legato doesn't work with ZFS yet. I looked through the email
> > archives; cpio and tar were mentioned. What is my best option if I want
> > to dump approx 40G to tape?
>
> Am I correct in saying that the issue was not getting the files to tape,
> but properly storing complex permissions and information about the
> filesystems?
>
> My read of the thread was that if you use classical Unix permissions (or
> don't mind manually resetting ACLs), and don't mind recreating all of the
> volumes manually, any traditional backup solution (like tar) will work
> fine. After all, you can stat and read the files on a ZFS volume!

That is not correct, at least with respect to Legato. The Legato software
aborts the entire backup when it receives ENOSYS from the acl(2) syscall.
Legato receives the ENOSYS because it was trying to find out how many POSIX
draft ACL entries exist on a given file; since ZFS doesn't support POSIX
draft ACLs, it returns ENOSYS. Other backup software takes the ENOSYS to
imply an unsupported operation and will continue to back up the data without
ACLs. NetBackup will work, but it will silently drop ACLs on the floor.
Re: [zfs-discuss] Tape backup
> The Legato software aborts the entire backup when it receives ENOSYS from
> the acl(2) syscall. Legato receives the ENOSYS because it was trying to
> find out how many POSIX draft ACL entries exist on a given file; since ZFS
> doesn't support POSIX draft ACLs, it returns ENOSYS. Other backup software
> takes the ENOSYS to imply an unsupported operation and will continue to
> back up the data without ACLs.

For those folks that like to live just *over* the edge and would like to use
ACL-less backups on ZFS with existing Networker clients, what is the
possibility of creating a pre-loadable library that wrapped acl(2)? Have it
just hand off the same result, except return 0 when the actual call was an
error set to ENOSYS.

Backups would still have to mess with either legacy mounts or explicit save
set specification, but those are much easier tasks.

--
Darren Dunham                                    [EMAIL PROTECTED]
Senior Technical Consultant       TAOS             http://www.taos.com/
Got some Dr Pepper?                          San Francisco, CA bay area
This line left intentionally blank to confuse you.
Re: [zfs-discuss] Tape backup
Luke Scharf [EMAIL PROTECTED] wrote:
> Karen Chau wrote:
> > I understand Legato doesn't work with ZFS yet. I looked through the email
> > archives; cpio and tar were mentioned. What is my best option if I want
> > to dump approx 40G to tape?
>
> Am I correct in saying that the issue was not getting the files to tape,
> but properly storing complex permissions and information about the
> filesystems?

cpio is not a good idea; among other problems, it is limited to 8 GB per file
with the POSIX format, and the SVr4 cpio format is limited to 4 GB per file.

> My read of the thread was that if you use classical Unix permissions (or
> don't mind manually resetting ACLs), and don't mind recreating all of the
> volumes manually, any traditional backup solution (like tar) will work
> fine. After all, you can stat and read the files on a ZFS volume!

If you don't need ZFS ACLs, I recommend star. It supports true incremental
backups using a strategy similar to the one ufsdump uses for incrementals.
Star does not yet support ZFS ACLs, but it does support UFS ACLs.

Another feature of star that is of interest for backups is multi-volume
support, implemented in a way that allows you to start restoring with any
volume (in case you don't want to do an incremental restore but are just
looking for single files).

Jörg

--
EMail: [EMAIL PROTECTED] (home)  Jörg Schilling  D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Tape backup
On Wed, 2006-08-23 at 14:38 -0700, Darren Dunham wrote:
> For those folks that like to live just *over* the edge and would like to
> use ACL-less backups on ZFS with existing Networker clients, what is the
> possibility of creating a pre-loadable library that wrapped acl(2)?

I may regret admitting this, but I've managed to implement something very
much like this.

> Have it just hand off the same result, except return 0 when the actual
> call was an error set to ENOSYS.

That's what I thought, but Networker gets upset when it's handed a
zero-element ACL. UFS provides a 4-element ACL conveying the same information
as the file owner, group, and mode; my LD_PRELOAD hack had to do likewise.

- Bill
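[Editor's note: for readers who want to see the shape of such a shim, here is
a minimal sketch of the idea. It is my own illustration, not Bill's
implementation, and it is knowingly lossy: anything backed up through it has
its real ZFS ACLs replaced by the trivial ACL implied by the mode bits. It
assumes the Solaris acl(2) signature int acl(const char *, int, int, void *)
and the GETACL/GETACLCNT commands and aclent_t layout from <sys/acl.h>; the
ordering of the synthesized trivial ACL entries is also an assumption.]

/*
 * Sketch of an LD_PRELOAD shim around acl(2) -- an illustration of the idea
 * discussed above, not the implementation Bill describes.  When the real
 * acl(2) fails with ENOSYS (as it does for POSIX-draft requests on ZFS),
 * pretend the file carries the trivial 4-entry ACL implied by its mode bits.
 */
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/acl.h>

int acl(const char *pathp, int cmd, int nentries, void *aclbufp)
{
    static int (*real_acl)(const char *, int, int, void *);
    int ret;

    if (real_acl == NULL)
        real_acl = (int (*)(const char *, int, int, void *))
            dlsym(RTLD_NEXT, "acl");

    ret = real_acl(pathp, cmd, nentries, aclbufp);
    if (ret != -1 || errno != ENOSYS)
        return ret;                       /* normal case: pass through */

    if (cmd == GETACLCNT)
        return MIN_ACL_ENTRIES;           /* claim a trivial 4-entry ACL */

    if (cmd == GETACL && nentries >= MIN_ACL_ENTRIES) {
        struct stat st;
        aclent_t *ap = aclbufp;

        if (stat(pathp, &st) != 0)
            return -1;

        /* synthesize the trivial ACL from the file's mode bits */
        ap[0].a_type = USER_OBJ;  ap[0].a_id = st.st_uid;
        ap[0].a_perm = (st.st_mode >> 6) & 07;
        ap[1].a_type = GROUP_OBJ; ap[1].a_id = st.st_gid;
        ap[1].a_perm = (st.st_mode >> 3) & 07;
        ap[2].a_type = CLASS_OBJ; ap[2].a_id = 0;
        ap[2].a_perm = (st.st_mode >> 3) & 07;
        ap[3].a_type = OTHER_OBJ; ap[3].a_id = 0;
        ap[3].a_perm = st.st_mode & 07;
        return MIN_ACL_ENTRIES;
    }

    errno = ENOSYS;                       /* anything else stays unsupported */
    return -1;
}

Built as a shared object (e.g. cc -Kpic -G -o aclshim.so aclshim.c -ldl) and
pushed into the backup client's environment with LD_PRELOAD, it would make
ZFS files look like plain mode-bit files to an ACL-probing client; whether a
given Networker client accepts that is exactly the sort of thing Bill's "I
may regret admitting this" suggests you should test carefully first.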
Re: [zfs-discuss] zpool import: snv_33 to S10 6/06
On 24/08/2006, at 6:40 AM, Matthew Ahrens wrote:
> However, once you upgrade to build 35 or later (including S10 6/06), do
> not downgrade back to build 34 or earlier, per the following message:
>
>   Summary: If you use ZFS, do not downgrade from build 35 or later to
>   build 34 or earlier.
>
>   This putback (into Solaris Nevada build 35) introduced a
>   backwards-compatible change to the ZFS on-disk format. Old pools will be
>   seamlessly accessed by the new code; you do not need to do anything
>   special. However, do *not* downgrade from build 35 or later to build 34
>   or earlier. If you do so, some of your data may be inaccessible with the
>   old code, and attempts to access this data will result in an assertion
>   failure in zap.c.

This reminds me of something that I meant to ask when this came up the first
time. Isn't the whole point of the zpool upgrade process to allow users to
decide when they want to remove the "fall back to the old version" option?
In other words, shouldn't any change that eliminates going back to an old rev
require an explicit zpool upgrade?

Boyd
Re: [zfs-discuss] zpool import: snv_33 to S10 6/06
On Thu, Aug 24, 2006 at 08:12:34AM +1000, Boyd Adamson wrote:
> Isn't the whole point of the zpool upgrade process to allow users to
> decide when they want to remove the "fall back to the old version" option?
> In other words, shouldn't any change that eliminates going back to an old
> rev require an explicit zpool upgrade?

Yes, that is exactly the case. Unfortunately, builds prior to 35 had some
latent bugs which made implementation of 'zpool upgrade' nontrivial. Thus we
issued this one-time "do not downgrade" message and promptly implemented
'zpool upgrade'.

--matt
[zfs-discuss] Need Help: didn't create the pool as raidz but stripes
I need help on this and don't know what to give to the customer.

System is a V40z running Solaris 10 x86, and the customer is trying to create
a raidz pool from 3 disks. After creating the pool and looking at the disk
space and configuration, he thinks that this is not a raidz pool but rather a
stripe. This is exactly what he told me, so I'm not sure if this makes sense
to all of you.

Any assistance and help is greatly appreciated. Thank you in advance,
Arlina

NOTE: Please email me directly as I'm not on this alias.

Below is more information.
=

Command used:

# zpool create pool raidz c1t2d0 c1t3d0 c1t4d0

From the format command:

0. c1t0d0 <DEFAULT cyl 8921 alt 2 hd 255 sec 63>
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
1. c1t2d0 <FUJITSU-MAT3073NC-0104-68.49GB>
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
2. c1t3d0 <FUJITSU-MAT3073NC-0104-68.49GB>
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
3. c1t4d0 <FUJITSU-MAT3073NC-0104-68.49GB>
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci17c2,[EMAIL PROTECTED]/[EMAIL PROTECTED],0

The pool status:

# zpool status
  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0

errors: No known data errors

The df -k output of the newly created pool:

# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
pool                 210567168      49 210567033     1%    /pool

I can create a file that is as large as the stripe of the 3 disks, so the
information reported is correct. Also, if I pull a disk out, the whole zpool
fails! There are no degraded pools, it just fails.
===
Re: [zfs-discuss] Need Help: didn't create the pool as raidz but stripes
On 8/23/06, Arlina Goce-Capiral [EMAIL PROTECTED] wrote:
> Command used:
>
> # zpool create pool raidz c1t2d0 c1t3d0 c1t4d0
>
> The pool status:
>
> # zpool status
>   pool: pool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         pool        ONLINE       0     0     0
>           raidz     ONLINE       0     0     0
>             c1t2d0  ONLINE       0     0     0
>             c1t3d0  ONLINE       0     0     0
>             c1t4d0  ONLINE       0     0     0
>
> errors: No known data errors

This right here shows it's a raidz pool. There is a bug in Update 2 that
makes new pools show up with the wrong available disk space; as he adds files
to the pool, it will fix itself. I believe the fix is slated to go into
Update 3.

James Dickens
uadmin.blogspot.com
Re: [zfs-discuss] Need Help: didn't create the pool as raidz but stripes
Hello James,

Thanks for the response. Yes, I got the bug ID and forwarded that to the
customer. But the customer said that he can create a file as large as the
stripe of the 3 disks. And if he pulls a disk, the whole zpool fails, so
there are no degraded pools, it just fails.

Any idea on this?

Thank you,
Arlina
[zfs-discuss] space accounting with RAID-Z
I just realized that I forgot to send this message to zfs-discuss back in May
when I fixed this bug. Sorry for the delay.

The putback of the following bug fix to Solaris Nevada build 42 and Solaris
10 Update 3 build 3 (coinciding with the change to ZFS on-disk version 3)
changes the behavior of space accounting when using pools with raid-z:

  6288488 du reports misleading size on RAID-Z

The old behavior is that on raidz vdevs, the space used and available
includes the space used to store the data redundantly (ie. the parity
blocks). On mirror vdevs, and in all other products' RAID-4/5
implementations, it does not, leading to confusion. Customers are accustomed
to the redundant space not being reported, so this change makes ZFS do that
for raid-z vdevs as well.

The new behavior applies to:

  (a) newly created pools (with version 3 or later)
  (b) old (version 1 or 2) pools which, when 'zpool upgrade'-ed, did not have
      any raid-z vdevs (but have since 'zpool add'-ed a raid-z vdev)

Note that the space accounting behavior will never change on old raid-z
pools. If the new behavior is desired, these pools must be backed up,
destroyed, and re-'zpool create'-ed.

The 'zpool list' output is unchanged (ie. it still includes the space used
for parity information). This is bug 6308817 "discrepancy between zfs and
zpool space accounting".

The reported space used may be slightly larger than the parity-free size
because the amount of space used to store parity with RAID-Z varies somewhat
with blocksize (eg. even small blocks need at least 1 sector of parity). On
most workloads[*], the overwhelming majority of space is stored in 128k
blocks, so this effect is typically not very pronounced.

--matt

[*] One workload where this effect can be noticeable is when the 'recordsize'
property has been decreased, eg. for a database or zvol. However, in this
situation the rounding-error space can be completely eliminated by using an
appropriate number of disks in the raid-z group, according to the following
table:

                 exact optimal number of disks
  recordsize       raidz1          raidz2
  8k+              3, 5 or 9       6, 10 or 18
  4k               3 or 5          6 or 10
  2k               3               6
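[Editor's note: a back-of-the-envelope sketch of the arithmetic behind that
table (my own illustration, assuming 512-byte sectors, single parity, and
ignoring the small padding ZFS adds to keep allocations a multiple of
nparity+1 sectors): raidz1 needs one parity sector for roughly every
ndisks-1 data sectors in a stripe, so a block's parity cost is about
ceil(data_sectors / (ndisks-1)) sectors and is minimal exactly when the data
sectors divide evenly across the data columns.]

/*
 * Back-of-the-envelope illustration of the table above -- NOT the actual
 * ZFS allocator.  Assumes 512-byte sectors and single-parity raidz1, and
 * ignores allocation padding.
 */
#include <stdio.h>

/* parity sectors needed for one block of 'data_sectors' on an N-disk raidz1 */
static unsigned parity_sectors(unsigned data_sectors, unsigned ndisks)
{
    unsigned data_cols = ndisks - 1;                    /* one column is parity */
    return (data_sectors + data_cols - 1) / data_cols;  /* ceil() */
}

int main(void)
{
    const unsigned recordsize = 8192;                /* bytes */
    const unsigned data_sectors = recordsize / 512;  /* 16 sectors */
    unsigned ndisks;

    printf("recordsize %u bytes (%u data sectors), raidz1:\n",
           recordsize, data_sectors);
    for (ndisks = 3; ndisks <= 10; ndisks++) {
        unsigned p = parity_sectors(data_sectors, ndisks);
        printf("  %2u disks: %2u parity sectors -> %.1f%% overhead\n",
               ndisks, p, 100.0 * p / (data_sectors + p));
    }
    return 0;
}

For an 8k recordsize (16 data sectors) the overhead bottoms out at 3, 5 and 9
disks, where 16 divides evenly by ndisks-1; with, say, 6 disks the last
stripe is only partly filled and the rounding costs an extra parity sector,
which is the effect the table is summarizing.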
Re: [zfs-discuss] Need Help: didn't create the pool as raidz but stripes
On 24/08/2006, at 10:14 AM, Arlina Goce-Capiral wrote:
> Hello James,
>
> Thanks for the response. Yes, I got the bug ID and forwarded that to the
> customer. But the customer said that he can create a file as large as the
> stripe of the 3 disks. And if he pulls a disk, the whole zpool fails, so
> there are no degraded pools, it just fails.
>
> Any idea on this?

The output of your zpool command certainly shows a raidz pool. It may be that
the failing pool and the size issues are unrelated.

How are they creating a huge file? It's not sparse, is it? Is compression
involved?

As to the failure mode, you may like to include any relevant
/var/adm/messages lines and errors from fmdump -e.

Boyd