Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks
I have only heard of alignment being discussed in reference to block-based storage (like DASD/iSCSI/FC). I'm not really sure how it would work out over NFS. I do see why you are asking, though. My understanding is that VMDK files are basically 'aligned', but the partitions inside of them may not be. You don't state what OS you are using in your guests. Windows XP/2003 and older create misaligned partitions by default (within a VMDK). You would need to manually create/adjust NTFS partitions in those cases in order for them to properly fall on a 4k boundary. This could be a cause of the problem you are describing. This doc from VMware is aimed at block-based storage, but it has some concepts that might be helpful as well as info on aligning guest OS partitions: http://www.vmware.com/pdf/esx3_partition_align.pdf -Brian

Chris Murray wrote:
Good evening, I understand that NTFS and VMDK do not relate to Solaris or ZFS, but I was wondering if anyone has any experience of checking the alignment of data blocks through that stack? I have a VMware ESX 4.0 host using storage presented over NFS from ZFS filesystems (recordsize 4KB). Within virtual machine VMDK files, I have formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run zdb -DD, I see a figure of unique blocks which is higher than I expect, which makes me wonder whether any given 4KB block in the NTFS filesystem is perfectly aligned with a 4KB block in ZFS. E.g. consider two virtual machines sharing lots of the same blocks. Assuming there /is/ a misalignment between NTFS and VMDK, or between VMDK and ZFS, and the blocks are not in the same order within NTFS, they don't line up and will actually produce different blocks in ZFS:

VM1  NTFS 1---2---3---
     ZFS  1---2---3---4---

ZFS blocks are AA, AABB and so on ... Then in another virtual machine, the blocks are in a different order:

VM2  NTFS 1---2---3---
     ZFS  1---2---3---4---

ZFS blocks for this VM would be CC, CCAA, AABB etc. So, no overlap between virtual machines, and no benefit from dedup. I may have it wrong, and there are indeed 30,785,627 unique blocks in my setup, but if there's a mechanism for checking alignment, I'd find that very helpful. Thanks, Chris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
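Since the VMDKs live on the NFS share, one rough way to check guest partition alignment from the ZFS server is to read the MBR out of the flat VMDK and look at the first partition's starting LBA (bytes 454-457 of sector 0, little-endian). This assumes an MBR-partitioned guest disk and a preallocated -flat.vmdk whose data starts at byte 0; the path below is made up:

# dd if=/pool1/vmstore/vm1/vm1-flat.vmdk bs=1 skip=454 count=4 2>/dev/null | od -A n -t u1
#    63   0   0   0    -> start LBA 63 (the XP/2003 default); 63 * 512 is not a 4k multiple, so misaligned
#     0   8   0   0    -> start LBA 2048; 2048 * 512 is a 4k multiple, so aligned

If the start sector is not a multiple of 8, every 4KB NTFS cluster straddles two 4KB ZFS records, which would fit the dedup symptom described.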
Re: [zfs-discuss] ZFS upgrade.
James Lever wrote: Is there a way to upgrade my current ZFS version? I show the version could be as high as 22. The version of Solaris you are running only supports ZFS versions up to version 15, as demonstrated by your zfs upgrade -v output. You probably need a newer version of Solaris, but I cannot tell you if any newer versions support later zfs versions. John, You are already running the Update 8 kernel (141444-09). That is the latest version of ZFS that is available for Solaris 10. -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
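For reference, the commands for checking and moving versions look like this; only upgrade once every host that may need to import the pool or receive the filesystems runs new enough bits:

# zfs upgrade -v      (list the filesystem versions this kernel supports)
# zpool upgrade -v    (list the pool versions this kernel supports)
# zfs upgrade         (show datasets running older versions)
# zpool upgrade       (show pools running older versions)
# zfs upgrade -a      (upgrade all filesystems to the newest supported version)
# zpool upgrade -a    (upgrade all pools to the newest supported version)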
Re: [zfs-discuss] zfs/io performance on Netra X1
Bob Friesenhahn wrote: On Fri, 13 Nov 2009, Tim Cook wrote: If it is using parallel SCSI, perhaps there is a problem with the SCSI bus termination or a bad cable? SCSI? Try PATA ;) Is that good? I don't recall ever selecting that option when purchasing a computer. It seemed safer to stick with SCSI than to try exotic technologies. I hope you're being facetious. :-) http://en.wikipedia.org/wiki/Parallel_ATA The Netra X1 has two IDE channels, so it should be able to handle 2 disks without contention so long as only one disk is on each channel. OTOH, that machine is basically a desktop machine in a rack mount case (similar to a Blade 100) and is also vintage 2001. I wouldn't expect much performance out of it regardless. -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20
Roland Rambau wrote: Richard, Tim, yes, one might envision the X4275 as OpenStorage appliances, but they are not. Exadata 2 is - *all* Sun hardware - *all* Oracle software (*) and that combination is now an Oracle product: a database appliance. Is there any reason the X4275 couldn't be an OpenStorage appliance? It seems like it would be a good fit. It doesn't seem specific to Exadata2. The F20 accelerator card isn't something specific to Exadata2 either is it? It looks like something that would benefit any kind of storage server. When I saw the F20 on the Sun site the other day, my first thought was Oh cool, they reinvented Prestoserve! -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Why did my zvol shrink ?
I'm doing a little testing and I hit a strange point. Here is a zvol (clone):

pool1/volclone  type          volume            -
pool1/volclone  origin        pool1/v...@diff1  -
pool1/volclone  reservation   none              default
pool1/volclone  volsize       191G              -
pool1/volclone  volblocksize  8K                -

The zvol has UFS on it. It has always been 191G and we've never attempted to resize it. However, if I just try to grow it, it gives me an error:

-bash-3.00# growfs /dev/zvol/rdsk/pool1/volclone
400555998 sectors current size of 400556032 sectors

Is the zvol somehow smaller than it was originally? How/why? It fscks OK, so UFS doesn't seem to notice. This is Solaris 10 U6 currently; the machine (and zpool) have gone through a few update releases since creation. Thanks for any input, -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
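One sanity check worth doing: 400556032 sectors x 512 bytes = 205084688384 bytes, which is exactly 191G (191 x 2^30), so the volume itself still looks to be the size it was created at. The 34-sector difference reads more like growfs rounding the new size down to whole cylinders of the fake geometry it sees than like an actual shrink. The exact zvol size can be confirmed with:

# zfs get -Hp -o value volsize pool1/volclone
205084688384

(The output shown is what a 191G volume should report; -p prints exact bytes.)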
[zfs-discuss] Performance issue with zfs send of a zvol (Again)
Nobody can comment on this? -Brian

Brian H. Nelson wrote:
I noticed this issue yesterday when I first started playing around with zfs send/recv. This is on Solaris 10U6. It seems that a zfs send of a zvol issues 'volblocksize' reads to the physical devices. This doesn't make any sense to me, as zfs generally consolidates read/write requests to improve performance. Even the dd case with the same snapshot does not exhibit this behavior. It seems to be specific to zfs send. I checked with 8k, 64k, and 128k volblocksize, and the reads generated by zfs send always seem to follow that size, while the reads with dd do not. The small reads seem to hurt the performance of zfs send. I tested with a mirror, but on another machine with a 7-disk raidz, the performance is MUCH worse because the 8k reads get broken up into even smaller reads and spread across the raidz. Is this a bug, or can someone explain why this is happening? Thanks -Brian

Using 8k volblocksize:

-bash-3.00# zfs send pool1/vo...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G  1.88K      0  15.0M      0
  mirror    4.01G   274G  1.88K      0  15.0M      0
    c0t9d0      -      -    961      0  7.46M      0
    c0t11d0     -      -    968      0  7.53M      0
----------  -----  -----  -----  -----  -----  -----

== ~8k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vo...@now of=/dev/null bs=8k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G  2.25K      0  17.9M      0
  mirror    4.01G   274G  2.25K      0  17.9M      0
    c0t9d0      -      -    108      0  9.00M      0
    c0t11d0     -      -    109      0  8.92M      0
----------  -----  -----  -----  -----  -----  -----

== ~8k reads to pool, ~85k reads to drives

Using volblocksize of 64k:

-bash-3.00# zfs send pool1/vol...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       6.01G   272G    378      0  23.5M      0
  mirror    6.01G   272G    378      0  23.5M      0
    c0t9d0      -      -    189      0  11.8M      0
    c0t11d0     -      -    189      0  11.7M      0
----------  -----  -----  -----  -----  -----  -----

== ~64k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol...@now of=/dev/null bs=64k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       6.01G   272G    414      0  25.7M      0
  mirror    6.01G   272G    414      0  25.7M      0
    c0t9d0      -      -    107      0  12.9M      0
    c0t11d0     -      -    106      0  12.8M      0
----------  -----  -----  -----  -----  -----  -----

== ~64k reads to pool, ~124k reads to drives

Using volblocksize of 128k:

-bash-3.00# zfs send pool1/vol1...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G    188      0  23.3M      0
  mirror    4.01G   274G    188      0  23.3M      0
    c0t9d0      -      -     94      0  11.7M      0
    c0t11d0     -      -     93      0  11.7M      0
----------  -----  -----  -----  -----  -----  -----

== ~128k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol1...@now of=/dev/null bs=128k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G    247      0  30.8M      0
  mirror    4.01G   274G    247      0  30.8M      0
    c0t9d0      -      -    122      0  15.3M      0
    c0t11d0     -      -    123      0  15.5M      0
----------  -----  -----  -----  -----  -----  -----

== ~128k reads to pool and drives

-- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs destroy is taking a long time...
David Smith wrote: I was wondering if anyone has any experience with how long a zfs destroy of about 40 TB should take? So far, it has been about an hour... Is there any good way to tell if it is working or if it is hung? Doing a zfs list just hangs. If you do a more specific zfs list, then it is okay... zfs list pool/another-fs Thanks, David

I can't speak to something like 40 TB, but I can share a related story (on Solaris 10u5). A couple days ago, I tried to zfs destroy a clone of a snapshot of a 191 GB zvol. It didn't complete right away, but the machine appeared to continue working on it, so I decided to let it go overnight (it was near the end of the day). Well, by about 4:00 am the next day, the machine had completely run out of memory and hung. When I came in, I forced a sync from prom to get it back up. While it was booting, it stopped during (I think) the zfs initialization part, where it ran the disks for about 10 minutes before continuing. When the machine was back up, everything appeared to be ok. The clone was still there, although usage had changed to zero. I ended up patching the machine up to the latest u6 kernel + zfs patch (13-01 + 139579-01). After that, the zfs destroy went off without a hitch. I turned up bug 6606810 'zfs destroy volume is taking hours to complete' which is supposed to be fixed by 139579-01. I don't know if that was the cause of my issue or not. I've got a 2GB kernel dump if anyone is interested in looking. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
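A couple of things that can help tell 'still working' from 'hung' while a large destroy runs, and catch a memory runaway before it takes the box down (pool name illustrative):

# zpool iostat pool1 5        (steady read/write activity suggests it is still walking/freeing blocks)
# echo "::memstat" | mdb -k   (kernel vs. free memory breakdown)
# vmstat 5                    (watch the free column trend)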
[zfs-discuss] Performance issue with zfs send of a zvol
I noticed this issue yesterday when I first started playing around with zfs send/recv. This is on Solaris 10U6. It seems that a zfs send of a zvol issues 'volblocksize' reads to the physical devices. This doesn't make any sense to me, as zfs generally consolidates read/write requests to improve performance. Even the dd case with the same snapshot does not exhibit this behavior. It seems to be specific to zfs send. I checked with 8k, 64k, and 128k volblocksize, and the reads generated by zfs send always seem to follow that size, while the reads with dd do not. The small reads seem to hurt the performance of zfs send. I tested with a mirror, but on another machine with a 7-disk raidz, the performance is MUCH worse because the 8k reads get broken up into even smaller reads and spread across the raidz. Is this a bug, or can someone explain why this is happening? Thanks -Brian

Using 8k volblocksize:

-bash-3.00# zfs send pool1/vo...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G  1.88K      0  15.0M      0
  mirror    4.01G   274G  1.88K      0  15.0M      0
    c0t9d0      -      -    961      0  7.46M      0
    c0t11d0     -      -    968      0  7.53M      0
----------  -----  -----  -----  -----  -----  -----

== ~8k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vo...@now of=/dev/null bs=8k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G  2.25K      0  17.9M      0
  mirror    4.01G   274G  2.25K      0  17.9M      0
    c0t9d0      -      -    108      0  9.00M      0
    c0t11d0     -      -    109      0  8.92M      0
----------  -----  -----  -----  -----  -----  -----

== ~8k reads to pool, ~85k reads to drives

Using volblocksize of 64k:

-bash-3.00# zfs send pool1/vol...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       6.01G   272G    378      0  23.5M      0
  mirror    6.01G   272G    378      0  23.5M      0
    c0t9d0      -      -    189      0  11.8M      0
    c0t11d0     -      -    189      0  11.7M      0
----------  -----  -----  -----  -----  -----  -----

== ~64k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol...@now of=/dev/null bs=64k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       6.01G   272G    414      0  25.7M      0
  mirror    6.01G   272G    414      0  25.7M      0
    c0t9d0      -      -    107      0  12.9M      0
    c0t11d0     -      -    106      0  12.8M      0
----------  -----  -----  -----  -----  -----  -----

== ~64k reads to pool, ~124k reads to drives

Using volblocksize of 128k:

-bash-3.00# zfs send pool1/vol1...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G    188      0  23.3M      0
  mirror    4.01G   274G    188      0  23.3M      0
    c0t9d0      -      -     94      0  11.7M      0
    c0t11d0     -      -     93      0  11.7M      0
----------  -----  -----  -----  -----  -----  -----

== ~128k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol1...@now of=/dev/null bs=128k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       4.01G   274G    247      0  30.8M      0
  mirror    4.01G   274G    247      0  30.8M      0
    c0t9d0      -      -    122      0  15.3M      0
    c0t11d0     -      -    123      0  15.5M      0
----------  -----  -----  -----  -----  -----  -----

== ~128k reads to pool and drives

-- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
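For anyone who wants to double-check the physical I/O sizes on their own pool, a DTrace one-liner along these lines (run while the send or dd is going; Ctrl-C to stop) shows the distribution of block I/O sizes per device directly, rather than inferring them from bandwidth divided by operations:

# dtrace -n 'io:::start { @[args[1]->dev_statname] = quantize(args[0]->b_bcount); }'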
Re: [zfs-discuss] Inexpensive ZFS home server
Jonathan Loran wrote: David Evans wrote: For anyone looking for a cheap home ZFS server... Dell is having a sale on their PowerEdge SC440 for $199 (regular $598) through 11/12/2008. http://www.dell.com/content/products/productdetails.aspx/pedge_sc440?c=uscs=04l=ens=bsd Its got Dual Core Intel® Pentium® E2180, 2.0GHz, 1MB Cache, 800MHz FSB and you can upgrade the memory (ECC too) to 2gb for 19$ bucks. @$199, I just ordered 2. dce I don't think the Pentium E2180 has the lanes to use ECC RAM. I'm also not confident the system board for this machine would make use of ECC memory either, which is not good from a ZFS perspective. How many SATA plugs are there on the MB in this guy? Jon ECC support is a function of the chipset AFAIK. That system has an Intel 3000 chipset which is stated to have ECC support. The Dell literature also states ECC support. I don't see any reason it wouldn't work as such. From the manual, it appears to have 4 SATA ports. For anyone contemplating buying one for home use, note that it has only PCIe x8, not x16 (for graphics cards). The SC440 is basically just a re-badged workstation. Nothing too exciting, but $199 is not a bad deal. -Brian -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs patches in latest sol10 u2 patch bundle
Manyam wrote: Hi ZFS gurus -- I have a v240 with solaris10 u2 release and ZFS - could you please tell me if by applying the latest patch bundle of update 2 -- I will get all the ZFS patches installed as well? It is possible to patch your way up to the U5 kernel and related patches, which should give you all the latest ZFS bits (available in Solaris anyways). I have done this from U3, but I believe coming from U2 wouldn't be much different. I assume that the required patches are in the latest bundle, but I believe 'smpatch update' is the prescribed method these days. Be aware that there is at least one obsolete patch that must be installed by hand in order to satisfy a dependency. I don't recall the patch number, but I know the dependent patch will print out a notice as such if the required patch is not installed. You will have to go through several patch-reboot iterations (one for each kernel patch, U2-U5) in order to get all the way there. Once you're done patching, you should be able to do a 'zpool upgrade' to the current version (4). Depending on your situation though, it may just be easier to do an upgrade :) -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to identify zpool version
Peter Hawkins wrote: Can zpool on U3 be patched to V4? I've applied the latest cluster and it still seems to be V3. Yes, you can patch your way up to the Sol 10 U4 kernel (or even U5 kernel) which will give you zpool v4 support. The particular patch you need is 120011-14 or 120012-14 (sparc or x86). There is at least one dependency patch that is obsolete (122660-10/122661-10) but must still be installed before the kernel patch will go in, so you may need to install one or two patches manually to get it working. http://mail.opensolaris.org/pipermail/zfs-discuss/2007-October/043331.html -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
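Roughly, the manual part of the sequence looks like this (patch directories are wherever the downloads were unpacked; check each patch README for the full dependency list):

# patchadd /var/tmp/122660-10     (obsolete, but satisfies the dependency)
# patchadd /var/tmp/120011-14     (the kernel patch; 120012-14 on x86)
# init 6
# zpool upgrade -a                (after reboot, pools can move to v4)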
Re: [zfs-discuss] [SOLVED] USB hard to ZFS
Andrius wrote: That is true, but # kill -HUP `pgrep vold` usage: kill [ [ -sig ] id ... | -l ] I think you already did this as per a previous message: # svcadm disable volfs As such, vold isn't running. Re-enable the service and you should be fine. -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
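In other words, something like this (service and process names as on Solaris 10; once volfs is back online, vold is there to HUP):

# svcadm enable volfs
# svcs volfs
# pkill -HUP vold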
Re: [zfs-discuss] How to identify zpool version
S10 U4 and U5 both use ZFS v4 (you specified your U4 machine as using v3). If you have access to both machines, you can do 'zpool upgrade -v' to confirm which versions are being used. -Brian Peter Hawkins wrote: By the way I'm sure the pool was created using S10 Update 5 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The ZFS inventor and Linus sitting in a tree?
Keith Bierman wrote: Not being a lawyer, and this not being a Legal forum ... can we leave license analysis alone? The GNU project _itself_ states in plain English that it is not allowable. Why people continue to argue about it is beyond me :-) Common Development and Distribution License (CDDL) http://www.opensolaris.org/os/licensing/cddllicense.txt This is a free software license. It has a copyleft with a scope that's similar to the one in the Mozilla Public License, which makes it incompatible with the GNU GPL http://www.gnu.org/licenses/gpl.html. This means a module covered by the GPL and a module covered by the CDDL cannot legally be linked together. We urge you not to use the CDDL for this reason. Also unfortunate in the CDDL is its use of the term intellectual property http://www.gnu.org/philosophy/not-ipr.html. (from http://www.gnu.org/licenses/license-list.html#SoftwareLicenses) -Brian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Need help with a dead disk
Here's a bit more info. The drive appears to have failed at 22:19 EST but it wasn't until 1:30 EST the next day that the system finally decided that it was bad. (Why?) Here's some relevant log stuff (with lots of repeated 'device not responding' errors removed) I don't know if it will be useful: Feb 11 22:19:09 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 11 22:19:09 maxwell SCSI transport failed: reason 'incomplete': retrying command Feb 11 22:19:10 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 11 22:19:10 maxwell disk not responding to selection ... Feb 11 22:21:08 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED] (isp0): Feb 11 22:21:08 maxwell SCSI Cable/Connection problem. Feb 11 22:21:08 maxwell scsi: [ID 107833 kern.notice] Hardware/Firmware error. Feb 11 22:21:08 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED] (isp0): Feb 11 22:21:08 maxwell Fatal error, resetting interface, flg 16 ... (Why did this take so long?) Feb 12 01:30:05 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 01:30:05 maxwell offline ... Feb 12 01:30:22 maxwell fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major Feb 12 01:30:22 maxwell EVENT-TIME: Tue Feb 12 01:30:22 EST 2008 Feb 12 01:30:22 maxwell PLATFORM: SUNW,Ultra-250, CSN: -, HOSTNAME: maxwell Feb 12 01:30:22 maxwell SOURCE: zfs-diagnosis, REV: 1.0 Feb 12 01:30:22 maxwell EVENT-ID: 7f48f376-2eb1-ccaf-afc5-e56f5bf4576f Feb 12 01:30:22 maxwell DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information. Feb 12 01:30:22 maxwell AUTO-RESPONSE: No automated response will occur. Feb 12 01:30:22 maxwell IMPACT: Fault tolerance of the pool may be compromised. Feb 12 01:30:22 maxwell REC-ACTION: Run 'zpool status -x' and replace the bad device. One thought I had was to unconfigure the bad disk with cfgadm. Would that force the system back into the 'offline' response? Thanks, -Brian Brian H. Nelson wrote: Ok. I think I answered my own question. ZFS _didn't_ realize that the disk was bad/stale. I power-cycled the failed drive (external) to see if it would come back up and/or run diagnostics on it. As soon as I did that, ZFS put the disk ONLINE and started using it again! Observe: bash-3.00# zpool status pool: pool1 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM pool1ONLINE 0 0 0 raidz1 ONLINE 0 0 0 c0t9d0 ONLINE 0 0 0 c0t10d0 ONLINE 0 0 0 c0t11d0 ONLINE 0 0 0 c0t12d0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c2t1d0 ONLINE 0 0 0 c2t2d0 ONLINE 2.11K 20.09 0 errors: No known data errors Now I _really_ have a problem. I can't offline the disk myself: bash-3.00# zpool offline pool1 c2t2d0 cannot offline c2t2d0: no valid replicas I don't understand why, as 'zpool status' says all the other drives are OK. 
What's worse, if I just power off the drive in question (trying to get back to where I started) the zpool hangs completely! I let it go for about 7 minutes thinking maybe there was some timeout, but still nothing. Any command that would access the zpool (including 'zpool status') hangs. The only way to fix is to power the external disk back on upon which everything starts working like nothing has happened. Nothing gets logged other than lots of these only while the drive is powered off: Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 11:49:32 maxwell disk not responding to selection Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 11:49:32 maxwell offline or reservation conflict Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
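On the cfgadm idea mentioned above, the unconfigure would look something like the following. The attachment-point name is a guess based on the c2t2d0 device ('cfgadm -al' shows the real one), and whether sd then stops retrying the disk is exactly the open question:

# cfgadm -al
# cfgadm -c unconfigure c2::dsk/c2t2d0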
[zfs-discuss] Need help with a dead disk (was: ZFS keeps trying to open a dead disk: lots of logging)
Ok. I think I answered my own question. ZFS _didn't_ realize that the disk was bad/stale. I power-cycled the failed drive (external) to see if it would come back up and/or run diagnostics on it. As soon as I did that, ZFS put the disk ONLINE and started using it again! Observe: bash-3.00# zpool status pool: pool1 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM pool1ONLINE 0 0 0 raidz1 ONLINE 0 0 0 c0t9d0 ONLINE 0 0 0 c0t10d0 ONLINE 0 0 0 c0t11d0 ONLINE 0 0 0 c0t12d0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c2t1d0 ONLINE 0 0 0 c2t2d0 ONLINE 2.11K 20.09 0 errors: No known data errors Now I _really_ have a problem. I can't offline the disk myself: bash-3.00# zpool offline pool1 c2t2d0 cannot offline c2t2d0: no valid replicas I don't understand why, as 'zpool status' says all the other drives are OK. What's worse, if I just power off the drive in question (trying to get back to where I started) the zpool hangs completely! I let it go for about 7 minutes thinking maybe there was some timeout, but still nothing. Any command that would access the zpool (including 'zpool status') hangs. The only way to fix is to power the external disk back on upon which everything starts working like nothing has happened. Nothing gets logged other than lots of these only while the drive is powered off: Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 11:49:32 maxwell disk not responding to selection Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 11:49:32 maxwell offline or reservation conflict Feb 12 11:49:32 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 11:49:32 maxwell i/o to invalid geometry What's going on here? What can I do to make ZFS let go of the bad drive? This is a production machine and I'm getting concerned. I _really_ don't like the fact that ZFS is using a suspect drive, but I can't seem to make it stop! Thanks, -Brian Brian H. Nelson wrote: This is Solaris 10U3 w/127111-05. It appears that one of the disks in my zpool died yesterday. I got several SCSI errors finally ending with 'device not responding to selection'. That seems to be all well and good. ZFS figured it out and the pool is degraded: maxwell /var/adm zpool status pool: pool1 state: DEGRADED status: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Attach the missing device and online it using 'zpool online'. see: http://www.sun.com/msg/ZFS-8000-D3 scrub: none requested config: NAME STATE READ WRITE CKSUM pool1DEGRADED 0 0 0 raidz1 DEGRADED 0 0 0 c0t9d0 ONLINE 0 0 0 c0t10d0 ONLINE 0 0 0 c0t11d0 ONLINE 0 0 0 c0t12d0 ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c2t1d0 ONLINE 0 0 0 c2t2d0 UNAVAIL 1.88K 17.98 0 cannot open errors: No known data errors My question is why does ZFS keep attempting to open the dead device? At least that's what I assume is happening. 
About every minute, I get eight of these entries in the messages log: Feb 12 10:15:54 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 12 10:15:54 maxwell disk not responding to selection I also got a number of these thrown in for good measure: Feb 11 22:21:58 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32): Feb 11 22:21:58 maxwell SYNCHRONIZE CACHE command failed (5) Since the disk died last night (at about 11:20pm EST) I now have over 15K of similar entries in my log. What gives? Is this expected behavior? If ZFS knows the device is having problems, why does it not just leave
[zfs-discuss] ZFS keeps trying to open a dead disk: lots of logging
This is Solaris 10U3 w/127111-05. It appears that one of the disks in my zpool died yesterday. I got several SCSI errors finally ending with 'device not responding to selection'. That seems to be all well and good. ZFS figured it out and the pool is degraded:

maxwell /var/adm zpool status
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool1        DEGRADED     0     0     0
          raidz1     DEGRADED     0     0     0
            c0t9d0   ONLINE       0     0     0
            c0t10d0  ONLINE       0     0     0
            c0t11d0  ONLINE       0     0     0
            c0t12d0  ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   UNAVAIL  1.88K 17.98     0  cannot open

errors: No known data errors

My question is why does ZFS keep attempting to open the dead device? At least that's what I assume is happening. About every minute, I get eight of these entries in the messages log:

Feb 12 10:15:54 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32):
Feb 12 10:15:54 maxwell disk not responding to selection

I also got a number of these thrown in for good measure:

Feb 11 22:21:58 maxwell scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd32):
Feb 11 22:21:58 maxwell SYNCHRONIZE CACHE command failed (5)

Since the disk died last night (at about 11:20pm EST) I now have over 15K of similar entries in my log. What gives? Is this expected behavior? If ZFS knows the device is having problems, why does it not just leave it alone and wait for user intervention? Also, I noticed that the 'action' says to attach the device and 'zpool online' it. Am I correct in assuming that a 'zpool replace' is what would really be needed, as the data on the disk will be outdated?

Thanks, -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
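For reference, if the failed disk ends up being swapped in the same slot, the resilver would be kicked off with something like (device name taken from the status output above):

# zpool replace pool1 c2t2d0
# zpool status -x

or 'zpool replace pool1 c2t2d0 <newdev>' if the replacement shows up at a different target.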
Re: [zfs-discuss] Do we have a successful installation method for patch 120011-14?
Manually installing the obsolete patch 122660-10 has worked fine for me. Until Sun fixes the patch dependencies, I think that is the easiest way. -Brian Bruce Shaw wrote: It fails on my machine because it requires a patch that's deprecated. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Do we have a successful installation method for patch 120011-14?
It was 120272-12 that caused the snmpd.conf problem and was withdrawn. 120272-13 has replaced it and has that bug fixed. 122660-10 does not have any issues that I am aware of. It is only obsolete, not withdrawn. Additionally, it appears that the circular patch dependency is by design if you read this BugID: 6574472 U4 feature Ku's need to hard require a patch that enforces zoneadmd patch is installed. So hacking the prepatch script for 125547-02/125548-02 to bypass the dependency check (as others have recommended) is a BAD THING and you may wind up with a broken system. -Brian Rob Windsor wrote: Yeah, the only thing wrong with that patch is that it eats /etc/sma/snmp/snmpd.conf All is not lost, your original is copied to /etc/sma/snmp/snmpd.conf.save in the process. Rob++ Brian H. Nelson wrote: Manually installing the obsolete patch 122660-10 has worked fine for me. Until Sun fixes the patch dependencies, I think that is the easiest way. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS array NVRAM cache?
Vincent Fox wrote: It seems like ZIL is a separate issue. I have read that putting ZIL on a separate device helps, but what about the cache? OpenSolaris has some flag to disable it. Solaris 10u3/4 do not. I have dual-controllers with NVRAM and battery backup, why can't I make use of it? Would I be wasting my time to mess with this on 3310 and 3510 class equipment? I would think it would help but perhaps not. I'm probably being really daft in thinking that everyone is overlooking the obvious, but... Is this what you're referring to? http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
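For what it's worth, the tuning that page describes for arrays with battery-backed NVRAM boils down to an /etc/system setting along these lines. It only exists in newer bits (which matches the observation that 10u3/u4 have no such flag) and is only safe when every device backing the pool has non-volatile cache:

* /etc/system -- stop ZFS from issuing SCSI SYNCHRONIZE CACHE to the array; reboot required
set zfs:zfs_nocacheflush = 1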
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote: Brian H. Nelson: I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls. I believe I ran into one or both of these bugs: 6429996 zvols don't reserve enough space for requisite meta data 6430003 record size needs to affect zvol reservation size on RAID-Z Basically what happened was that the zpool filled to 100% and broke UFS with 'no space left on device' errors. This was quite strange to sort out since the UFS zvol had 30GB of free space. I never got any replies to my request for more info and/or workarounds for the above bugs. My workaround and recommendation is to leave a 'healthy' amount of un-allocated space in the zpool. I don't know what a good level for 'healthy' is. Currently I've left about 1% (2GB) on a 200GB raid-z pool. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
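One way to keep that headroom from being eaten by accident is to park it in a reservation on an otherwise-unused dataset, e.g. (name and size illustrative):

# zfs create pool1/headroom
# zfs set reservation=2g pool1/headroom

The zvols in the pool then can never quite consume the last 2GB.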
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote: The UFS on zvols option sounds intriguing to me, but I would guess that the following could be problems: 1) Double buffering: Will ZFS store data in the ARC while UFS uses traditional file system buffers? This is probably an issue. You also have the journal+COW combination issue. I'm guessing that both would be performance concerns. My application is relatively low bandwidth, so I haven't dug deep into this area. 2) Boot order dependencies. How does the startup of zfs compare to processing of /etc/vfstab? I would guess that this is OK due to the legacy mount type supported by zfs. If this is OK, then dfstab processing is probably OK. Zvols by nature are not available under ZFS automatic mounting. You would need to add the /dev/zvol/dsk/... lines to /etc/vfstab just as you would for any other /dev/dsk... or /dev/md/dsk/... devices. If you are not using the zpool for anything else, I would remove the automatic mount point for it. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
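An illustrative /etc/vfstab line for a UFS-on-zvol filesystem, plus dropping the pool's own mountpoint (dataset and mount point names are made up):

/dev/zvol/dsk/pool1/home  /dev/zvol/rdsk/pool1/home  /export/home  ufs  2  yes  logging

# zfs set mountpoint=none pool1    (nothing but zvols lives in the pool, so don't mount it anywhere)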
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Stephen Usher wrote: Brian H. Nelson: I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls. Also, here's a link to the ufs on zvol blog where I originally found the idea: http://blogs.sun.com/scottdickson/entry/fun_with_zvols_-_ufs -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?
Mike Gerdts wrote: Having worked in academia and multiple Fortune 100's, the problem seems to be most prevalent in academia, although possibly a minor inconvenience in some engineering departments in industry. In the .edu where I used to manage the UNIX environment, I would have a tough time weighing the complexities of quotas he mentions vs. the other niceties. My guess is that unless I had something that was really broken, I would stay with UFS or VxFS waiting for a fix. UFS on a zvol is a pretty good compromise. You get lots of the nice ZFS stuff (checksums, raidz/z2, snapshots, growable pool, etc) with no changes in userland. There are a couple of gotchas, but as long as you're aware of them, it works pretty well. We've been using it since January. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
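The basic setup is just the following (sizes and device names illustrative; note the gotchas mentioned elsewhere in this thread about leaving some pool space unallocated):

# zpool create pool1 raidz c0t9d0 c0t10d0 c0t11d0 c0t12d0
# zfs create -V 180g pool1/home
# newfs /dev/zvol/rdsk/pool1/home
# mount /dev/zvol/dsk/pool1/home /export/home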
Re: [zfs-discuss] ZFS overhead killed my ZVOL
Can anyone comment? -Brian Brian H. Nelson wrote: Adam Leventhal wrote: On Tue, Mar 20, 2007 at 06:01:28PM -0400, Brian H. Nelson wrote: Why does this happen? Is it a bug? I know there is a recommendation of 20% free space for good performance, but that thought never occurred to me when this machine was set up (zvols only, no zfs proper). It sounds like this bug: 6430003 record size needs to affect zvol reservation size on RAID-Z Adam Could be, but 6429996 sounds like a more likely candidate: zvols don't reserve enough space for requisite meta data. I can create some large files (2GB) and the 'available' space only decreases by .01-.04GB for each file. The raidz pool is 7x36GB disks, with the default 8k volblocksize. Would/should 6430003 affect me? I don't understand what determines minimum allocatable size and the number of 'skipped' sectors for a given situation. Either way, my main concern is that I can address the problem so that the same situation does not reoccur. Are there workarounds for these bugs? How can I determine how much space needs to be reserved? How much (if any) of the remaining free space could be used for an additional zvol (with its own allocation of reserved space)? Thanks, Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS overhead killed my ZVOL
Dear list,

Solaris 10 U3 on SPARC. I had a 197GB raidz storage pool. Within that pool, I had allocated a 191GB zvol (filesystem A), and a 6.75GB zvol (filesystem B). These used all but a couple hundred K of the zpool. Both zvols contained UFS filesystems with logging enabled. The (A) filesystem was about 79% full. (B) was also nearly full, but unmounted and not being used. This configuration worked happily for a bit over two months.

Then the other day, a user decided to copy (cp) about 11GB worth of video files within (A). This caused UFS to choke as such:

Mar 9 17:34:43 maxwell ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll
Mar 9 17:34:43 maxwell ufs: [ID 127457 kern.warning] WARNING: ufs log for /export/home/engr changed state to Error
Mar 9 17:34:43 maxwell ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /export/home/engr and run fsck(1M)

I do as the message says: unmount and attempt to fsck. I am then bombarded with thousands of errors, BUT fsck can not fix them due to 'no space left on device'. That's right, the filesystem with about 30GB free didn't have enough free space to fsck. Strange.

After messing with the machine all weekend, rebooting, calling coworkers (other sys admins), calling Sun, scratching my head, etc., the solution ended up being to _delete the (B) zvol_ (which contained only junk data). Once that was done, fsck ran all the way through without problems (besides wiping all my ACLs) and things were happy again. So I surmised that ZFS ran out of space to do its thing, and for whatever reason, that 'out of space' got pushed down into the zvol as well, causing fsck to choke.

I _have_ been able to reproduce the situation on a test machine, but not reliably. It basically consists of setting up two zvols that take up almost all of the pool space, newfsing them, filling one up to about 90% full, then looping through copies of 1/2 of the remaining space until it dies. (So for a 36GB pool, create a 34GB zvol and a 2.xxGB zvol. newfs them. Mount the larger one. Create a 30GB junk file. Create a directory of say 5 files worth about 2GB total. Then do 'while true; do cp -r dira dirb; done' until it fails. Sometimes it does, sometimes not.)

Why does this happen? Is it a bug? I know there is a recommendation of 20% free space for good performance, but that thought never occurred to me when this machine was set up (zvols only, no zfs proper). I think it is a bug simply because it _allowed_ me to create a configuration that didn't leave enough room for overhead. There isn't a whole lot of info surrounding zvols. Does the 80% rule still apply to the underlying zfs if only zvols are used? That would be really unfortunate. I think most people wanting to use a zvol would want to use 100% of a pool toward the zvol.

-Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
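In script form, the reproduction described above is roughly this (pool and directory names made up; as noted, it does not fail every time):

# zfs create -V 34g testpool/big
# zfs create -V 2g testpool/small
# newfs /dev/zvol/rdsk/testpool/big
# newfs /dev/zvol/rdsk/testpool/small
# mount /dev/zvol/dsk/testpool/big /mnt
# mkfile 30g /mnt/junk                      (fill to roughly 90%)
# mkdir /mnt/dira                           (then put ~2GB of files in it)
# while true; do rm -rf /mnt/dirb; cp -r /mnt/dira /mnt/dirb || break; done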
[zfs-discuss] UFS on zvol: volblocksize and maxcontig
Hi all!

First off, if this has been discussed, please point me in that direction. I have searched high and low and really can't find much info on the subject.

We have a large-ish (200gb) UFS file system on a Sun Enterprise 250 that is being shared with samba (lots of files, mostly random IO). OS is Solaris 10u3. Disk set is 7x36gb 10k scsi, 4 internal, 3 external. For several reasons we currently need to stay on UFS and can't switch to ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, in lieu of UFS on SVM using raid5 (we want/need raid protection). This decision was made because of the ease of disk set portability of zpools, and also the [assumed] performance benefit vs SVM.

Anyway, I've been pondering the volblocksize parameter, and trying to figure out how it interacts with UFS. When the zvol was set up, I took the default 8k size. Since UFS uses an 8k blocksize, this seemed to be a reasonable choice. I've been thinking more about it lately, and have also read that UFS will do R/W in bigger than 8k blocks when it can, up to maxcontig (default of 16, ie 128k). This presented me with several questions:

Would a volblocksize of 128k and maxcontig 16 provide better UFS performance? Overall, or only in certain situations (ie only for sequential IO)?

Would increasing the maxcontig beyond 16 make any difference (good, bad or indifferent) if the underlying device is limited to 128k blocks?

What exactly does volblocksize control? My observations thus far indicate that it simply sets a max block size for the [virtual] zvol device. Changing volblocksize does NOT seem to have an impact on IOs to the underlying physical disks (which always seem to float in the 50-110k range).

How does volblocksize affect IO that is not of a set block size?

Finally, why does volblocksize only affect raidz and mirror devices? It seems to have no effect on 'simple' devices, even though I presume striping is still used there. That is also assuming that volblocksize interacts with striping.

Any answers or input is greatly appreciated. Thanks much!

-Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
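For concreteness, the two knobs being asked about would be set something like this (values are just the ones under discussion, not recommendations):

# zfs create -V 200g -b 128k pool1/vol1     (volblocksize is fixed at creation time)
# newfs /dev/zvol/rdsk/pool1/vol1
# tunefs -a 32 /dev/zvol/rdsk/pool1/vol1    (maxcontig of 32 8k blocks = 256k clusters)
# mount /dev/zvol/dsk/pool1/vol1 /export/share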
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
Darren J Moffat wrote: Brian H. Nelson wrote: For several reasons we currently need to stay on UFS and can't switch to ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, Can you state what those reasons are please? I know that isn't answering the question you are asking but it is worth making sure you have the correct info. I'd also like to understand why UFS works for you but ZFS as a filesystem does not. I knew someone would ask that :) The primary reason is that our backup software (EMC/Legato Networker 7.2) does not appear to support zfs. We don't have the funds currently to upgrade to the new version that does. The other reason is that the machine has been around for years, already using UFS and quotas extensively. Over winter break we had time to upgrade to Solaris 10 and migrate the volume from svm to zvol, but not much more. There are a few thousand users on the machine. The thought of transitioning to that many zfs 'partitions' in order to have per-user quotas seemed daunting, not to mention the administrative re-training needed (edquota doesn't work. du is reporting 3000 filesystems?! etc). IMO, the quota-per-file-system approach seems inconvenient when you get past a handful of file systems. Unless I'm really missing something, it just seems like a nightmare to have to deal with such a ridiculous number of file systems. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
[EMAIL PROTECTED] wrote: *snip* IMO, the quota-per-file-system approach seems inconvenient when you get past a handful of file systems. Unless I'm really missing something, it just seems like a nightmare to have to deal with such a ridiculous number of file systems. Why? What additional per-filesystem overhead from a maintenance perspective are you seeing? Casper The obvious example would be /var/mail. UFS quotas are easy. Doing the same thing with ZFS would be (I think) impossible. You would have to completely convert an existing system to a maildir or home directory mail storage setup. Other file-system-specific software could also have issues. Networker, for instance, does backups per filesystem. In that situation I could then possibly have ~3000 backup sets DAILY for a single machine (worst case, that each file system has changes). Granted, that may not be better or worse, just 'different' and not what I'm used to. On the other hand, I could certainly see where that could add a ton of overhead to backup processing. Don't get me wrong, zfs quotas are a good thing, and could certainly be useful in many situations. I just don't think I agree that they are a one-to-one replacement for ufs quotas in terms of usability in all situations. -Brian -- --- Brian H. Nelson Youngstown State University System Administrator Media and Academic Computing bnelson[at]cis.ysu.edu --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss