Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I'm thinking that the issue is simply with zfs destroy, not with dedup or compression. Yesterday I decided to do some iSCSI testing, so I created a new 1TB dataset in my pool. I did not use compression or dedup. After copying about 700GB of data from my Windows box (NTFS on top of the iSCSI disk), I decided I didn't want to use it, so I attempted to delete the dataset. Once again, the command froze. I removed the zfs cache file and am now trying to import my pool... again. This time, the memory fills up QUICKLY: I hit 8GB used in about an hour, then the box completely freezes. iostat shows each of my disks being read at about 10 MB/s up until the freeze. It does not matter if I limit l2arc size in /etc/system, the behavior is the same. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (Practical) limit on the number of snapshots?
By having a snapshot you are not releasing the space forcing zfs to allocate new space from other parts of a disk drive. This may lead (depending on workload) to more fragmentation, less localized data (more and longer seeks). ZFS uses COW (copy on write) during writes. This means that it first has to find a new location for the data and when this data is written, the original block is released. When using snapshots, the original block is not released. I don't think the use of snapshots will alter the way data is fragmented or localized on disk. --- PeterVG
Re: [zfs-discuss] $100 SSD = 5x faster dedupe
--- On Thu, 1/7/10, Tiernan OToole lsmart...@gmail.com wrote: Sorry to hijack the thread, but can you explain your setup? Sounds interesting, but need more info... This is just a home setup to amuse me and placate my three boys, each of whom has several Windows instances running under Virtualbox. Server is a Sun v40z: quad 2.4 GHz Opteron with 16GB. Internal bays hold a pair of 73GB drives as a mirrored rpool and a pair of 36GB drives for spares to the array plus a 146GB drive I use as cache to the usb pool (a single 320GB sata drive). The array is an HP MSA30 with 14x36GB drives configured as RAIDZ3 using the spares listed above with auto snapshots as the tank pool. Tank is synchronized hourly to the usb pool. It's all connected via four HP 4000M switches (one at the server and one at each workstation) which are meshed via gigabit fiber. Two workstations are triple-head sunrays. One station is a single sunray 150 integrated unit. This is a work in progress with plenty of headroom to grow. I started the build in November and have less than $1200 into it so far. Thanks for letting me hijack the thread by sharing! Cheers, Marty
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Wed, 23 Dec 2009 03:02:47 +0100, Mike Gerdts mger...@gmail.com wrote: I've been playing around with zones on NFS a bit and have run into what looks to be a pretty bad snag - ZFS keeps seeing read and/or checksum errors. This exists with S10u8 and OpenSolaris dev build snv_129. This is likely a blocker for anyone thinking of implementing parts of Ed's Zones on Shared Storage: http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss The OpenSolaris example appears below. The order of events is: 1) Create a file on NFS, turn it into a zpool 2) Configure a zone with the pool as zonepath 3) Install the zone, verify that the pool is healthy 4) Boot the zone, observe that the pool is sick [...] r...@soltrain19# zoneadm -z osol boot r...@soltrain19# zpool status osol pool: osol state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM osol DEGRADED 0 0 0 /mnt/osolzone/root DEGRADED 0 0 117 too many errors errors: No known data errors Hey Mike, you're not the only victim of these strange CHKSUM errors, I hit the same during my slightly different testing, where I'm NFS mounting an entire, pre-existing remote file living in the zpool on the NFS server and use that to create a zpool and install zones into it. I've filed today: 6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent reason here's the relevant piece worth investigating out of it (leaving out the actual setup etc..) as in your case, creating the zpool and installing the zone into it still gives a healthy zpool, but immediately after booting the zone, the zpool served over NFS accumulated CHKSUM errors.
of particular interest are the 'cksum_actual' values as reported by Mike for his test case here: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html if compared to the 'cksum_actual' values I got in the fmdump error output on my test case/system: note, the NFS servers zpool that is serving and sharing the file we use is healthy. zone halted now on my test system, and checking fmdump: osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail 2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0 2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21 6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40 *A 6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0 *B 7 cksum_actual = 0x0 0x0 0x0 0x0 *C 11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40 *D 14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00 *E 17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00 *F 20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80 *G 25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0 osoldev.root./export/home/batschul.= zpool status -v pool: nfszone state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM nfszone DEGRADED 0 0 0 /nfszone DEGRADED 0 0 462 too many errors errors: No known data errors == now compare this with Mike's error output as posted here: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html # fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail 2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce *D 3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00 *E 3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00 *B 4 cksum_actual = 0x0 0x0 0x0 0x0 4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0 5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8 *C 6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40 *A 6
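The by-eye comparison of the two fmdump outputs can also be done mechanically. What follows is only a minimal sketch in Python, assuming you have saved the `fmdump -eV` text from both systems; the function names are made up for illustration, and the sample strings are short excerpts of values quoted in this thread, not complete ereports. It tallies each distinct cksum_actual tuple and prints the tuples that show up on both systems.

```python
import re
from collections import Counter

# Matches the four hex words that follow "cksum_actual =" in fmdump -eV output.
CKSUM_RE = re.compile(r"cksum_actual\s*=\s*((?:0x[0-9a-fA-F]+\s*){4})")


def tally_cksums(fmdump_text):
    """Count how often each distinct cksum_actual 4-tuple occurs."""
    counts = Counter()
    for match in CKSUM_RE.finditer(fmdump_text):
        counts[tuple(match.group(1).split())] += 1
    return counts


def common_cksums(text_a, text_b):
    """Return the cksum_actual tuples that appear in both captures."""
    return set(tally_cksums(text_a)) & set(tally_cksums(text_b))


# Excerpts of values quoted in this thread (hypothetical capture files
# would normally be read from disk instead).
frank = """
cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
cksum_actual = 0x0 0x0 0x0 0x0
cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
"""
mike = """
cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
cksum_actual = 0x0 0x0 0x0 0x0
cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
"""

for cksum in sorted(common_cksums(frank, mike)):
    print(" ".join(cksum))
```

Identical non-zero checksums on two unrelated systems would suggest identical bad data being returned, rather than random corruption, which is the point of the comparison.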
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
Frank Batschulat (Home) wrote: This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ? What are you using for on the wire protection with NFS ? Is it shared using krb5i or do you have IPsec configured ? If not I'd recommend trying one of those and see if your symptoms change. -- Darren J Moffat
[zfs-discuss] Detecting quota limits
Hi List, We create a zfs filesystem for each user's homedir. I would like to monitor their usage, and when a user approaches his quota I would like to receive a warning by mail. Does anybody have a script available which does this job and can be run as a cron job? Or even better, is this a built-in feature of zfs? thanks, Martijn -- YoungGuns Kasteleinenkampweg 7b 5222 AX 's-Hertogenbosch T. 073 623 56 40 F. 073 623 56 39 www.youngguns.nl KvK 18076568
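One possible shape for such a cron script, as a minimal sketch in Python rather than a tested tool. It assumes the input is the tab-separated output of something like `zfs list -H -p -o name,used,quota -t filesystem`; whether `-p` (exact byte values) is available depends on the release, so treat that flag as an assumption. The function names, the 90% threshold and the sample dataset names are all made up for illustration.

```python
def parse_zfs_list(text):
    """Parse tab-separated 'name<TAB>used<TAB>quota' rows (values in bytes).

    Rows without a quota ('none', '-' or 0) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        name, used, quota = fields
        if quota in ("none", "-", "0"):
            continue
        rows.append((name, int(used), int(quota)))
    return rows


def near_quota(rows, threshold=0.9):
    """Return (name, fraction_used) for filesystems at or above the threshold."""
    return [(name, used / quota) for name, used, quota in rows
            if used / quota >= threshold]


def format_warning(hits):
    """Build the body of a warning mail, one line per filesystem."""
    return "\n".join(f"{name}: {fraction:.0%} of quota used"
                     for name, fraction in hits)


# Hypothetical sample input; real input would come from the zfs command.
sample = (
    "tank/home/alice\t9500000000\t10000000000\n"
    "tank/home/bob\t1000000000\t10000000000\n"
    "tank/home\t10500000000\t0\n"
)

print(format_warning(near_quota(parse_zfs_list(sample))))
```

In real use the text could be obtained with `subprocess.run([...], capture_output=True, text=True).stdout`. Since cron normally mails whatever a job prints to the job's owner, printing only when something is over the threshold may be enough to get the warning mail.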
Re: [zfs-discuss] (Practical) limit on the number of snapshots?
On 08/01/2010 12:40, Peter van Gemert wrote: By having a snapshot you are not releasing the space forcing zfs to allocate new space from other parts of a disk drive. This may lead (depending on workload) to more fragmentation, less localized data (more and longer seeks). ZFS uses COW (copy on write) during writes. This means that it first has to find a new location for the data and when this data is written, the original block is released. When using snapshots, the original block is not released. I don't think the use of snapshots will alter the way data is fragmented or localized on disk. --- PeterVG Well, it will (depending on workload). For example - let's say you have an 80GB disk drive as a pool with a single db file which is 1GB in size. Now no snapshots are created and you are constantly modifying logical blocks in the file. As ZFS will release the old block and re-use it later on, all current data should be roughly within the first 2GB of the disk drive and therefore highly localized. Now if you create a snapshot while modifying data, then another one and another one, you end up in a situation where free blocks are available further and further into the disk drive. When you end up almost filling the disk drive, even if you delete all snapshots, your active data will now be scattered all over the disk (assuming you were not modifying 100% of the data between creating snapshots). It won't be highly localized anymore. -- Robert Milkowski http://milek.blogspot.com
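The argument above can be illustrated with a toy model. This is only a sketch in Python under loud assumptions: a first-fit allocator that always picks the lowest free block (not how the real ZFS allocator works), a single file, no delayed block re-use, and made-up sizes. It returns the highest physical block holding live file data after a run of random copy-on-write overwrites, with and without retained snapshots.

```python
import random


def simulate(disk_blocks, file_blocks, writes, snapshot_every=None, seed=0):
    """Toy copy-on-write model; returns the highest block holding live data."""
    rng = random.Random(seed)
    free = set(range(disk_blocks))

    # Lay the file out in the lowest blocks of the disk.
    block_map = []
    for _ in range(file_blocks):
        block = min(free)
        free.remove(block)
        block_map.append(block)

    pinned = set()  # blocks still referenced by some snapshot
    for i in range(1, writes + 1):
        if not free:
            break  # disk is full, stop writing
        logical = rng.randrange(file_blocks)
        new_block = min(free)  # first fit: lowest free block (assumption)
        free.remove(new_block)
        old_block = block_map[logical]
        block_map[logical] = new_block
        if old_block not in pinned:
            free.add(old_block)  # only unreferenced blocks are released
        if snapshot_every and i % snapshot_every == 0:
            pinned.update(block_map)  # a snapshot pins the current layout
    return max(block_map)


print("no snapshots:   highest live block =", simulate(1000, 100, 5000))
print("with snapshots: highest live block =",
      simulate(1000, 100, 5000, snapshot_every=50))
```

In this model the run without snapshots never needs more than one block beyond the file's own size, while the run with retained snapshots pushes live data well past that, and it stays there even if the snapshots are deleted afterwards.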
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, Jan 8, 2010 at 6:55 AM, Darren J Moffat darr...@opensolaris.org wrote: Frank Batschulat (Home) wrote: This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ? What are you using for on the wire protection with NFS ? Is it shared using krb5i or do you have IPsec configured ? If not I'd recommend trying one of those and see if your symptoms change. Shouldn't a scrub pick that up? Why would there be no errors from zoneadm install, which under the covers does a pkg image create followed by *multiple* pkg install invocations. No checksum errors pop up there. -- Mike Gerdts http://mgerdts.blogspot.com/
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, Jan 8, 2010 at 6:51 AM, James Carlson carls...@workingcode.com wrote: Frank Batschulat (Home) wrote: This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ? One possible cause would be a lack of substantial exercise. The man page says: A regular file. The use of files as a backing store is strongly discouraged. It is designed primarily for experimental purposes, as the fault tolerance of a file is only as good as the file system of which it is a part. A file must be specified by a full path. Could it be that discouraged and experimental mean not tested as thoroughly as you might like, and certainly not a good idea in any sort of production environment? It sounds like a bug, sure, but the fix might be to remove the option. This unsupported feature is supported with the use of Sun Ops Center 2.5 when a zone is put on a NAS Storage Library. -- Mike Gerdts http://mgerdts.blogspot.com/
Re: [zfs-discuss] Thin device support in ZFS?
Yet another way to thin-out the backing devices for a zpool on a thin-provisioned storage host, today: resilver. If your zpool has some redundancy across the SAN backing LUNs, simply drop and replace one at a time and allow zfs to resilver only the blocks currently in use onto the replacement LUN. -- Dan.
[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Hello, today I wanted to test that the failure of the L2ARC device is not crucial to the pool. I added an Intel X25-M Postville (160GB) as cache device to a 54 disk mirror pool. Then I started a SYNC iozone on the pool: iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o Pool: pool mirror-0 disk1 disk2 mirror-1 disk3 disk4 cache intel-postville-ssd Then I pulled the power cable of the SSD device (not the SATA connector) and from that moment on, all pool related commands hang (e.g. zpool iostat -v). I've waited 20 minutes now - still hangs :( I can log in to the system itself (after some time - so the whole system is sluggish), so the syspool (which is a separate device) is ok. Release is snv_104. dmesg shows: Jan 8 15:21:42 nexenta gda: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6): Jan 8 15:21:42 nexenta Error for command 'write sector' Error Level: Informational Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command Jan 8 15:21:42 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3 Jan 8 15:21:47 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed Jan 8 15:21:47 nexenta gda: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1/c...@0,0 (Disk5): Jan 8 15:21:47 nexenta Error for command 'write sector' Error Level: Informational Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command Jan 8 15:21:47 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3 Jan 8 15:21:52 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select failed Jan 8 15:21:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1 (ata9): lspci: 00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge 00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A) 00:05.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port B) 00:06.0 PCI bridge: ATI Technologies Inc 
RD790 PCI to PCI bridge (PCI express gpp port C) 00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F) 00:11.0 IDE interface: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode] 00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller 00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller 00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller 00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller 00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller 00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller 00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c) 00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller 00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) 00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller 00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge 00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE/7200 GS] (rev a1) 02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06) 03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06) 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02) 05:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05) 05:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link) Has anyone seen something like this? Hardware is a standard Gigabyte mainboard with on-board SATA. Regards,
Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
Ok, I have now waited 30 minutes - still hung. After that I also pulled the SATA cable to the L2ARC device - still no success (I waited 10 minutes). After 10 minutes I put the L2ARC device back (SATA + power); 20 seconds after that the system continued to run. dmesg shows: Jan 8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1 (ata9): Jan 8 15:41:57 nexenta timeout: early timeout, target=1 lun=0 Jan 8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: /p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6): Jan 8 15:41:57 nexenta Error for command 'write sector' Error Level: Informational Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice] Sense Key: aborted command Jan 8 15:41:57 nexenta gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x3 Jan 8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major Jan 8 15:42:01 nexenta EVENT-TIME: Fri Jan 8 15:41:59 CET 2010 Jan 8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta Jan 8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0 Jan 8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e Jan 8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded Jan 8 15:42:01 nexenta acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. Jan 8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt Jan 8 15:42:01 nexenta will be made to activate a hot spare if available. Jan 8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised. Jan 8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device. 
Jan 8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major Jan 8 15:42:13 nexenta EVENT-TIME: Fri Jan 8 15:42:12 CET 2010 Jan 8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN: , HOSTNAME: nexenta Jan 8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0 Jan 8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06 Jan 8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS device exceeded Jan 8 15:42:13 nexenta acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. Jan 8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt Jan 8 15:42:13 nexenta will be made to activate a hot spare if available. Jan 8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised. Jan 8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad device. .. the device is seen as faulted: pool: data state: ONLINE scrub: resilver completed after 0h0m with 0 errors on Fri Jan 8 15:42:03 2010 config: NAME STATE READ WRITE CKSUM data ONLINE 0 0 0 mirror ONLINE 0 0 0 c3d0 ONLINE 0 0 0 c6d0 ONLINE 0 0 0 512 resilvered mirror ONLINE 0 0 0 c3d1 ONLINE 0 0 0 c4d0 ONLINE 0 0 0 cache c6d1 FAULTED 0 499 0 too many errors .. however zpool iostat -v still shows the device r...@nexenta:/export/home/admin# zpool iostat -v 1 capacity operations bandwidth pool used avail read write read write -- - - - - - - data 209G 1.61T 0 129 0 4.64M mirror 104G 824G 0 64 0 2.34M c3d0 - - 0 64 0 2.34M c6d0 - - 0 64 0 2.34M mirror 104G 824G 0 64 0 2.31M c3d1 - - 0 64 0 2.31M c4d0 - - 0 64 0 2.31M cache - - - - - - c6d1 137M 149G 0 0 0 0 -- - - - - - - syspool 2.18G 462G 0 0 0 0 c4d1s0 2.18G 462G 0 0 0 0 -- - - - - - - So this seems to be a hardware issue. I would expect there to be some general in-kernel timeout for I/Os, so that strangely failing and unresponsive devices (and real failures are like this) are killed. Did I miss something? 
Is there a tunable (/etc/system) ? Thanks for your responses :)
Re: [zfs-discuss] (Practical) limit on the number of snapshots?
On Fri, January 8, 2010 07:51, Robert Milkowski wrote: On 08/01/2010 12:40, Peter van Gemert wrote: By having a snapshot you are not releasing the space forcing zfs to allocate new space from other parts of a disk drive. This may lead (depending on workload) to more fragmentation, less localized data (more and longer seeks). ZFS uses COW (copy on write) during writes. This means that it first has to find a new location for the data and when this data is written, the original block is released. When using snapshots, the original block is not released. I don't think the use of snapshots will alter the way data is fragmented or localized on disk. Well, it will (depending on workload). For example - let's say you have an 80GB disk drive as a pool with a single db file which is 1GB in size. Now no snapshots are created and you are constantly modifying logical blocks in the file. As ZFS will release the old block and re-use it later on, all current data should be roughly within the first 2GB of the disk drive and therefore highly localized. I thought block re-use was delayed to allow for TXG rollback, though? They'll certainly get reused eventually, but I think they get reused later rather than sooner. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
[zfs-discuss] I/O errors after zfs promote back and forth
Hi, I have just observed the following issue and I would like to ask if it is already known: I'm using zones on ZFS filesystems which were cloned from a common template (which is itself an original filesystem). A couple of weeks ago, I did a pkg image-update, so all zone roots got cloned again and the new zone roots got promoted. I then decided to undo the update and promoted the original zone roots again. So until today, the zone template was dependent upon one of the zone roots, and when I promoted it again to restore the intended order, all zones effectively crashed. When trying to execute processes in them, I got exec failures like this one: # zlogin ZONE [Connected to zone 'ZONE' pts/2] zlogin: exec failure: I/O error Is this issue known to anyone already? Thank you, Nils
Re: [zfs-discuss] I/O errors after zfs promote back and forth
BTW, this was on snv_111b - sorry I forgot to mention.
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, Jan 8, 2010 at 5:28 AM, Frank Batschulat (Home) frank.batschu...@sun.com wrote: [snip] Hey Mike, you're not the only victim of these strange CHKSUM errors, I hit the same during my slightly different testing, where I'm NFS mounting an entire, pre-existing remote file living in the zpool on the NFS server and use that to create a zpool and install zones into it. What does your overall setup look like? Mine is: T5220 + Sun System Firmware 7.2.4.f 2009/11/05 18:21 Primary LDom Solaris 10u8 Logical Domains Manager 1.2,REV=2009.06.25.09.48 + 142840-03 Guest Domain 4 vcpus + 15 GB memory OpenSolaris snv_130 (this is where the problem is observed) I've seen similar errors on Solaris 10 in the primary domain and on an M4000. Unfortunately Solaris 10 doesn't show the checksums in the ereport. There I noticed a mixture of read errors and checksum errors - and lots more of them. This could be because the S10 zone was a full root SUNWCXall compared to the much smaller default ipkg branded zone. On the primary domain running Solaris 10... (this command was run some time ago) primary-domain# zpool status myzone pool: myzone state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM myzone DEGRADED 0 0 0 /foo/20g DEGRADED 4.53K 0 671 too many errors errors: No known data errors (this was run today, many days after previous command) primary-domain# fmdump -eV | egrep zio_err | uniq -c | head 1 zio_err = 5 1 zio_err = 50 1 zio_err = 5 1 zio_err = 50 1 zio_err = 5 1 zio_err = 50 2 zio_err = 5 1 zio_err = 50 3 zio_err = 5 1 zio_err = 50 Note that even though I had thousands of read errors the zone worked just fine. I would have never known (suspected?) 
there was a problem if I hadn't run zpool status or the various FMA commands. I've filed today: 6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent reason Thanks. I'll open a support call to help get some funding on it... here's the relevant piece worth investigating out of it (leaving out the actual setup etc..) as in your case, creating the zpool and installing the zone into it still gives a healthy zpool, but immediately after booting the zone, the zpool served over NFS accumulated CHKSUM errors. of particular interest are the 'cksum_actual' values as reported by Mike for his test case here: http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html if compared to the 'chksum_actual' values I got in the fmdump error output on my test case/system: note, the NFS servers zpool that is serving and sharing the file we use is healthy. zone halted now on my test system, and checking fmdump: osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail 2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0 2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21 6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40 *A 6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0 *B 7 cksum_actual = 0x0 0x0 0x0 0x0 *C 11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40 *D 14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00 *E 17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00 *F 20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80 *G 25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0 osoldev.root./export/home/batschul.= zpool status -v pool: nfszone state: DEGRADED status: One or more devices has experienced an unrecoverable error. 
An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://www.sun.com/msg/ZFS-8000-9P scrub: none requested config: NAME STATE READ WRITE CKSUM nfszone DEGRADED 0 0 0 /nfszone DEGRADED 0 0 462 too many errors errors: No known data errors == now compare this with Mike's error output as posted here:
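A note on the `fmdump -eV | egrep zio_err | uniq -c | head` output quoted in this message: `uniq -c` without a preceding `sort` counts consecutive runs, which is why the same values repeat. A total per error code can be had with a small tally. The sketch below is Python, with a hypothetical function name and a made-up sample; the errno labels are an assumption based on Solaris errno numbering (5 is EIO, and ZFS's ECKSUM is defined as EBADE, which is 50 there), so check them against your own headers.

```python
import re
from collections import Counter

# Assumed Solaris errno meanings; verify against <sys/errno.h> on your system.
ZIO_ERR_NAMES = {5: "EIO (I/O error)", 50: "ECKSUM/EBADE (checksum error)"}

# Accepts decimal ("50") or hex ("0x32") values after "zio_err =".
ZIO_ERR_RE = re.compile(r"zio_err\s*=\s*(0x[0-9a-fA-F]+|\d+)")


def tally_zio_err(fmdump_text):
    """Count ereports per zio_err value in fmdump -eV output."""
    counts = Counter()
    for match in ZIO_ERR_RE.finditer(fmdump_text):
        counts[int(match.group(1), 0)] += 1
    return counts


# Made-up sample in the format shown in this thread.
sample = "zio_err = 5\nzio_err = 50\nzio_err = 5\nzio_err = 5\nzio_err = 50\n"

for code, count in sorted(tally_zio_err(sample).items()):
    print(count, "x zio_err", code, ZIO_ERR_NAMES.get(code, "unknown"))
```

A split like this would line up with the separate READ and CKSUM columns of `zpool status`, if the assumed mapping holds.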
Re: [zfs-discuss] (Practical) limit on the number of snapshots?
On 08/01/2010 14:50, David Dyer-Bennet wrote: On Fri, January 8, 2010 07:51, Robert Milkowski wrote: On 08/01/2010 12:40, Peter van Gemert wrote: By having a snapshot you are not releasing the space forcing zfs to allocate new space from other parts of a disk drive. This may lead (depending on workload) to more fragmentation, less localized data (more and longer seeks). ZFS uses COW (copy on write) during writes. This means that it first has to find a new location for the data and when this data is written, the original block is released. When using snapshots, the original block is not released. I don't think the use of snapshots will alter the way data is fragmented or localized on disk. Well, it will (depending on workload). For example - let's say you have an 80GB disk drive as a pool with a single db file which is 1GB in size. Now no snapshots are created and you are constantly modifying logical blocks in the file. As ZFS will release the old block and re-use it later on, all current data should be roughly within the first 2GB of the disk drive and therefore highly localized. I thought block re-use was delayed to allow for TXG rollback, though? They'll certainly get reused eventually, but I think they get reused later rather than sooner. Yes, there is a delay, but IIRC it is only several transactions, while the above scenario in practice usually means taking a snapshot a day and keeping 30 of them. -- Robert Milkowski http://milek.blogspot.com
Re: [zfs-discuss] link in zpool upgrade -v broken
Hi Ian, I see the problem. In your included URL below, you didn't include the /N suffix as included in the zpool upgrade output. CR 6898657 is still filed to identify the change. If you copy and paste the URL from the zpool upgrade -v output: http://www.opensolaris.org/os/community/zfs/version/N You will be redirected to the new version page: http://hub.opensolaris.org/bin/view/Community+Group+zfs/N See the output below. Thanks, Cindy

# zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  -----------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties

For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/N
Where 'N' is the version number.

On 01/07/10 16:52, Ian Collins wrote: http://www.opensolaris.org/os/community/zfs/version/ No longer exists. Is there a bug for this yet?
Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)
OK, after browsing I found that the SATA disks are not shown via cfgadm. I found http://opensolaris.org/jive/message.jspa?messageID=287791tstart=0 which states that you have to set the mode to AHCI to enable hot-plug etc. However, I still think the plain IDE driver also needs a timeout to handle disk failures, because cables etc. can fail.

I looked in the BIOS and it seems the disks are in IDE mode. There is an AHCI mode, but I do not know if I can switch without reinstalling. Is it possible to set AHCI without reinstalling OSol?

Regards

--
This message posted from opensolaris.org
Re: [zfs-discuss] (Practical) limit on the number of snapshots?
On Fri, 8 Jan 2010, Peter van Gemert wrote:
> I don't think the use of snapshots will alter the way data is fragmented or localized on disk.

What happens after a snapshot is deleted?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] ZFS partially hangs when removing an rpool mirrored disk while having some IO on another pool on another partition of the same disk
Hello,

Sorry for the (very) long subject, but I've pinpointed the problem to this exact situation. I know about the other threads related to hangs, but in my case there was no zfs destroy involved, nor any compression or deduplication.

To make a long story short, when
- a disk contains 2 partitions (p1=32GB, p2=1800 GB) and
- p1 is used as part of a zfs mirror of rpool and
- p2 is used as part of a raidz (tested raidz1 and raidz2) of tank and
- some serious work is underway on tank (tested write, copy, scrub),
then if you physically remove the disk, zfs partially hangs. Putting the physical disk back does not help.

For the long story:

About the hardware: 1 x Intel X25E (64GB SSD), 15 x 2TB SATA drives (7 x WD, 8 x Hitachi), 2 x quad-core Xeon, 12GB RAM, 2 x Areca-1680 (8-port SAS controller), Tyan S7002 mainboard.

About the software / firmware: OpenSolaris b130 installed on the SSD drive, on the first 32 GB. The Areca cards are configured as JBOD and are running the latest release firmware.

Initial setup: We created a 32GB partition on all of the 2TB drives and mirrored the system partition, giving us a 16-way rpool mirror. The rest of the 2TB drives' space was put in a second partition and used for a raidz2 pool (named tank).

Problem: Whenever we physically removed a disk from its tray while doing some speed testing on the tank pool, the system hung. At that time I hadn't read all the threads about zfs hangs and couldn't determine whether the whole system was hung or just zfs. In order to pinpoint the problem, we made another setup.

Second setup: I reduced the number of partitions in the rpool mirror down to 3 (p1 from the SSD, p1 from a 2TB drive on the same controller as the SSD and p1 from a 2TB drive on the other controller).

Problem: When the system is quiet, I am able to physically remove any disk, plug it back and resilver it.
When I am putting some load on the tank pool, I can remove any disk that does *not* contain the rpool mirror (I can plug it back and resilver it while the load keeps running without noticeable performance impact).

When I am putting some load on the tank pool, I cannot physically remove a disk that also contains a mirror of the rpool, or zfs partially hangs. When I say partially, I mean that:
- zpool iostat -v tank 5 freezes
- if I run any zpool command related to rpool, I'm stuck (zpool clear rpool c4t0d7s0 for example, or zpool status rpool)

I can't launch new programs, but already launched programs continue to run (at least in an ssh session, since gnome becomes more and more frozen as you move from window to window).

From ssh sessions:
- prstat shows that only gnome-system-monitor, xorg, ssh, bash and various *stat utils (prstat, fstat, iostat, mpstat) are consuming some CPU.
- zpool iostat -v tank 5 is frozen (it freezes when I issue a zpool clear rpool c4t0d7s0 in another session)
- iostat -xn is not stuck but shows all zeroes since the very moment zpool iostat froze (which is quite strange if you look at the fsstat output hereafter). NB: when I say all zeroes, I really mean it: it's not zero dot something, it's zero dot zero.
- mpstat shows normal activity (almost nothing since this is a test machine, so only a few percent are used, but it still shows some activity and refreshes correctly)

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
00 0 125 428 109 1131400 2512 0 0 98
10 0 2056 16 442210 277 11 1 0 88
20 0 163 152 13 3091300 13704 0 0 96
30 0 19 111 41 900400800 0 0 100
40 0 69 192 17 660300200 0 0 100
50 0 10617 920400 1670 0 0 100
60 0 96 191 25 740410 50 0 0 100
70 0 16586 630310590 0 0 100

- fsstat -F 5 shows all zeroes but for the zfs line (the figures hereunder stay almost the same over time)

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0 1,25K     0  2,51K     0   803 11,0M   473 11,0M zfs

- disk LEDs show no activity
- I cannot run any other command (neither from ssh nor from gnome)
- I cannot open another ssh session (I don't even get the login prompt in putty)
- I can successfully ping the machine
- I cannot establish a new cifs session (the login prompt should not appear since the machine is in an active directory domain, but when it's stuck the prompt appears and I cannot authenticate. I guess it's because ldap, kerberos or whatever else is needed cannot be read from rpool), but an already active session will stay open (last time I even managed to create a text file with a few lines
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, Jan 8, 2010 at 9:11 AM, Mike Gerdts mger...@gmail.com wrote:
> I've seen similar errors on Solaris 10 in the primary domain and on a M4000. Unfortunately Solaris 10 doesn't show the checksums in the ereport. There I noticed a mixture between read errors and checksum errors - and lots more of them. This could be because the S10 zone was a full root SUNWCXall compared to the much smaller default ipkg branded zone. On the primary domain running Solaris 10...

I've written a dtrace script to get the checksums on Solaris 10. Here's what I see with NFSv3 on Solaris 10.

# zoneadm -z zone1 halt ; zpool export pool1 ; zpool import -d /mnt/pool1 pool1 ; zoneadm -z zone1 boot ; sleep 30 ; pkill dtrace

# ./zfs_bad_cksum.d
Tracing...
dtrace: error on enabled probe ID 9 (ID 43443: fbt:zfs:zio_checksum_error:return): invalid address (0x301b363a000) in action #4 at DIF offset 20
dtrace: error on enabled probe ID 9 (ID 43443: fbt:zfs:zio_checksum_error:return): invalid address (0x3037f746000) in action #4 at DIF offset 20
cccdtrace: error on enabled probe ID 9 (ID 43443: fbt:zfs:zio_checksum_error:return): invalid address (0x3026e7b) in action #4 at DIF offset 20
cc
Checksum errors:
  3 : 0x130e01011103 0x20108 0x0 0x400 (fletcher_4_native)
  3 : 0x220125cd8000 0x62425980c08 0x16630c08296c490c 0x82b320c082aef0c (fletcher_4_native)
  3 : 0x2f2a0a202a20436f 0x7079726967687420 0x2863292032303031 0x2062792053756e20 (fletcher_4_native)
  3 : 0x3c21444f43545950 0x452048544d4c2050 0x55424c494320222d 0x2f2f5733432f2f44 (fletcher_4_native)
  3 : 0x6005a8389144 0xc2080e6405c200b6 0x960093d40800 0x9eea007b9800019c (fletcher_4_native)
  3 : 0xac044a6903d00163 0xa138c8003446 0x3f2cd1e100b10009 0xa37af9b5ef166104 (fletcher_4_native)
  3 : 0xbaddcafebaddcafe 0xc 0x0 0x0 (fletcher_4_native)
  3 : 0xc4025608801500ff 0x1018500704528210 0x190103e50066 0xc34b90001238f900 (fletcher_4_native)
  3 : 0xfe00fc01fc42fc42 0xfc42fc42fc42fc42 0xfffc42fc42fc42fc 0x42fc42fc42fc42fc (fletcher_4_native)
  4 : 0x4b2a460a 0x0 0x4b2a460a 0x0 (fletcher_4_native)
  4 : 0xc00589b159a00 0x543008a05b673 0x124b60078d5be 0xe3002b2a0b605fb3 (fletcher_4_native)
  4 : 0x130e010111 0x32000b301080034 0x10166cb34125410 0xb30c19ca9e0c0860 (fletcher_4_native)
  4 : 0x130e010111 0x3a201080038 0x104381285501102 0x418016996320408 (fletcher_4_native)
  4 : 0x130e010111 0x3a201080038 0x1043812c5501102 0x81802325c080864 (fletcher_4_native)
  4 : 0x130e010111 0x3a0001c01080038 0x1383812c550111c 0x818975698080864 (fletcher_4_native)
  4 : 0x1f81442e9241000 0x2002560880154c00 0xff10185007528210 0x19010003e566 (fletcher_4_native)
  5 : 0xbab10c 0xf 0x53ae 0xdd549ae39aa1ba20 (fletcher_4_native)
  5 : 0x130e010111 0x3ab01080038 0x1163812c550110b 0x8180a7793080864 (fletcher_4_native)
  5 : 0x61626300 0x0 0x0 0x0 (fletcher_4_native)
  5 : 0x8003 0x3df0d6a1 0x0 0x0 (fletcher_4_native)
  6 : 0xbab10c 0xf 0x5384 0xdd549ae39aa1ba20 (fletcher_4_native)
  7 : 0xbab10c 0xf 0x0 0x9af5e5f61ca2e28e (fletcher_4_native)
  7 : 0x130e010111 0x3a201080038 0x104381265501102 0xc18c7210c086006 (fletcher_4_native)
  7 : 0x275c222074650a2e 0x5c222020436f7079 0x7269676874203139 0x38392041540a2e5c (fletcher_4_native)
  8 : 0x130e010111 0x3a0003101080038 0x1623812c5501131 0x8187f66a4080864 (fletcher_4_native)
  9 : 0x8a000801010c0682 0x2eed0809c1640513 0x70200ff00026424 0x18001d16101f0059 (fletcher_4_native)
 12 : 0xbab10c 0xf 0x0 0x45a9e1fc57ca2aa8 (fletcher_4_native)
 30 : 0xbaddcafebaddcafe 0xbaddcafebaddcafe 0xbaddcafebaddcafe 0xbaddcafebaddcafe (fletcher_4_native)
 47 : 0x0 0x0 0x0 0x0 (fletcher_4_native)
 92 : 0x130e01011103 0x10108 0x0 0x200 (fletcher_4_native)

Since I had to guess at what the Solaris 10 source looks like, some extra eyeballs on the dtrace script are in order.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/

zfs_bad_cksum.d
Description: Binary data
[zfs-discuss] ZFS Dedup Performance
I haven't seen much discussion on how deduplication affects performance. I've enabled dedup on my 4-disk raidz array and have seen a significant drop in write throughput, from about 100 MB/s to 3 MB/s. I can't imagine such a decrease is normal.

# zpool iostat nest 1 (with dedup enabled):
...
nest  1.05T  411G   91   18  197K  2.35M
nest  1.05T  411G  147   15  443K  1.98M
nest  1.05T  411G   82   28  174K  3.59M

# zpool iostat nest 1 (with dedup disabled):
...
nest  1.05T  410G    0  787     0  96.9M
nest  1.05T  410G    1  899  253K  95.0M
nest  1.05T  409G    0  533     0  48.5M

I do notice when dedup is enabled that the drives sound like they are constantly seeking. iostat shows average service times around 20 ms, which is normal for my drives, and prstat shows that my processor and memory aren't a bottleneck. What could cause such a marked decrease in throughput? Is anyone else experiencing similar effects?

Thanks,
James
Re: [zfs-discuss] ZFS Dedup Performance
On Fri, Jan 08, 2010 at 10:00:14AM -0800, James Lee wrote:
> I haven't seen much discussion on how deduplication affects performance. I've enabled dedup on my 4-disk raidz array and have seen a significant drop in write throughput, from about 100 MB/s to 3 MB/s. I can't imagine such a decrease is normal.

Seems like I've seen other posts with similar numbers (maybe 9MB/s or so?). It sounded like adding an SSD for caching really improved performance, however.

> # zpool iostat nest 1 (with dedup enabled):
> ...
> nest  1.05T  411G   91   18  197K  2.35M
> nest  1.05T  411G  147   15  443K  1.98M
> nest  1.05T  411G   82   28  174K  3.59M
>
> # zpool iostat nest 1 (with dedup disabled):
> ...
> nest  1.05T  410G    0  787     0  96.9M
> nest  1.05T  410G    1  899  253K  95.0M
> nest  1.05T  409G    0  533     0  48.5M
>
> I do notice when dedup is enabled that the drives sound like they are constantly seeking. iostat shows average service times around 20 ms, which is normal for my drives, and prstat shows that my processor and memory aren't a bottleneck. What could cause such a marked decrease in throughput? Is anyone else experiencing similar effects?
>
> Thanks,
> James

Ray
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
Frank Batschulat (Home) wrote:
> This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ?

One possible cause would be a lack of substantial exercise. The man page says:

    A regular file. The use of files as a backing store is strongly
    discouraged. It is designed primarily for experimental purposes,
    as the fault tolerance of a file is only as good as the file
    system of which it is a part. A file must be specified by a
    full path.

Could it be that "discouraged" and "experimental" mean "not tested as thoroughly as you might like, and certainly not a good idea in any sort of production environment"?

It sounds like a bug, sure, but the fix might be to remove the option.

--
James Carlson 42.703N 71.076W carls...@workingcode.com
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org wrote:
> Frank Batschulat (Home) wrote:
>> This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ?
>
> What are you using for on the wire protection with NFS ? Is it shared using krb5i or do you have IPsec configured ? If not I'd recommend trying one of those and see if your symptoms change.

Hey Darren,

doing krb5i is certainly a good idea for additional protection in general; however, I have some doubts that NFS OTW corruption will produce the exact same wrong checksum inside 2 totally different setups and networks, as comparing Mike's and my results showed [see 1].

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
      2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
      2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
      6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B    7 cksum_actual = 0x0 0x0 0x0 0x0
*C   11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D   14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F   20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

== now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
      2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D    3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E    3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B    4 cksum_actual = 0x0 0x0 0x0 0x0
      4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
      5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C    6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F   16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

and observe that the 'cksum_actual' values that eventually cause our CHKSUM pool errors, because they do not match what was expected, are the SAME for 2 totally different client systems and 2 different NFS servers (mine vs. Mike's); see the entries marked with *A to *G.
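The comparison Frank does by eye (marking *A to *G) can also be done mechanically. The following is a minimal sketch, assuming each ereport line looks like the `cksum_actual = ...` lines quoted above; the function names are hypothetical and the sample text is a three-line excerpt from each host's output, not the full data.

```python
def checksum_set(fmdump_text):
    """Collect the distinct 4-word cksum_actual values from fmdump -eV style text."""
    found = set()
    for line in fmdump_text.splitlines():
        if "cksum_actual" in line:
            # Everything after '=' is the four 64-bit checksum words.
            found.add(tuple(line.split("=", 1)[1].split()))
    return found


def shared_checksums(text_a, text_b):
    """Return the checksum values that appear in both outputs."""
    return checksum_set(text_a) & checksum_set(text_b)


# Excerpts copied from the two outputs quoted in the message above.
FRANK = """
cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
"""
MIKE = """
cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
"""

if __name__ == "__main__":
    for words in sorted(shared_checksums(FRANK, MIKE)):
        print(" ".join(words))
```

Fed with the complete `fmdump -eV` output of both machines, the intersection would list every checksum value the two setups have in common, not only those that make it into the `tail` of each sorted list.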
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
Mike Gerdts wrote:
> This unsupported feature is supported with the use of Sun Ops Center 2.5 when a zone is put on a NAS Storage Library.

Ah, ok. I didn't know that.

--
James Carlson 42.703N 71.076W carls...@workingcode.com
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On 1/8/2010 10:04 AM, James Carlson wrote:
> Mike Gerdts wrote:
>> This unsupported feature is supported with the use of Sun Ops Center 2.5 when a zone is put on a NAS Storage Library.
>
> Ah, ok. I didn't know that.

Does anyone know how that works? I can't find it in the docs, no one inside of Sun seemed to have a clue when I asked around, etc. RTFM gladly taken.
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Jan 8, 2010, at 6:20 AM, Frank Batschulat (Home) wrote:
> On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org wrote:
>> Frank Batschulat (Home) wrote:
>>> This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ?
>>
>> What are you using for on the wire protection with NFS ? Is it shared using krb5i or do you have IPsec configured ? If not I'd recommend trying one of those and see if your symptoms change.
>
> Hey Darren, doing krb5i is certainly a good idea for additional protection in general, however I have some doubts that NFS OTW corruption will produce the exact same wrong checksum inside 2 totally different setups and networks, as comparing Mike and my results showed [see 1].

Attach a mirror (not on NFS) and see if the bitmap yields any clues.
 -- richard
Re: [zfs-discuss] link in zpool upgrade -v broken
Cindy Swearingen wrote:
> Hi Ian, I see the problem. In your included URL below, you didn't include the /N suffix as included in the zpool upgrade output.

That's correct, N is the version number. I see it is fixed now, thanks.

--
Ian.
Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning
On Fri, Jan 8, 2010 at 12:28 PM, Torrey McMahon tmcmah...@yahoo.com wrote:
> On 1/8/2010 10:04 AM, James Carlson wrote:
>> Mike Gerdts wrote:
>>> This unsupported feature is supported with the use of Sun Ops Center 2.5 when a zone is put on a NAS Storage Library.
>>
>> Ah, ok. I didn't know that.
>
> Does anyone know how that works? I can't find it in the docs, no one inside of Sun seemed to have a clue when I asked around, etc. RTFM gladly taken.

Storage libraries are discussed very briefly at:

http://wikis.sun.com/display/OC2dot5/Storage+Libraries

Creation of zones is discussed at:

http://wikis.sun.com/display/OC2dot5/Creating+Zones

I've found no documentation that explains the implementation details. From looking at a test environment that I have running, it seems to go like:

1. The storage admin carves out some NFS space and exports it with the appropriate options to the various hosts (global zones).

2. In the Ops Center BUI, the ops center admin creates a new storage library. He selects type NFS and specifies the hostname and path that was allocated.

3. The ops center admin associates the storage library with various hosts. This causes it to be mounted at /var/mnt/virtlibs/libraryId on those hosts. I'll call this $libmnt.

4. When the sysadmin provisions a zone through ops center, a UUID is allocated and associated with this zone. I'll call it $zuuid. A directory $libmnt/$zuuid is created with a set of directories under it.

5. As the sysadmin provisions the zone, ops center prompts for the virtual disk size. A file of that size is created at $libmnt/$zuuid/virtdisk/data.

6. Ops center creates a zpool:

   zpool create -m /var/mnt/oc-zpools/$zuuid/ z$zuuid \
       $libmnt/$zuuid/virtdisk/data

7. The zonepath is created using a uuid that is unique to the zonepath ($puuid): z$zuuid/$puuid. It has a quota and a reservation set (8G each in the zpool history I am looking at).

8. The zone is configured with zonepath=/var/mnt/oc-zpools/$zuuid/$puuid, then installed.

Just in case anyone sees this as the right way to do things, I think it is generally OK with a couple of caveats. The key areas that I would suggest for improvement are:

- Mount the NFS space with -o forcedirectio. There is no need to cache data twice.
- Never use UUIDs in paths. This makes it nearly impossible for a sysadmin or a support person to look at the output of commands on the system and understand what it is doing.

--
Mike Gerdts
http://mgerdts.blogspot.com/
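To make the naming scheme in steps 3 to 8 easier to follow, here is a small sketch that only composes the paths and the zpool command line Mike describes. It does not talk to Ops Center or run any command. The function name and the example IDs are hypothetical; the directory layout is taken from the message above and is Mike's observation of a test environment, not documented behaviour.

```python
from pathlib import PurePosixPath


def zone_layout(library_id, zuuid, puuid):
    """Compose the paths described in steps 3-8 for one zone (illustration only)."""
    libmnt = PurePosixPath("/var/mnt/virtlibs") / library_id    # step 3: $libmnt
    backing = libmnt / zuuid / "virtdisk" / "data"              # step 5: backing file
    pool = f"z{zuuid}"                                          # step 6: pool name
    pool_mnt = PurePosixPath("/var/mnt/oc-zpools") / zuuid      # step 6: pool mountpoint
    return {
        "backing_file": str(backing),
        "zpool_create": ["zpool", "create", "-m", str(pool_mnt), pool, str(backing)],
        "dataset": f"{pool}/{puuid}",                           # step 7: zonepath dataset
        "zonepath": str(pool_mnt / puuid),                      # step 8: zonepath
    }


if __name__ == "__main__":
    # "lib1", "aaaa" and "bbbb" stand in for the real library ID and UUIDs.
    for key, value in zone_layout("lib1", "aaaa", "bbbb").items():
        print(key, "=", value)
```

Seeing the layout spelled out this way also shows why Mike objects to UUIDs in paths: with real UUIDs substituted, none of the resulting names tells an administrator which zone they belong to.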
Re: [zfs-discuss] ZFS Dedup Performance
See the reads on the pool with the low I/O? I suspect reading the DDT causes the writes to slow down.

See this bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566. It seems to give some background.

Can you test setting primarycache=metadata on the volume you test? This would be my initial test. My guess is that it may improve the situation because your ARC can be better utilized for the DDT. (This does not make much sense for production without an SSD cache, because you practically disable all caches for reading without an L2ARC, aka SSD!)

As I read the bug report above, it seems that if the DDT (deduplication table) does not fit into memory, or is dropped from there, the DDT has to be read from disk, causing massive random I/O.
Re: [zfs-discuss] ZFS Dedup Performance
James Lee wrote:
> I haven't seen much discussion on how deduplication affects performance. I've enabled dedup on my 4-disk raidz array and have seen a significant drop in write throughput, from about 100 MB/s to 3 MB/s. I can't imagine such a decrease is normal.

What is your data?

I've found that data that lends itself to deduplication writes slightly faster, while data that does not (video, ISO images) writes dramatically slower. So I turn dedupe (and compression) off for filesystems containing random data.

--
Ian.
Re: [zfs-discuss] ZFS Dedup Performance
On Fri, Jan 8, 2010 at 1:44 PM, Ian Collins i...@ianshome.com wrote:
> James Lee wrote:
>> I haven't seen much discussion on how deduplication affects performance. I've enabled dedup on my 4-disk raidz array and have seen a significant drop in write throughput, from about 100 MB/s to 3 MB/s. I can't imagine such a decrease is normal.
>
> What is your data?

I have seen the same. fsstat reports 4-7 seconds of small writes, then bursts of 40-80MB/s, but without dedup I see 80-150MB/s writes on my 4x 500GB SATA drives, split between two controllers. The box has 6GB of RAM and about 1.5TB of storage with 1.2TB used. If I disable dedup, speed goes back up.

While doing dedup writes, zfs destroy pool/filesystem takes about 100x as long as usual, even if the filesystem being destroyed is empty; reports say it's far worse when over 100GB of data is on a drive. My dedup ratio for the pool is 1.15x. Read performance seems about the same or slightly faster; I didn't really benchmark this workload since my clients seem to be the bottleneck.

As money is tight at the moment, I don't have the funds for an SSD to test with. I do have space on an unused disk to try, but I haven't researched the effect of adding and removing (if possible) L2ARC or ZIL log slices on a pool. It would be great to enable a 5-50GB slice off a SATA drive to use as a logging device for greater performance.

James Dickens
uadmin.blogspot.com

> I've found data that lends itself to deduplication writes slightly faster while data that does not (video, iso images) writes dramatically slower. So I turn dedupe (and compression) off for filesystems containing random data.
>
> -- Ian.
Re: [zfs-discuss] ZFS Dedup Performance
On 01/08/2010 02:42 PM, Lutz Schumann wrote:
> See the reads on the pool with the low I/O? I suspect reading the DDT causes the writes to slow down.
>
> See this bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566. It seems to give some background.
>
> Can you test setting primarycache=metadata on the volume you test? This would be my initial test. My guess is that it may improve the situation because your ARC can be better utilized for the DDT. (This does not make much sense for production without an SSD cache, because you practically disable all caches for reading without an L2ARC, aka SSD!)
>
> As I read the bug report above, it seems that if the DDT (deduplication table) does not fit into memory, or is dropped from there, the DDT has to be read from disk, causing massive random I/O.

The symptoms described in that bug report do match up with mine. I have also experienced long hang times (1hr) destroying a dataset while the disk just thrashes.

I tried setting primarycache=metadata, but that did not help. I pulled the DDT statistics for my pool, but don't know how to determine its physical size-on-disk from that. If deduplication ends up requiring a separate sort-of log device, that will be a real shame.

# zdb -DD nest
DDT-sha256-zap-duplicate: 780321 entries, size 338 on disk, 174 in core
DDT-sha256-zap-unique: 6188123 entries, size 335 on disk, 164 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    5.90M    752G    729G    729G    5.90M    752G    729G    729G
     2     756K   94.0G   93.7G   93.6G    1.48M    188G    187G    187G
     4    5.36K    152M   80.3M   81.5M    22.4K    618M    325M    330M
     8      258   4.05M   1.93M   2.00M    2.43K   36.7M   16.3M   16.9M
    16       30    434K     42K   50.9K      597   10.2M    824K   1003K
    32        5    255K   65.5K   66.6K      204   10.5M   3.26M   3.30M
    64       20   2.02M    906K    910K    1.41K    141M   62.0M   62.2M
   128        4      2K      2K   2.99K      723    362K    362K    541K
   256        1     512     512     766      277    138K    138K    207K
   512        2      1K      1K   1.50K    1.62K    830K    830K   1.21M
 Total    6.65M    846G    823G    823G    7.41M    941G    917G    917G

dedup = 1.11, compress = 1.03, copies = 1.00, dedup * compress / copies = 1.14
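One way to get a rough DDT size from the two summary lines above is to multiply the entry counts by the per-entry sizes. This is a sketch under an assumption worth stating loudly: it treats "size N on disk, M in core" as average bytes per entry, which is a common reading of this zdb output but is not confirmed anywhere in this thread. The function name is hypothetical.

```python
import re

# Matches lines such as:
#   DDT-sha256-zap-unique: 6188123 entries, size 335 on disk, 164 in core
ZDB_DDT_LINE = re.compile(
    r"(DDT-\S+): (\d+) entries, size (\d+) on disk, (\d+) in core")


def ddt_footprint(zdb_output):
    """Return (bytes_on_disk, bytes_in_core), assuming sizes are bytes per entry."""
    on_disk = in_core = 0
    for _name, entries, disk_size, core_size in ZDB_DDT_LINE.findall(zdb_output):
        on_disk += int(entries) * int(disk_size)
        in_core += int(entries) * int(core_size)
    return on_disk, in_core


# The two summary lines from the zdb -DD output quoted above.
SAMPLE = """\
DDT-sha256-zap-duplicate: 780321 entries, size 338 on disk, 174 in core
DDT-sha256-zap-unique: 6188123 entries, size 335 on disk, 164 in core
"""

if __name__ == "__main__":
    disk, core = ddt_footprint(SAMPLE)
    print(f"on disk: {disk / 2**30:.2f} GiB, in core: {core / 2**30:.2f} GiB")
```

Under that assumption the table for this pool comes to roughly 2.2 GiB on disk and a little over 1 GiB in core, which would be consistent with Lutz's suspicion that the DDT is competing for memory on a small machine.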
[zfs-discuss] unable to zfs destroy
this one has me a little confused. ideas?

j...@opensolaris:~# zpool import z
cannot mount 'z/nukeme': mountpoint or dataset is busy
cannot share 'z/cle2003-1': smb add share failed
j...@opensolaris:~# zfs destroy z/nukeme
internal error: Bad exchange descriptor
Abort (core dumped)
j...@opensolaris:~# adb core
core file = core -- program ``/sbin/zfs'' on platform i86pc
SIGABRT: Abort
$c
libc_hwcap1.so.1`_lwp_kill+0x15(1, 6, 80462a8, fee9bb5e)
libc_hwcap1.so.1`raise+0x22(6, 0, 80462f8, fee7255a)
libc_hwcap1.so.1`abort+0xf2(8046328, fedd, 8046328, 8086570, 8086970, 400)
libzfs.so.1`zfs_verror+0xd5(8086548, 813, fedc5178, 804635c)
libzfs.so.1`zfs_standard_error_fmt+0x225(8086548, 32, fedc5178, 808acd0)
libzfs.so.1`zfs_destroy+0x10e(808acc8, 0, 0, 80479c8)
destroy_callback+0x69(808acc8, 8047910, 80555ec, 8047910)
zfs_do_destroy+0x31f(2, 80479c8, 80479c4, 80718dc)
main+0x26a(3, 80479c4, 80479d4, 8053fdf)
_start+0x7d(3, 8047ae4, 8047ae8, 8047af0, 0, 8047af9)
^d
j...@opensolaris:~# uname -a
SunOS opensolaris 5.11 snv_130 i86pc i386 i86pc
j...@opensolaris:~# zpool status -v z
  pool: z
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h39m, 19.15% done, 2h46m to go
config:

        NAME        STATE     READ WRITE CKSUM
        z           ONLINE       0     0     2
          c3t0d0s7  ONLINE       0     0     4
          c3t1d0s7  ONLINE       0     0     0
          c2d0      ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        z/nukeme:0x0
j...@opensolaris:~# zfs list z/nukeme
NAME       USED  AVAIL  REFER  MOUNTPOINT
z/nukeme  49.0G   496G  49.0G  /z/nukeme
j...@opensolaris:~# zdb -d z/nukeme 0x0
zdb: can't open 'z/nukeme': Device busy

there is also no mount point /z/nukeme

any ideas how to nuke /z/nukeme?
[zfs-discuss] I/O Read starvation
dd if=/dev/urandom of=largefile.txt bs=1G count=8
cp largefile.txt ./test/1.txt
cp largefile.txt ./test/2.txt

That's it: the system is now totally unusable after launching the two 8G copies. Until these copies finish, no other application is able to launch completely. Checking prstat shows them to be in the sleep state.

Question: I'm guessing this is because ZFS doesn't use CFQ and one process is allowed to queue up all its I/O reads ahead of other processes?

Is there a concept of priority among I/O reads? I only ask because if root were to launch some GUI application, it doesn't start up until both copies are done. So there is no concept of priority? Needless to say, this problem does not exist on Linux 2.60...
[zfs-discuss] ssd pool + ssd cache ?
Hi list,

Experimental question ...

Imagine a pool made of SSD disks: is there any benefit in adding an SSD cache to it? What would the real impact be?

Thx.

--
Francois