Re: [zfs-discuss] Drive i/o anomaly
Thanks Richard - interesting... The c8 controller is the motherboard SATA controller on an Intel D510 motherboard. I've read over the man page for iostat again, and I don't see anything there that distinguishes between the controller and the device. If it were the controller, would it make sense that the problem affects only one drive and not the other? It still smells like a drive issue to me. Since the controller is on the motherboard and difficult to replace, I'll replace the drive shortly and see how it goes.

Nonetheless, I still find it odd that the whole I/O system effectively hangs when one drive's queue fills up. Since the purpose of a mirror is to continue operating when one drive fails, I find it frustrating that the system slows down so much just because one drive's I/O queue is full.

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Drive i/o anomaly
Hi, I have a low-power server with three drives in it, like so:

matt@vault:~$ zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 588M in 0h3m with 0 errors on Fri Jan 7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

I'm running netatalk file sharing for Mac, and using it as a Time Machine backup server for my Mac laptop. When files are copying to the server, I often see periods of a minute or so where network traffic stops. I'm convinced there's a bottleneck on the storage side, because when this happens I can still ping the machine, and if I have an ssh window open I can still see output from a `top` command running smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that command stalls. When it comes good, everything comes good: file copies across the network continue, etc.

If I have an ssh terminal session open and run `iostat -nv 5`, I see something like this:

                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6   4608.0   1.2   0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0   7446.7   0.8   0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1   7427.8   4.0   0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7   9243.0   2.3   0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0  24860.5   1.6   0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4  12377.6   3.8   0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0   5657.6   1.4   0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8   9420.8   1.1   0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0   2058.4   9.0   1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0     25.6   0.0   0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0   1365.6   9.0   1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1      0.0   0.0   0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0   2182.4   9.0   1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0   2058.4   9.0   1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0   1959.2   9.0   1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1      0.0   0.0   0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0   2157.6   9.0   1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0   2256.8   9.0   1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0      0.0   0.0   0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0   1835.2   9.0   1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1      0.0   0.0   0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0      0.6   0.0   0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0   6049.6   6.7   0.5  137.6   11.2 100  55 c8t1d0
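To make those columns easier to inspect, here is a small sketch (an illustrative helper of my own, not part of any Solaris tool) that splits one line of that output into named fields, using the column names from the header:

```python
# Split a whitespace-separated extended-iostat line into named fields.
# Column names follow the "extended device statistics" header above;
# this is an illustrative helper, not part of iostat itself.
COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
           "wsvc_t", "asvc_t", "%w", "%b", "device"]

def parse_iostat_line(line):
    parts = line.split()
    fields = dict(zip(COLUMNS, parts))
    for key in COLUMNS[:-1]:      # everything but the device name is numeric
        fields[key] = float(fields[key])
    return fields

row = parse_iostat_line("0.0 16.6 0.0 2058.4 9.0 1.0 542.1 60.2 100 100 c8t1d0")
# wait (queued) + actv (in flight) is the total I/O outstanding on the device:
pending = row["wait"] + row["actv"]
```

With wait at 9.0 and actv at 1.0, roughly ten I/Os are pending against c8t1d0 at 100% busy while c8t0d0 sits idle, which matches the stalls described in the post.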
Re: [zfs-discuss] Drive i/o anomaly
Thanks, Marion. (I actually got the drive labels mixed up in the original post; I edited it on the forum page: http://opensolaris.org/jive/thread.jspa?messageID=511057#511057 ) My suspicion was the same: the drive doing the slow I/O is the problem. I managed to confirm that by taking the other drive offline (c8t0d0, the Samsung), and the same stalls and slow I/O occurred. After putting that drive back online (and letting the resilver complete), I took the slow drive (c8t1d0, the Western Digital Green) offline, and the system ran very nicely. It is a 4k-sector drive, but I thought ZFS recognised those drives and didn't need any special configuration...?
Re: [zfs-discuss] Migrating zpool to new drives with 4K Sectors
Except for metadata, which seems to be written in small pieces, wouldn't having a ZFS record size that is a multiple of 4k, on a vdev that is 4k-aligned, work OK? Or can a ZFS record that's 16kB, for example, start at any sector in the vdev?
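The alignment arithmetic behind that question can be sketched as follows (a toy check with assumed sector sizes and a made-up partition offset, not anything ZFS-specific):

```python
SECTOR = 512          # logical sector size the drive reports
PHYSICAL = 4096       # physical sector size of a 4k "Advanced Format" drive

def is_4k_aligned(start_sector, length_bytes):
    """True if an I/O starting at `start_sector` (in 512-byte units) with
    `length_bytes` touches only whole 4k physical sectors."""
    start_bytes = start_sector * SECTOR
    return start_bytes % PHYSICAL == 0 and length_bytes % PHYSICAL == 0

# A 16k record starting at sector 256 (offset 128 KiB) lands on a 4k boundary...
aligned = is_4k_aligned(256, 16 * 1024)
# ...but the same record starting at sector 34 (a common legacy partition
# start) straddles physical sectors, forcing read-modify-write in the drive.
misaligned = is_4k_aligned(34, 16 * 1024)
```

So a record whose size is a multiple of 4k only helps if its starting offset is also 4k-aligned; a misaligned starting sector defeats it regardless of record size.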
Re: [zfs-discuss] iScsi slow
On 04/08/2010, at 2:13, Roch Bourbonnais roch.bourbonn...@sun.com wrote:

> Le 27 mai 2010 à 07:03, Brent Jones a écrit :
>
>> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly matt.connolly...@gmail.com wrote:
>>> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
>>>
>>> sh-4.0# zfs create rpool/iscsi
>>> sh-4.0# zfs set shareiscsi=on rpool/iscsi
>>> sh-4.0# zfs create -s -V 10g rpool/iscsi/test
>>>
>>> The underlying zpool is a mirror of two SATA drives. I'm connecting from a Mac client with globalSAN initiator software, connected via gigabit LAN. It connects fine, and I've initialised a Mac-format volume on that iScsi volume. Performance, however, is terribly slow: about 10 times slower than an SMB share on the same pool. I expected it would be very similar, if not faster than SMB. Here are my test results copying 3GB of data:
>>>
>>> iScsi:      44m01s   1.185MB/s
>>> SMB share:   4m27s   11.73MB/s
>>>
>>> Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:
>>>
>>> iScsi:      4m36s   11.34MB/s
>>> SMB share:  1m45s   29.81MB/s
>
> cleaning up some old mail
>
> Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request. iSCSI only responds to the request. Put a filesystem on top of iscsi and try again.

As I indicated above, there is a Mac filesystem on the iscsi volume.

Matt.
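Roch's point about prefetch can be sketched numerically. The model below is illustrative only (the latency, request size, and wire rate are all assumed figures, not measurements from this setup): a strictly request-response reader pays the full round-trip latency on every request, while a reader that keeps several requests in flight amortises it.

```python
# Toy latency-amortisation model (assumed values, illustrative only).
LATENCY_S = 0.0005         # assumed round-trip latency per request
REQUEST_BYTES = 64 * 1024  # assumed request size
WIRE_BPS = 100e6           # rough gigabit-ethernet payload rate, bytes/s

def throughput(requests_in_flight):
    """Model throughput with N requests kept outstanding at once."""
    transfer_s = REQUEST_BYTES / WIRE_BPS
    # With N requests in flight, the per-request latency overlaps N transfers.
    effective_s = transfer_s + LATENCY_S / requests_in_flight
    return REQUEST_BYTES / effective_s

sync_mbps = throughput(1) / 1e6       # one request at a time, as raw iSCSI does
prefetch_mbps = throughput(8) / 1e6   # readahead keeping 8 requests in flight
```

Even in this simple model the synchronous reader loses a large fraction of the wire rate to latency, and the gap grows as latency rises relative to transfer time; a filesystem's readahead sitting on top of iSCSI closes it.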
[zfs-discuss] Zpool mirror fail testing - odd resilver behaviour after reconnect
I have an OpenSolaris snv_134 machine with 2 x 1.5TB drives. One is a Samsung Silencer; the other is a dreaded Western Digital Green. I'm testing the mirror for failure by simply yanking out the SATA cable while the machine is running. The system never skips a beat, which is great. But the reconnect behaviour is vastly different on the two drives.

1. Samsung reconnect. `cfgadm` reported the drive as connected but unconfigured. After running `cfgadm -c configure sata1/1`, the drive automatically came online in the zpool mirror and resilvered its differences, which completed in about 10 seconds. This is excellent.

2. WD Green reconnect. `cfgadm` reported the drive as disconnected. I had to use the '-f' option to connect the drive and then configure it:

m...@vault:~$ cfgadm
Ap_Id                Type       Receptacle    Occupant      Condition
sata1/0              sata-port  disconnected  unconfigured  failed
sata1/1::dsk/c8t1d0  disk       connected     configured    ok
m...@vault:~$ pfexec cfgadm -c connect sata1/0
cfgadm: Insufficient condition
m...@vault:~$ pfexec cfgadm -f -c connect sata1/0
Activate the port: /devices/p...@0,0/pci8086,4...@1f,2:0
This operation will enable activity on the SATA port
Continue (yes/no)? yes
m...@vault:~$ cfgadm
Ap_Id                Type       Receptacle    Occupant      Condition
sata1/0              disk       connected     unconfigured  unknown
sata1/1::dsk/c8t1d0  disk       connected     configured    ok
m...@vault:~$ pfexec cfgadm -c configure sata1/0
m...@vault:~$ cfgadm
Ap_Id                Type       Receptacle    Occupant      Condition
sata1/0::dsk/c8t0d0  disk       connected     configured    ok
sata1/1::dsk/c8t1d0  disk       connected     configured    ok

After this point, zpool resilvered the entire 243GB dataset. I suspect the failed automatic connect is simply a firmware problem, and yet another reason to NOT BUY Western Digital Green drives. But my real question is: why does zpool want to resilver the entire dataset on one drive, but not the other??
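My understanding (an assumption based on ZFS documentation, not verified against the source for this case) is that ZFS keeps a per-vdev dirty time log (DTL) of the transaction groups a mirror side missed, and resilvers only those; a full resilver happens when the returning device can't be matched to that record and is treated as new. A toy sketch of the difference:

```python
# Toy model of incremental vs full resilver. Illustrative only: the real
# mechanism is ZFS's dirty time log (DTL) of txgs a vdev missed, which is
# assumed here, not reproduced.
def blocks_to_resilver(total_blocks, dirty_blocks, dtl_intact):
    """How many blocks must be copied to the returning mirror side."""
    if dtl_intact:
        return len(dirty_blocks)   # copy only what changed while detached
    return total_blocks            # no usable record: copy everything

dirty = {10, 42, 99}               # blocks written while the drive was out
incremental = blocks_to_resilver(1_000_000, dirty, dtl_intact=True)
full = blocks_to_resilver(1_000_000, dirty, dtl_intact=False)
```

That would be consistent with the behaviour above: the Samsung came back as the same device and got the 10-second incremental pass, while the forced reconnect of the WD Green apparently presented it as a fresh device, triggering the full 243GB copy.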
[zfs-discuss] mirror writes 10x slower than individual writes
I have an odd setup at present, because I'm testing while still building my machine. It's an Intel Atom D510 mobo running snv_134, with 2GB RAM and 2 SATA drives (AHCI):

1: Samsung 250GB old laptop drive
2: WD Green 1.5TB drive (idle3 turned off)

Ultimately, it will be a Time Machine backup for my Mac laptop, so I have installed Netatalk 2.1.1, which is working great. Read performance from the mirror via gigabit ethernet rocks, easily sustaining 50MB/s off the two drives mirrored. However, write performance is terrible, typically no better than 1-2MB/s on average.

I just thought to detach the WD drive from the mirror and test the drives individually, so with the system still running on drive 1, I created an independent zpool on the other drive and a netatalk share to it. Using `dd` to copy a single large file to each drive, the results are:

Drive 1: Samsung (rpool, and there's a scrub going on):
1565437216 bytes transferred in 98.236700 secs (15935360 bytes/sec)

Drive 2: Western Digital 1.5TB Green:
1565437216 bytes transferred in 71.745737 secs (21819237 bytes/sec)

However, when the two drives were mirrored, after all resilvering completed and there was no background I/O, the write performance was about 10x worse. Watching `zpool iostat -v 2`, I could see that quite often drive 1 would write a big chunk of data and then wait for ages for drive 2 to write the same data to disc. Could it be that there is a separate cache for the mirror that was stalling, waiting on the cache for the larger drive?? Could this scenario be caused by the drives being so different in size, 250GB vs 1500GB??

Once the scrub finishes, I'll re-attach the mirror and re-test tomorrow, reporting the `zpool iostat` in detail...
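As a quick sanity check on those dd figures (plain arithmetic on the numbers quoted above, nothing more):

```python
# Recompute per-drive throughput from the dd output quoted above.
BYTES = 1565437216

samsung_bps = BYTES / 98.236700   # ~15.9 MB/s (with a scrub running)
wd_bps = BYTES / 71.745737        # ~21.8 MB/s

# A mirror writes the same data to both sides, so its write throughput
# should be bounded by the slower side (~15 MB/s here), not the
# ~1-2 MB/s observed over the network.
```

Both drives individually sustain 15-22 MB/s, which is why the roughly tenfold drop when mirrored looks like a stall somewhere in the pipeline rather than raw drive speed.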
[zfs-discuss] Zfs mirror boot hang at boot
Hi, I'm running snv_134 on a 64-bit x86 motherboard with 2 SATA drives. The rpool zpool uses the whole disk of each drive. I've installed grub on both discs, and mirroring seems to be working great.

I just started testing what happens when a drive fails. I kicked off some activity and unplugged one of the drives while it was running; the system kept running, and zpool status indicated that one drive was removed. Awesome. I plugged it back in, and it recovered perfectly.

But with one of the drives unplugged, the system hangs at boot. On both drives (with the other unplugged), grub loads and the system starts to boot, but it gets stuck at the "Hostname: Vault" line and never gets to reading the ZFS config like it would on a normal boot. If I reconnect both drives, then booting continues correctly. If I detach a drive from the pool, then the system also boots correctly off the single connected drive; however, reattaching the 2nd drive causes a whole resilver to occur.

Is this a bug? Or is there some other thing you need to do to mark the drive as offline or something? It's a shame to have to do that before rebooting; it would make it very hard to recover if the drive was physically dead.

Thanks,
Matt
[zfs-discuss] iScsi slow
I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:

sh-4.0# zfs create rpool/iscsi
sh-4.0# zfs set shareiscsi=on rpool/iscsi
sh-4.0# zfs create -s -V 10g rpool/iscsi/test

The underlying zpool is a mirror of two SATA drives. I'm connecting from a Mac client with globalSAN initiator software, connected via gigabit LAN. It connects fine, and I've initialised a Mac-format volume on that iScsi volume. Performance, however, is terribly slow: about 10 times slower than an SMB share on the same pool. I expected it would be very similar, if not faster than SMB. Here are my test results copying 3GB of data:

iScsi:      44m01s   1.185MB/s
SMB share:   4m27s   11.73MB/s

Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:

iScsi:      4m36s   11.34MB/s
SMB share:  1m45s   29.81MB/s

Is there something obvious I've missed here?
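For what it's worth, the times and rates quoted above are self-consistent: each (time, rate) pair implies the same transfer of roughly 3130 MB, and the ratios work out to about 10x on writes and 2.6x on reads. A quick check of that arithmetic:

```python
# Cross-check the benchmark figures quoted above.
# Each entry: (elapsed seconds, reported MB/s).
results = {
    "iscsi_write": (44 * 60 + 1, 1.185),
    "smb_write":   (4 * 60 + 27, 11.73),
    "iscsi_read":  (4 * 60 + 36, 11.34),
    "smb_read":    (1 * 60 + 45, 29.81),
}

# Every run should imply the same amount of data moved (~3130 MB).
sizes_mb = {name: secs * rate for name, (secs, rate) in results.items()}

write_ratio = results["iscsi_write"][0] / results["smb_write"][0]  # ~9.9x
read_ratio = results["iscsi_read"][0] / results["smb_read"][0]     # ~2.6x
```

The asymmetry itself (writes hurt far more than reads) is a useful clue: it suggests per-request round-trip overhead rather than raw pool bandwidth, since the same pool serves SMB an order of magnitude faster.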