Re: [zfs-discuss] Drive i/o anomaly

2011-02-09 Thread Matt Connolly
Thanks Richard - interesting...

The c8 controller is the motherboard SATA controller on an Intel D510 
motherboard.

I've read over the man page for iostat again, and I don't see anything in there 
that makes a distinction between the controller and the device.

If it were the controller, would it make sense for the problem to affect only one 
drive and not the other? It still smells like a drive issue to me.

Since the controller is on the motherboard and difficult to replace, I'll 
replace the drive shortly and see how it goes.

Nonetheless, I still find it odd that the whole I/O system effectively hangs 
when one drive's queue fills up. Since the purpose of a mirror is to keep 
operating when one drive fails, I find it frustrating that the system slows 
down so dramatically just because one drive's I/O queue is full.
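
Before swapping it, it might be worth checking whether the drive itself is 
logging errors. A minimal check would be something like this (device name as 
in the original post's zpool status; the counters come from the sd driver, so 
zero hard/transport errors would point more towards something above the drive):

matt@vault:~$ iostat -En c8t1d0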


[zfs-discuss] Drive i/o anomaly

2011-02-07 Thread Matt Connolly
Hi, I have a low-power server with three drives in it, like so:


matt@vault:~$ zpool status
  pool: rpool
 state: ONLINE
 scan: resilvered 588M in 0h3m with 0 errors on Fri Jan  7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors


I'm running netatalk file sharing for the Mac, and using it as a Time Machine 
backup server for my Mac laptop.

When files are copying to the server, I often see periods of a minute or so 
where network traffic stops. I'm convinced the bottleneck is on the storage 
side of things, because when this happens I can still ping the machine, and if 
I have an ssh window open I can still see output from a `top` command running 
smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that 
command stalls. When it comes good, everything comes good at once: file copies 
across the network resume, and so on.

If I have an ssh terminal session open and run `iostat -nv 5`, I see something 
like this:


                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6  4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0  7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1  7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7  9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0  5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8  9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    25.6  0.0  0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0  1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0  2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0  2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0  1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0  2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0  2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0  1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1     0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0     0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0  6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s
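
For anyone wanting to catch these stalls in a log rather than watching the 
terminal, a rough sketch using `iostat -xn` (extended stats with logical names; 
in that layout %b is the 10th field and the device name the 11th) that prints 
only the lines where a device is pegged at 100% busy:

matt@vault:~$ iostat -xn 5 | nawk '$11 ~ /^c/ && $10+0 >= 100 { print }'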

Re: [zfs-discuss] Drive i/o anomaly

2011-02-07 Thread Matt Connolly
Thanks, Marion.

(I actually got the drive labels mixed up in the original post... I edited it 
on the forum page: 
http://opensolaris.org/jive/thread.jspa?messageID=511057#511057 )

My suspicion was the same: the drive doing the slow I/O is the problem.

I managed to confirm that by taking the other drive offline (c8t0d0, the 
Samsung); the same stalls and slow I/O still occurred.

After putting that drive back online (and letting the resilver complete), I 
took the slow drive (c8t1d0, the Western Digital Green) offline, and the system 
ran very nicely.
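
For reference, the offline/online test went along these lines (a sketch; device 
names as in the original post):

matt@vault:~$ pfexec zpool offline rpool c8t0d0s0    # take the Samsung side out of the mirror
matt@vault:~$ pfexec zpool online rpool c8t0d0s0     # bring it back; only the differences resilver
matt@vault:~$ zpool status rpool                     # wait for the resilver to complete
matt@vault:~$ pfexec zpool offline rpool c8t1d0s0    # then repeat the copy test with the WD Green out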

It is a 4k-sector drive, but I thought ZFS recognised those drives and that 
they didn't need any special configuration...?
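
One thing that might be worth checking is what ashift the pool was created 
with; my understanding is that drives like the WD Green report 512-byte sectors 
to the host, so ZFS has no way of knowing they are really 4k and ends up with 
ashift=9. A sketch of the check (zdb dumps the cached pool config, which 
includes the per-vdev ashift):

matt@vault:~$ zdb -C rpool | grep ashift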


Re: [zfs-discuss] Migrating zpool to new drives with 4K Sectors

2011-02-07 Thread Matt Connolly
Except for metadata, which seems to be written in small pieces, wouldn't a ZFS 
record size that is a multiple of 4k, on a vdev that is 4k-aligned, work OK?

Or can the start of a ZFS record that's 16kB, for example, fall on any sector 
in the vdev?
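
(For experimenting with this, record size is a per-dataset property and is easy 
to change; a sketch, with an illustrative dataset name, and note it only 
affects newly written blocks:)

matt@vault:~$ zfs get recordsize rpool/export/home
matt@vault:~$ pfexec zfs set recordsize=128K rpool/export/home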


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Matt Connolly
On 04/08/2010, at 2:13, Roch Bourbonnais roch.bourbonn...@sun.com wrote:

 
 Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 
 On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 matt.connolly...@gmail.com wrote:
 I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
 
 sh-4.0# zfs create rpool/iscsi
 sh-4.0# zfs set shareiscsi=on rpool/iscsi
 sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 
 The underlying zpool is a mirror of two SATA drives. I'm connecting from a 
 Mac client with global SAN initiator software, connected via Gigabit LAN. 
 It connects fine, and I've initialiased a mac format volume on that iScsi 
 volume.
 
 Performance, however, is terribly slow, about 10 times slower than an SMB 
 share on the same pool. I expected it would be very similar, if not faster 
 than SMB.
 
 Here's my test results copying 3GB data:
 
 iScsi:      44m01s  1.185MB/s
 SMB share:  4m27    11.73MB/s
 
 Reading (the same 3GB) is also worse than SMB, but only by a factor of 
 about 3:
 
 iScsi:      4m36    11.34MB/s
 SMB share:  1m45    29.81MB/s
 
 
 cleaning up some old mail 
 
 Not unexpected. Filesystems have readahead code to prefetch enough to cover 
 the latency of the read request. iSCSI only responds to the request.
 Put a filesystem on top of iscsi and try again.

As I indicated above, there is a Mac filesystem on the iSCSI volume.

Matt. 


[zfs-discuss] Zpool mirror fail testing - odd resilver behaviour after reconnect

2010-06-30 Thread Matt Connolly
I have an OpenSolaris snv_134 machine with 2 x 1.5TB drives. One is a Samsung 
Silencer, the other a dreaded Western Digital Green.

I'm testing the mirror for failure by simply yanking out the SATA cable while 
the machine is running. The system never skips a beat, which is great. But the 
reconnect behaviour is vastly different on the two drives.

1. Samsung reconnect. `cfgadm` reported the drive as connected but 
unconfigured. After running `cfgadm -c configure sata1/1`, the drive 
automatically came online in the zpool mirror and resilvered its differences, 
which completed in about 10 seconds. This is excellent.

2. WD Green reconnect. `cfgadm` reported the drive as disconnected. I had to 
use the '-f' option to connect the drive and then configure it:

m...@vault:~$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        sata-port    disconnected unconfigured failed
sata1/1::dsk/c8t1d0            disk         connected    configured   ok
m...@vault:~$ pfexec cfgadm -c connect sata1/0
cfgadm: Insufficient condition
m...@vault:~$ pfexec cfgadm -f -c connect sata1/0
Activate the port: /devices/p...@0,0/pci8086,4...@1f,2:0
This operation will enable activity on the SATA port
Continue (yes/no)? yes
m...@vault:~$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        disk         connected    unconfigured unknown
sata1/1::dsk/c8t1d0            disk         connected    configured   ok
m...@vault:~$ pfexec cfgadm -c configure sata1/0
m...@vault:~$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0::dsk/c8t0d0            disk         connected    configured   ok
sata1/1::dsk/c8t1d0            disk         connected    configured   ok


After this point, zpool resilvered the entire 243GB dataset.

I suspect that the failure to reconnect automatically is simply a firmware 
problem, and yet another reason to NOT BUY Western Digital Green drives.

But my real question is: why does zpool want to resilver the entire dataset on 
one drive, but not the other??
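
(If it helps, I can dig out what the kernel saw when each port came back; a 
sketch of where I'd look, namely the FMA error reports and the internally 
logged pool events:)

m...@vault:~$ pfexec fmdump -eV | more              # low-level ereports, e.g. SATA transport errors
m...@vault:~$ pfexec zpool history -i rpool         # includes internally logged scrub/resilver events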


[zfs-discuss] mirror writes 10x slower than individual writes

2010-05-31 Thread Matt Connolly
I have an odd setup at present, because I'm testing while still building my 
machine.

It's an Intel Atom D510 mobo running snv_134, with 2GB RAM and 2 SATA drives (AHCI):
1: Samsung 250GB old laptop drive
2: WD Green 1.5TB drive (idle3 turned off)

Ultimately, it will be a Time Machine backup server for my Mac laptop, so I 
have installed Netatalk 2.1.1, which is working great.

Read performance from the mirror via gigabit ethernet rocks, easily sustaining 
50MB/s off the two mirrored drives.

However, write performance is terrible, typically no better than 1-2MB/s on 
average.

I thought to detach the WD drive from the mirror and test the drives 
individually, so with the system still running on drive 1, I created an 
independent zpool on the other drive and a netatalk share on it.

Using `dd` to copy a single large file to each drive, the results are:

Drive 1: Samsung (rpool, and there's a scrub going on)
1565437216 bytes transferred in 98.236700 secs (15935360 bytes/sec)

Drive 2: Western Digital 1.5TB green drive:
1565437216 bytes transferred in 71.745737 secs (21819237 bytes/sec)
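
(The dd output format above is the Mac's BSD dd, so the copies were run from 
the laptop over the netatalk share, roughly like this; file and share names are 
illustrative:)

mac:~$ dd if=~/large-test-file of=/Volumes/drive1-share/large-test-file bs=1m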

However, when the two drives were mirrored, after all resilvering completed and 
there was no background I/O, the write performance was about 10x worse.

Watching `zpool iostat -v 2` I could see that quite often drive 1 would write a 
big chunk of data and then wait for ages for drive 2 to write the same data to 
disc.

Could it be that there is a separate cache for the mirror that was stalling, 
waiting on the cache for the larger drive?

Could this be caused by the drives being so different in size? 250GB and 1500GB?

Once the scrub finishes, I'll re-attach the mirror, and re-test tomorrow, 
reporting the `zpool iostat` in detail...
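
(The re-attach will be something along these lines; the standalone pool name is 
illustrative, device names are the s0 slices from my other posts, and attach 
takes the existing mirror member first and the new device second:)

matt@vault:~$ pfexec zpool destroy wdtest                   # drop the standalone pool on the WD drive
matt@vault:~$ pfexec zpool attach rpool c8t0d0s0 c8t1d0s0   # re-mirror the WD onto the existing rpool device
matt@vault:~$ zpool status rpool                            # watch the resilver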


[zfs-discuss] Zfs mirror boot hang at boot

2010-05-29 Thread Matt Connolly
Hi,

I'm running snv_134 on a 64-bit x86 motherboard with 2 SATA drives. The zpool 
rpool uses the whole disk of each drive. I've installed grub on both disks, and 
mirroring seems to be working great.
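
(For anyone following along, grub went on with installgrub; a sketch, using the 
slice names from my other posts:)

matt@vault:~$ pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c8t0d0s0
matt@vault:~$ pfexec installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c8t1d0s0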

I just started testing what happens when a drive fails. I kicked off some 
activity and unplugged one of the drives while it was running; the system kept 
running, and zpool status indicated that one drive was removed. Awesome. I 
plugged it back in, and it recovered perfectly.

But with one of the drives unplugged, the system hangs at boot. On either drive 
(with the other unplugged) grub loads and the system starts to boot, but it 
gets stuck at the "Hostname: Vault" line and never gets to the "Reading ZFS 
config" stage like it would on a normal boot.

If I reconnect both drives then booting continues correctly.

If I detach a drive from the pool, then the system also boots correctly off a 
single connected drive. However, reattaching the 2nd drive causes a full 
resilver to occur.
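
(A sketch of the distinction I mean, with device names from my other posts: 
detach removes the disk from the pool configuration entirely, so re-attaching 
it means a full resilver, whereas offline/online only resilvers what changed 
while the disk was away, at least as I understand it:)

matt@vault:~$ pfexec zpool offline rpool c8t1d0s0           # planned removal; partial resilver on online
matt@vault:~$ pfexec zpool online rpool c8t1d0s0
matt@vault:~$ pfexec zpool detach rpool c8t1d0s0            # drops it from the config; attach = full resilver
matt@vault:~$ pfexec zpool attach rpool c8t0d0s0 c8t1d0s0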

Is this a bug? Or is there something else you need to do to mark the drive as 
offline first? It seems a shame to have to do that before rebooting! It would 
make it very hard to recover if the drive were physically dead.

Thanks,
Matt


[zfs-discuss] iScsi slow

2010-05-26 Thread Matt Connolly
I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:

sh-4.0# zfs create rpool/iscsi
sh-4.0# zfs set shareiscsi=on rpool/iscsi
sh-4.0# zfs create -s -V 10g rpool/iscsi/test

The underlying zpool is a mirror of two SATA drives. I'm connecting from a Mac 
client with the globalSAN initiator software, connected via gigabit LAN. It 
connects fine, and I've initialised a Mac-format volume on that iSCSI volume.

Performance, however, is terribly slow, about 10 times slower than an SMB share 
on the same pool. I expected it would be very similar, if not faster than SMB.

Here are my test results copying 3GB of data:

iScsi:      44m01s  1.185MB/s
SMB share:  4m27    11.73MB/s

Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:

iScsi:      4m36    11.34MB/s
SMB share:  1m45    29.81MB/s


Is there something obvious I've missed here?
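
(One check that might separate the disks from the network and target layers is 
to read the zvol locally on the server; a sketch, non-destructive since it only 
reads, with the count chosen to move about 1GB:)

sh-4.0# dd if=/dev/zvol/rdsk/rpool/iscsi/test of=/dev/null bs=1024k count=1000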