Re: [zfs-discuss] Woeful performance from an iSCSI pool

2012-11-24 Thread Ian Collins

Ian Collins wrote:

I look after a remote server that has two iSCSI pools.  The volumes for
each pool are sparse volumes and a while back the target's storage
became full, causing weird and wonderful corruption issues until they
managed to free some space.

Since then, one pool has been reasonably OK, but the other has terrible
performance receiving snapshots.  Despite both iSCSI devices using the
same IP connection, iostat shows one with reasonable service times while
the other shows really high (up to 9 seconds) service times and 100%
busy.  This kills performance for snapshots with many random file
removals and additions.

I'm currently zero filling the bad pool to recover space on the target
storage to see if that improves matters.


It did. Maybe the volume's free space had become very fragmented.

There are a couple of lessons here:

1) When using a thin-provisioned volume for an iSCSI target, don't let 
the volume's pool become full!


2) If the pool built on the iSCSI target has a lot of churn, consider 
zero filling the pool to flush out the freed blocks (a rough sketch 
follows).
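
By "zero filling" I mean nothing more sophisticated than something like 
the following, run against a dataset on the affected pool (the /tank 
path is just an example, and it assumes compression is off on that 
dataset, otherwise the zeros never reach the backing volume):

    # Fill the pool's free space with zeros (dd stops when the pool
    # runs out of space), then delete the file to give the space back.
    # Assumes compression=off on the dataset.
    dd if=/dev/zero of=/tank/zerofill bs=1024k
    rm /tank/zerofill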


--
Ian.



Re: [zfs-discuss] Woeful performance from an iSCSI pool

2012-11-22 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Ian Collins
> 
> I look after a remote server that has two iSCSI pools.  The volumes for
> each pool are sparse volumes and a while back the target's storage
> became full, causing weird and wonderful corruption issues until they
> managed to free some space.
> 
> Since then, one pool has been reasonably OK, but the other has terrible
> performance receiving snapshots.  Despite both iSCSI devices using the
> same IP connection, iostat shows one with reasonable service times while
> the other shows really high (up to 9 seconds) service times and 100%
> busy.  This kills performance for snapshots with many random file
> removals and additions.
> 
> I'm currently zero filling the bad pool to recover space on the target
> storage to see if that improves matters.
> 
> Has anyone else seen similar behaviour with previously degraded iSCSI
> pools?

This sounds exactly like the behavior I was seeing with my attempt at 
having two machines zpool mirror each other via iSCSI.  In my case, I had 
two machines that are both targets and initiators.  I made the initiator 
service dependent on the target service, the zpool mount dependent on the 
initiator service, and the VirtualBox guest start dependent on the zpool 
mount.
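
For what it's worth, the dependency wiring is just ordinary SMF dependency 
property groups, roughly like this for the first link in the chain (the 
FMRIs and property group name are from memory, so treat it as a sketch; 
the zpool-mount and VirtualBox services are home-grown, so I've left them 
out):

    # svccfg -s network/iscsi/initiator
    svc:/network/iscsi/initiator> addpg local-target dependency
    svc:/network/iscsi/initiator> setprop local-target/grouping = astring: require_all
    svc:/network/iscsi/initiator> setprop local-target/restart_on = astring: none
    svc:/network/iscsi/initiator> setprop local-target/type = astring: service
    svc:/network/iscsi/initiator> setprop local-target/entities = fmri: svc:/network/iscsi/target:default
    svc:/network/iscsi/initiator> end
    # svcadm refresh network/iscsi/initiator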

Everything seemed fine for a while, including some reboots.  But then, on 
one reboot, one of my systems stayed down too long, and when it finally 
came back up, both machines started choking.

So far I haven't found any root cause, and the only fix I've found has 
been to reinstall the OS.  I tried everything I know in terms of removing, 
forgetting, and recreating the targets, initiators, and pool, but somehow 
none of that was sufficient.

I recently (yesterday) got budgetary approval to dig into this more, so 
hopefully I'll have some insight before too long, but don't hold your 
breath.  I could fail, and even if I don't, it's likely to be weeks or 
months.

What I want to know from you is:

Which machines are your Solaris machines?  Just the targets?  Just the 
initiators?  All of them?

You say you're having problems just with snapshots.  Are you sure you're 
not having trouble with all sorts of I/O, rather than just snapshots?  
What about import/export?

In my case, I found I was able to zfs send, zfs receive, and zpool status, 
all fine.  But when I launched a guest VM, there would be a massive delay - 
you said up to 9 seconds; I was sometimes seeing over 30s - sometimes 
crashing the host system.  And the guest OS was acting as if it was 
getting I/O errors, without actually displaying any error messages that 
said so.  I would attempt, and sometimes fail, to power off the guest VM 
(kill -KILL VirtualBox).  After the failure began, zpool status still 
works (and reports no errors), but if I try to do things like 
export/import, they hang indefinitely, and I need to power cycle the host.  
While in the failure mode, I can run zpool iostat, and I sometimes see 0 
operations with nonzero bandwidth, which defies my understanding.
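
For reference, I'm not doing anything fancier than watching something like 
this (the pool name is just an example):

    # Per-vdev operations and bandwidth, sampled every 5 seconds.
    zpool iostat -v tank 5

That's where the zero-operations-with-nonzero-bandwidth samples show up.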

Did you ever see the iSCSI targets "offline" or "degraded" in any way?  
Did you do anything like "online" or "clear"?

My systems are OpenIndiana - the latest; I forget whether that's 151a5 or a6.



Re: [zfs-discuss] Woeful performance from an iSCSI pool

2012-11-21 Thread Ian Collins

On 11/22/12 10:15, Ian Collins wrote:

I look after a remote server that has two iSCSI pools.  The volumes for
each pool are sparse volumes and a while back the target's storage
became full, causing weird and wonderful corruption issues until they
managed to free some space.

Since then, one pool has been reasonably OK, but the other has terrible
performance receiving snapshots.  Despite both iSCSI devices using the
same IP connection, iostat shows one with reasonable service times while
the other shows really high (up to 9 seconds) service times and 100%
busy.  This kills performance for snapshots with many random file
removals and additions.

I'm currently zero filling the bad pool to recover space on the target
storage to see if that improves matters.

Has anyone else seen similar behaviour with previously degraded iSCSI
pools?

As a data point, both pools are being zero filled with dd.  A 30-second 
iostat sample shows one device getting more than double the write 
throughput of the other:


    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2   64.0    0.0   50.1  0.0  5.6    0.7   87.9   4  64 c0t600144F096C94AC74ECD96F20001d0
    5.6   44.9    0.0   18.2  0.0  5.8    0.3  115.7   2  76 c0t600144F096C94AC74FF354B2d0
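
(For anyone wanting to watch the same thing, that sample is from something 
like the command below; -M is what gives the Mr/s and Mw/s columns:

    iostat -xnM 30

and the zero fill itself is just dd if=/dev/zero onto a scratch file in 
each pool.)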


--
Ian.



[zfs-discuss] Woeful performance from an iSCSI pool

2012-11-21 Thread Ian Collins
I look after a remote server that has two iSCSI pools.  The volumes for 
each pool are sparse volumes and a while back the target's storage 
became full, causing weird and wonderful corruption issues until they 
managed to free some space.


Since then, one pool has been reasonably OK, but the other has terrible 
performance receiving snapshots.  Despite both iSCSI devices using the 
same IP connection, iostat shows one with reasonable service times while 
the other shows really high (up to 9 seconds) service times and 100% 
busy.  This kills performance for snapshots with many random file 
removals and additions.


I'm currently zero filling the bad pool to recover space on the target 
storage to see if that improves matters.


Has anyone else seen similar behaviour with previously degraded iSCSI 
pools?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss