Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-23 Thread Kjetil Torgrim Homme
Miles Nordin  writes:

>> "kth" == Kjetil Torgrim Homme  writes:
>
>kth> the SCSI layer handles the replaying of operations after a
>kth> reboot or connection failure.
>
> how?
>
> I do not think it is handled by SCSI layers, not for SAS nor iSCSI.

sorry, I was inaccurate.  error reporting is done by the SCSI layer, and
the filesystem handles it by retrying whatever outstanding operations it
has.

> Also, remember a write command that goes into the write cache is a
> SCSI command that's succeeded, even though it's not actually on disk
> for sure unless you can complete a sync cache command successfully and
> do so with no errors nor ``protocol events'' in the gap between the
> successful write and the successful sync.  A facility to replay failed
> commands won't help because when a drive with write cache on reboots,
> successful writes are rolled back.

this is true, sorry about my lack of precision.  the SCSI layer can't do
this on its own.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-22 Thread Miles Nordin
> "kth" == Kjetil Torgrim Homme  writes:

   kth> basically iSCSI just defines a reliable channel for SCSI.  

pft.

AIUI a lot of the complexity in real stacks is ancient protocol
arcana for supporting multiple initiators and TCQ regardless of
whether the physical target supports these things, multiple paths
between a single target/initiator pair, and their weird SCTP-like
notion that several physical SCSI targets ought to be combined into
multiple LUNs of a single virtual iSCSI target.  I think the mapping
from iSCSI to SCSI is not usually very direct.  I have not dug into it
though.

   kth> the SCSI layer handles the replaying of operations after a
   kth> reboot or connection failure.

how?

I do not think it is handled by SCSI layers, not for SAS nor iSCSI.

Also, remember a write command that goes into the write cache is a
SCSI command that's succeeded, even though it's not actually on disk
for sure unless you can complete a sync cache command successfully and
do so with no errors nor ``protocol events'' in the gap between the
successful write and the successful sync.  A facility to replay failed
commands won't help because when a drive with write cache on reboots,
successful writes are rolled back.




Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-22 Thread Kjetil Torgrim Homme
Miles Nordin  writes:

> There will probably be clients that might seem to implicitly make this
> assumption by mishandling the case where an iSCSI target goes away and
> then comes back (but comes back less whatever writes were in its write
> cache).  Handling that case for NFS was complicated, and I bet such
> complexity is just missing without any equivalent from the iSCSI spec,
> but I could be wrong.  I'd love to be educated.
>
> Even if there is some magical thing in iSCSI to handle it, the magic
> will be rarely used and often wrong until people learn how to test it,
> which they haven't yet the way they have with NFS.

I decided I needed to read up on this and found RFC 3783 which is very
readable, highly recommended:

  http://tools.ietf.org/html/rfc3783

basically iSCSI just defines a reliable channel for SCSI.  the SCSI
layer handles the replaying of operations after a reboot or connection
failure.  as far as I understand it, anyway.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-22 Thread Ragnar Sundblad

On 22 feb 2010, at 21.28, Miles Nordin wrote:

>> "rs" == Ragnar Sundblad  writes:
> 
>rs> But are there any clients that assume that an iSCSI volume is
>rs> synchronous?
> 
> there will probably be clients that might seem to implicitly make this
> assumption by mishandling the case where an iSCSI target goes away and
> then comes back (but comes back less whatever writes were in its write
> cache).  Handling that case for NFS was complicated, and I bet such
> complexity is just missing without any equivalent from the iSCSI spec,
> but I could be wrong.  I'd love to be educated.

Yes, this area may very well be a mine field of bugs. But this is
not a new phenomenon, it is the same with SAS, FC, USB, hot plug
disks, and even eSATA (and I guess with CD/DVD drives also with
SCSI with ATAPI (or rather SATAPI (does it have a name?))).

I believe the correct way of handling this in all those cases would
be having the old device instance fail, the file system being told
about it, having all current operations fail and all open files
be failed. When the disk comes back, it should get a new device
instance, and it should have to be remounted. All files will have
to be reopened. I hope no driver will just attach it again and happily
just continue without telling anyone/anything. But then again,
crazier things have been coded...

> Even if there is some magical thing in iSCSI to handle it, the magic
> will be rarely used and often wrong until people learn how to test it,
> which they haven't yet the way they have with NFS.

I am not sure there is anything really magic or unusual about this,
but I certainly agree that it is a typical thing that might not have
been tested thoroughly enough.

/ragge



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-22 Thread Miles Nordin
> "rs" == Ragnar Sundblad  writes:

rs> But are there any clients that assume that an iSCSI volume is
rs> synchronous?

there will probably be clients that might seem to implicitly make this
assumption by mishandling the case where an iSCSI target goes away and
then comes back (but comes back less whatever writes were in its write
cache).  Handling that case for NFS was complicated, and I bet such
complexity is just missing without any equivalent from the iSCSI spec,
but I could be wrong.  I'd love to be educated.

Even if there is some magical thing in iSCSI to handle it, the magic
will be rarely used and often wrong until people learn how to test it,
which they haven't yet the way they have with NFS.

yeah, of course, making all writes synchronous isn't an ok way to fix
this case because it'll make iscsi way slower than non-iscsi
alternatives.

rs> Isn't an iSCSI target supposed to behave like any other SCSI
rs> disk (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?  With
rs> that I mean: A disk which understands SCSI commands with an
rs> optional write cache that could be turned off, with cache sync
rs> command, and all those things.

yeah, reboot a SAS disk without rebooting the host it's attached to,
and you may see some dropped writes showing up as mysterious checksum
errors there as well.  I bet disabling said SAS disk's write cache
will lessen/eliminate that problem.

I think it's become a stupid mess because everyone assumed long past
the point where it became unreasonable that disks with mounted
filesystems would not ever lose power unless the kernel with the
mounted filesystem also lost power.

rs> But - all normal disks come with write caching enabled, [...]
rs> so why should an iSCSI lun behave any different?

because normal disks usually don't dump the contents of their write
caches on the floor unless the kernel running the filesystem code also
loses power at the same instant.  This coincident kernel panic acts as
a signal to the filesystem to expect some lost writes of the disks.
It also lets the kernel take advantage of NFS server reboot recovery
(asking NFS clients to replay some of their writes), and it's an
excuse to force-close any file a userland process might've had open on
the filesystem, thus forcing those userland processes to go through
their crash-recovery steps by replaying database logs and such.

Over iSCSI it's relatively common for a target to lose power and then
come back without its write cache.  but when iSCSI does it, now you
are expected to soldier on without killing all userland processes.
NFS probably could invoke its crash recovery state machine without an
actual server reboot if it wanted to, but I bet it doesn't currently
know how, and that's probably not the right fix because you've still
got the userland processes problem.

I agree with you iSCSI write cache needs to stay on, but there is
probably broken shit all over the place from this.  pre-ZFS iSCSI
targets tend to have battery-backed NVRAM so they can be
all-synchronous without demolishing performance and thus fix, or maybe
just ease a little bit, this problem.




Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-20 Thread Richard Elling
On Feb 18, 2010, at 4:55 AM, Phil Harman wrote:

> This discussion is very timely, but I don't think we're done yet. I've been 
> working on using NexentaStor with Sun's DVI stack. The demo I've been playing 
> with glues SunRays to VirtualBox instances using ZFS zvols over iSCSI for the 
> boot image, with all the associated ZFS snapshot/clone goodness we all love 
> so well.
> 
> The supported config for the ZFS storage server is Solaris 10u7 or 10u8. When 
> I eventually got VDI going with NexentaStor (my value add), I found that some 
> operations which only took 10 minutes with Solaris 10u8 were taking over an 
> hour with NexentaStor. Using pfiles I found that iscsitgtd has the zvol open 
> O_SYNC.

You need the COMSTAR plugin for NexentaStor (no need to beat the dead horse :-)
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)





Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 23.22, Phil Harman wrote:

> On 19/02/2010 21:57, Ragnar Sundblad wrote:
>> On 18 feb 2010, at 13.55, Phil Harman wrote:
>>   
>>> Whilst the latest bug fixes put the world to rights again with respect to 
>>> correctness, it may be that some of our performance workaround are still 
>>> unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
>>> nonvolatile storage, I'd better be pretty sure of the failure modes before 
>>> I work around that).
>>> 
>> But are there any clients that assume that an iSCSI volume is synchronous?
>> 
>> Isn't an iSCSI target supposed to behave like any other SCSI disk
>> (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
>> With that I mean: A disk which understands SCSI commands with an
>> optional write cache that could be turned off, with cache sync
>> command, and all those things.
>> Put in another way, isn't it the OS/file system's responsibility to
>> use the SCSI disk responsibly regardless of the underlying
>> protocol?
>> 
>> /ragge
>>   
> 
> Yes, that would be nice wouldn't it? But the world is seldom that simple, is 
> it? For example, Sun's first implementation of zvol was unsafe by default, 
> with no cache flush option either.
> 
> A few years back we used to note that one of the reasons Solaris was slower 
> than Linux at filesystem microbenchmarks was because Linux ran with the write 
> caches on (whereas we would never be that foolhardy).

(Exactly, and there is more "better fast than safe" evilness in that OS too, 
especially in the file system area. That is why I never use it for anything 
that should store anything.)

> And then this seems to claim that NTFS may not be that smart either ...
> 
>  http://blogs.sun.com/roch/entry/iscsi_unleashed
> 
> (see the WCE Settings paragraph)
> 
> I'm only going on what I've read.

But - all normal disks come with write caching enabled, so in both the Linux 
case and the NTFS case, this is how they always operate, with all disks, so why 
should an iSCSI lun behave any different?

If they can't handle the write cache (handle syncing, barriers, ordering and all 
that), they should turn the cache off, just as Solaris does in almost all cases 
except when you use an entire disk for zfs (I believe because Solaris UFS was 
never really adapted to write caches). And they should do that for all SCSI 
disks.
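
For what it's worth, on Solaris the per-disk write cache can usually be
inspected and toggled from format(1M) in expert mode; a minimal sketch
(the exact menu entries can vary with the drive and driver, so treat it
as an illustration rather than gospel):

  # format -e
  (select the disk)
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> disable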

(I seem to recall that in the bad old days you had to disable the write cache 
yourself if you wanted to use a disk on SunOS, but that was probably because it 
wasn't standardized, and you did it with a jumper on the controller board.)

So - I just do not understand why an iSCSI lun should not try to emulate how 
all other SCSI disks work as much as possible? This must be the most compatible 
mode of operation, or am I wrong?

/ragge



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 23.20, Ross Walker wrote:

> On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad  wrote:
> 
>> 
>> On 18 feb 2010, at 13.55, Phil Harman wrote:
>> 
>> ...
>>> Whilst the latest bug fixes put the world to rights again with respect to 
>>> correctness, it may be that some of our performance workaround are still 
>>> unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
>>> nonvolatile storage, I'd better be pretty sure of the failure modes before 
>>> I work around that).
>> 
>> But are there any clients that assume that an iSCSI volume is synchronous?
>> 
>> Isn't an iSCSI target supposed to behave like any other SCSI disk
>> (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
>> With that I mean: A disk which understands SCSI commands with an
>> optional write cache that could be turned off, with cache sync
>> command, and all those things.
>> Put in another way, isn't it the OS/file system's responsibility to
>> use the SCSI disk responsibly regardless of the underlying
>> protocol?
> 
> That was my argument a while back.
> 
> If you use /dev/dsk then all writes should be asynchronous and WCE should be 
> on and the initiator should issue a 'sync' to make sure it's in NV storage, 
> if you use /dev/rdsk all writes should be synchronous and WCE should be off. 
> RCD should be off in all cases and the ARC should cache all it can.
> 
> Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the 
> initiator flags write cache is the wrong way to go about it. It's more 
> complicated than it needs to be and it leaves setting the storage policy up 
> to the system admin rather than the storage admin.
> 
> It would be better to put effort into supporting FUA and DPO options in the 
> target than dynamically changing a volume's cache policy from the initiator 
> side.

But wouldn't the most disk-like behavior then be to implement all the
FUA, DPO, cache mode page, flush cache, etc, etc, have COMSTAR implement
a cache just like disks do, maybe have a user knob to set the cache size
(typically 32 MB or so on modern disks, could probably be used here too
as a default), and still use /dev/rdsk devices?

That could seem, in my naive limited little mind and humble opinion, as
a pretty good approximation of how real disks work, and no OS should have
to be more surprised than usual of how a SCSI disk works.

Maybe COMSTAR already does this, or parts of it?

Or am I wrong?

/ragge



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Phil Harman

On 19/02/2010 21:57, Ragnar Sundblad wrote:
> On 18 feb 2010, at 13.55, Phil Harman wrote:
>> Whilst the latest bug fixes put the world to rights again with respect to
>> correctness, it may be that some of our performance workarounds are still
>> unsafe (i.e. if my iSCSI client assumes all writes are synchronised to
>> nonvolatile storage, I'd better be pretty sure of the failure modes before
>> I work around that).
>
> But are there any clients that assume that an iSCSI volume is synchronous?
>
> Isn't an iSCSI target supposed to behave like any other SCSI disk
> (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
> With that I mean: A disk which understands SCSI commands with an
> optional write cache that could be turned off, with cache sync
> command, and all those things.
> Put in another way, isn't it the OS/file system's responsibility to
> use the SCSI disk responsibly regardless of the underlying
> protocol?
>
> /ragge


Yes, that would be nice wouldn't it? But the world is seldom that 
simple, is it? For example, Sun's first implementation of zvol was 
unsafe by default, with no cache flush option either.


A few years back we used to note that one of the reasons Solaris was 
slower than Linux at filesystem microbenchmarks was because Linux ran 
with the write caches on (whereas we would never be that foolhardy).


And then this seems to claim that NTFS may not be that smart either ...

  http://blogs.sun.com/roch/entry/iscsi_unleashed

(see the WCE Settings paragraph)

I'm only going on what I've read.

Cheers,
Phil



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ross Walker

On Feb 19, 2010, at 4:57 PM, Ragnar Sundblad  wrote:



> On 18 feb 2010, at 13.55, Phil Harman wrote:
>
> ...
>> Whilst the latest bug fixes put the world to rights again with
>> respect to correctness, it may be that some of our performance
>> workarounds are still unsafe (i.e. if my iSCSI client assumes all
>> writes are synchronised to nonvolatile storage, I'd better be
>> pretty sure of the failure modes before I work around that).
>
> But are there any clients that assume that an iSCSI volume is
> synchronous?
>
> Isn't an iSCSI target supposed to behave like any other SCSI disk
> (pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
> With that I mean: A disk which understands SCSI commands with an
> optional write cache that could be turned off, with cache sync
> command, and all those things.
> Put in another way, isn't it the OS/file system's responsibility to
> use the SCSI disk responsibly regardless of the underlying
> protocol?


That was my argument a while back.

If you use /dev/dsk then all writes should be asynchronous and WCE  
should be on and the initiator should issue a 'sync' to make sure it's  
in NV storage, if you use /dev/rdsk all writes should be synchronous  
and WCE should be off. RCD should be off in all cases and the ARC  
should cache all it can.


Making COMSTAR always start with /dev/rdsk and flip to /dev/dsk if the  
initiator flags write cache is the wrong way to go about it. It's more  
complicated than it needs to be and it leaves setting the storage
policy up to the system admin rather than the storage admin.


It would be better to put effort into supporting FUA and DPO options  
in the target than dynamically changing a volume's cache policy from
the initiator side.


-Ross
 


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Ragnar Sundblad

On 18 feb 2010, at 13.55, Phil Harman wrote:

...
> Whilst the latest bug fixes put the world to rights again with respect to 
> correctness, it may be that some of our performance workaround are still 
> unsafe (i.e. if my iSCSI client assumes all writes are synchronised to 
> nonvolatile storage, I'd better be pretty sure of the failure modes before I 
> work around that).

But are there any clients that assume that an iSCSI volume is synchronous?

Isn't an iSCSI target supposed to behave like any other SCSI disk
(pSCSI, SAS, FC, USB MSC, SSA, ATAPI, FW SBP...)?
With that I mean: A disk which understands SCSI commands with an
optional write cache that could be turned off, with cache sync
command, and all those things.
Put in another way, isn't it the OS/file system's responsibility to
use the SCSI disk responsibly regardless of the underlying
protocol?

/ragge



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Nigel Smith
Another thing you could check, which has been reported to
cause a problem, is if network or disk drivers share an interrupt
with a slow device, like say a usb device. So try:

# echo ::interrupts -d | mdb -k

... and look for multiple driver names on an INT#.
Regards
Nigel Smith


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Nigel Smith
Hi Matt

> Haven't gotten NFS or CIFS to work properly.
> Maybe I'm just too dumb to figure it out,
> but I'm ending up with permissions errors that don't let me do much.
> All testing so far has been with iSCSI.

So until you can test NFS or CIFS, we don't know if it's a 
general performance problem, or just an iSCSI problem.

To get CIFS working, try this:

  
http://blogs.sun.com/observatory/entry/accessing_opensolaris_shares_from_windows

> Here's IOStat while doing writes : 
> Here's IOStat when doing reads : 

You're getting >1000 kr/s & kw/s, so add the iostat 'M' option
to display throughput in megabytes per second.
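
For example, combined with the flags suggested earlier:

  # iostat -xnMdz 1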

> It'll sustain 10-12% gigabit for a few minutes, have a little dip,

I'd still be interested to see the size of the TCP buffers.
What does this report:

# ndd /dev/tcp  tcp_xmit_hiwat
# ndd /dev/tcp  tcp_recv_hiwat
# ndd /dev/tcp  tcp_conn_req_max_q
# ndd /dev/tcp  tcp_conn_req_max_q0
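
If those turn out to be small, they can be raised at runtime with 'ndd -set';
the values below are only an illustration, not a tuning recommendation:

  # ndd -set /dev/tcp tcp_xmit_hiwat 1048576
  # ndd -set /dev/tcp tcp_recv_hiwat 1048576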

> Current NIC is an integrated NIC on an Abit Fatality motherboard.
> Just your generic fare gigabit network card.
> I can't imagine that it would be holding me back that much though.

Well there are sometimes bugs in the device drivers:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913756
  http://sigtar.com/2009/02/12/opensolaris-rtl81118168b-issues/

That's why I say don't just assume the network is performing to the optimum.

To do a local test, direct to the hard drives, you could try 'dd',
with various transfer sizes. Some advice from BenR, here:

  http://www.cuddletech.com/blog/pivot/entry.php?id=820
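
A rough first pass could look like the following (the path and sizes are
just placeholders; note that /dev/zero input compresses to almost nothing
if compression is enabled on the dataset, and reads come from the ARC
unless the file is much larger than RAM):

  # time dd if=/dev/zero of=/data/ddtest bs=1048576 count=8192
  # time dd if=/data/ddtest of=/dev/null bs=1048576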

Regards
Nigel Smith


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Marc Nicholas
Run Bonnie++. You can install it with the Sun package manager and it'll
appear under /usr/benchmarks/bonnie++

Look for the command line I posted a couple of days back for a decent set of
flags to truly rate performance (using sync writes).
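
Something along these lines (directory, size and user are placeholders,
and not necessarily the exact flags from that post) exercises synchronous
writes, since -b makes bonnie++ fsync() after every write:

  # /usr/benchmarks/bonnie++/bonnie++ -d /data/bench -s 8192 -n 0 -u root -b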

-marc

On Thu, Feb 18, 2010 at 11:05 AM, Matt wrote:

> Also - still looking for the best way to test local performance - I'd love
> to make sure that the volume is actually able to perform at a level locally
> to saturate gigabit.  If it can't do it internally, why should I expect it
> to work over GbE?


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance - napp-it + benchmarks

2010-02-18 Thread Bob Friesenhahn

On Thu, 18 Feb 2010, Günther wrote:

i was surprised about the seqential write/ rewrite result.
the wd 2 TB drives performs very well only in sequential write of characters 
but are horrible bad in blockwise write/ rewrite
the 15k sas drives with ssd read cache performs 20 x better (10MB/s -> 200 
MB/s)  


Usually very poor re-write performance is an indication of 
insufficient RAM for caching combined with imperfect alignment 
between the written block size and the underlying zfs block size. 
There is no doubt that an enterprise SAS drive will smoke a 
high-capacity SATA "green" drive when it comes to update performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Matt
Also - still looking for the best way to test local performance - I'd love to 
make sure that the volume is actually able to perform at a level locally to 
saturate gigabit.  If it can't do it internally, why should I expect it to work 
over GbE?


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Marc Nicholas
On Thu, Feb 18, 2010 at 10:49 AM, Matt wrote:


> Here's IOStat while doing writes :
>
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>     1.0  256.9    3.0 2242.9  0.3  0.1    1.3    0.5  11  12 c0t0d0
>     0.0  253.9    0.0 2242.9  0.3  0.1    1.0    0.4  10  11 c0t1d0
>     1.0  253.9    2.5 2234.4  0.2  0.1    0.9    0.4   9  11 c1t0d0
>     1.0  258.9    2.5 2228.9  0.3  0.1    1.3    0.5  12  13 c1t1d0
>
> This shows about a 10-12% utilization of my gigabit network, as reported by
> Task Manager in Windows 7.
>

Unless you are using SSDs (which I believe you're not), you're IOPS-bound on
the drives IMHO. Writes are a better test of this than reads for cache
reasons.

-marc


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Matt
> One question though:
  
> Just this one SAS adaptor? Are you connecting to the
> drive
> backplane with one cable for the 4 internal SAS
> connectors?
> Are you using SAS or SATA drives? Will you be filling
> up 24
> slots with 2 TByte drives, and are you sure you won't
> be 
> oversubscribed with just 4x SAS? And SSD, which
> drives are you 
> using and in which mounts (internal or external
> caddies)?
> 
I'm just going to use the single 4x SAS.  1200MB/sec should be plenty 
for 24 drives total.  I'm going to be mounting 2x SSD for ZIL and 2x SSD for 
ARC, then 20-2TB drives.  I'm guessing that with a random I/O workload, I'll 
never hit the 1200MB/sec peak that the 4x SAS can sustain.

Also - for the ZIL I will be using 2x 32GB Intel X25-E SLC drives, and for the 
ARC I'll be using 2x 160GB Intel X25M MLC drives.  I'm hoping that the cache 
will allow me to saturate gigabit and eventually infiniband.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Matt
Responses inline : 

> Hi Matt
> Are you seeing low speeds on writes only or on both
> read AND write?
> 
Low speeds both reading and writing.

> Are you seeing low speed just with iSCSI or also with
> NFS or CIFS?

Haven't gotten NFS or CIFS to work properly.  Maybe I'm just too dumb to figure 
it out, but I'm ending up with permissions errors that don't let me do much.  
All testing so far has been with iSCSI.
> 
 
> To check, do this:
> 
>   # svcs -a | grep iscsi
> If 'svc:/system/iscsitgt:default' is online,
> you are using the old & mature 'user mode' iscsi
> target.
> 
> If 'svc:/network/iscsi/target:default' is online,
> then you are using the new 'kernel mode' comstar
> iscsi target.

It shows that I'm using the COMSTAR target.

> 
> For another good way to monitor disk i/o, try:
> 
>   # iostat -xndz 1
> 

Here's IOStat while doing writes : 

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0  256.9    3.0 2242.9  0.3  0.1    1.3    0.5  11  12 c0t0d0
    0.0  253.9    0.0 2242.9  0.3  0.1    1.0    0.4  10  11 c0t1d0
    1.0  253.9    2.5 2234.4  0.2  0.1    0.9    0.4   9  11 c1t0d0
    1.0  258.9    2.5 2228.9  0.3  0.1    1.3    0.5  12  13 c1t1d0

This shows about a 10-12% utilization of my gigabit network, as reported by 
Task Manager in Windows 7.


Here's IOStat when doing reads : 

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  554.1    0.0 11256.8    0.0  3.8  0.7    6.8    1.3  68  70 c0t0d0
  749.1    0.0 11003.7    0.0  2.8  0.5    3.8    0.7  51  54 c0t1d0
  742.1    0.0 11333.4    0.0  2.9  0.5    3.9    0.7  51  49 c1t0d0
  736.1    0.0 11045.9    0.0  2.8  0.5    3.8    0.7  53  53 c1t1d0


Which gives me about 30% utilization.

Another copy to the SAN yielded this result : 

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   15.1  314.2  883.9 4106.2  0.9  0.3    2.9    0.9  28  30 c0t0d0
   15.1  321.2  854.3 4106.2  0.9  0.3    2.7    0.8  26  26 c0t1d0
   28.1  315.2  916.5 4101.2  0.8  0.2    2.2    0.7  22  25 c1t0d0
   14.1  316.2  895.4 4097.2  0.9  0.3    2.7    0.8  26  27 c1t1d0


Which looks like writes held up at nearly 30% (doing multiple streams of data). 
 Still not gigabit, but getting better.  It also seems to be very hit-or-miss.  
It'll sustain 10-12% gigabit for a few minutes, have a little dip, jump up to 
15% for a while, then back to 10%, then up to 20%, then up to 30%, then back 
down.  I can't really make heads or tails of it.

> 
> Don't just assume that your Ethernet & IP & TCP
> layer
> are performing to the optimum - check it.
> 
> I often use 'iperf' or 'netperf' to do this:
> 
>   http://blogs.sun.com/observatory/entry/netperf
> (Iperf is available by installing the SUNWiperf
> package.
> A package for netperf is in the contrib repository.)
> 

I'll look in to this, I don't have either installed right now.

> The last time I checked, the default values used
> in the OpenSolaris TCP stack are not optimum
> for Gigabit speed, and need to be adjusted.
> Here is some advice, I found with Google, but
> there are others:
> 
> 
> http://serverfault.com/questions/13190/what-are-good-speeds-for-iscsi-and-nfs-over-1gb-ethernet
> 
> BTW, what sort of network card are you using,
> as this can make a difference.
> 

Current NIC is an integrated NIC on an Abit Fatality motherboard.  Just your 
generic fare gigabit network card.  I can't imagine that it would be holding me 
back that much though.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Phil Harman
This discussion is very timely, but I don't think we're done yet. I've 
been working on using NexentaStor with Sun's DVI stack. The demo I've 
been playing with glues SunRays to VirtualBox instances using ZFS zvols 
over iSCSI for the boot image, with all the associated ZFS 
snapshot/clone goodness we all love so well.


The supported config for the ZFS storage server is Solaris 10u7 or 10u8. 
When I eventually got VDI going with NexentaStor (my value add), I found 
that some operations which only took 10 minutes with Solaris 10u8 were 
taking over an hour with NexentaStor. Using pfiles I found that 
iscsitgtd has the zvol open O_SYNC.


My hope is that COMSTAR is a lot more intelligent, and that it does 
indeed support DKIOCFLUSHWRITECACHE. However, if your iSCSI client 
expects all writes to be flushed synchronously, all the debate we've 
seen on this list about the new wcd=false option for rdsk zvols is moot 
(as using the option, when it is available, could result in data loss).
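
(If it helps anyone following along: on builds where COMSTAR exposes the
property, the knob being discussed is the sbd LU's write-cache-disable
("wcd") flag, which I believe can be flipped per LU roughly like this;
check stmfadm(1M) on your build before relying on the exact syntax:

  # stmfadm modify-lu -p wcd=false <lu-guid>
  # stmfadm list-lu -v

where wcd=false re-enables the write cache and list-lu -v should show the
resulting "Writeback Cache Disable" state.)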


When you do iSCSI to other big brand storage appliances, you generally 
have the benefit of NVRAM cacheing. As we all know, the same can be 
achieved with ZFS and an SSD "Logzilla". I didn't have one at hand, and 
I didn't think of disabling the ZIL (although some have reported that 
this only seems to help ZFS hosted files, not zvols). Instead, since I 
didn't mind losing my data, for the sake of the experiment, I added a TMPFS 
"Logzilla" ...


# mkfile 4g /tmp/zilla
# zpool add vdipool log /tmp/zilla

WARNING: DON'T TRY THIS ON ZPOOLS YOU CARE ABOUT!  However, for the 
purposes of my experiment, it worked a treat, proving to me that an SSD 
"Logzilla" was the way ahead.


I think a lot of the angst in this thread is because "it used to work"  
(i.e. we used to get great iSCSI performance from zvols). But then Sun 
fixed a glaring bug (i.e. that zvols were unsafe for synchronous writes) 
and our world fell apart.


Whilst the latest bug fixes put the world to rights again with respect 
to correctness, it may be that some of our performance workarounds are 
still unsafe (i.e. if my iSCSI client assumes all writes are 
synchronised to nonvolatile storage, I'd better be pretty sure of the 
failure modes before I work around that).


Right now, it seems like an SSD "Logzilla" is needed if you want 
correctness and performance.


Phil Harman
Harman Holistix - focusing on the detail and the big picture
Our holistic services include: performance health checks, system tuning, 
DTrace training, coding advice, developer assassinations


http://blogs.sun.com/pgdh (mothballed)
http://harmanholistix.com/mt (current)
http://linkedin.com/in/philharman





Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Eugen Leitl
On Wed, Feb 17, 2010 at 11:21:07PM -0800, Matt wrote:
> Just out of curiosity - what Supermicro chassis did you get?  I've got the 
> following items shipping to me right now, with SSD drives and 2TB main drives 
> coming as soon as the system boots and performs normally (using 8 extra 500GB 
> Barracuda ES.2 drives as test drives).

That looks like a sane combination. Please report how this particular
setup performs, I'm quite curious.

One question though:
 
> 
> http://www.acmemicro.com/estore/merchant.ihtml?pid=5440&lastcatid=53&step=4
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820139043
> http://www.acmemicro.com/estore/merchant.ihtml?pid=4518&step=4

Just this one SAS adaptor? Are you connecting to the drive
backplane with one cable for the 4 internal SAS connectors?
Are you using SAS or SATA drives? Will you be filling up 24
slots with 2 TByte drives, and are you sure you won't be 
oversubscribed with just 4x SAS? And SSD, which drives are you 
using and in which mounts (internal or external caddies)?

> http://www.acmemicro.com/estore/merchant.ihtml?pid=6708&step=4
> http://www.newegg.com/Product/Product.aspx?Item=N82E16819117187
> http://www.newegg.com/Product/Product.aspx?Item=N82E16835203002

-- 
Eugen* Leitl  http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance - napp-it + benchmarks

2010-02-18 Thread Günther
hello

my intention was to show how you can tune up a pool of drives
(how much you can reach when using sas compared to 2 TB high capacity drives)


and now the other results with same config and sas drives:


wd 2TB x 7, z3, dedup and compress on, no ssd:
  daten      12.6T  start  2010.02.17  8G   202 MB/s 83    10 MB/s  4   4.436 MB/s  5   135 MB/s 87   761 MB/s

sas 15k, 146GB x 4, z3+dedup and compress off, no ssd:
  z3nocache  544G   start  2010.02.18  8G    71 MB/s 31    84 MB/s 15     47 MB/s 13    87 MB/s 55   113 MB/s

sas 15k, 146GB x 4, z3+dedup and compress on, no ssd:
  z3nocache  544G   start  2010.02.18  8G   218 MB/s 99   410 MB/s 92    171 MB/s 50   148 MB/s 92   578 MB/s

sas 15k, 146GB x 4, z3+dedup and compress on + ssd read cache:
  z3cache    544G   start  2010.02.17  8G   172 MB/s 77   205 MB/s 40     95 MB/s 27   141 MB/s 90   546 MB/s


# result ##
all pools are zfs z3
sas are Seagate 15K/m drives, 146 GB

 
                            seq-write-ch   seq-write-block   rewrite     read-char   read-block

wd 2TB x 7                  202 MB/s        10 MB/s          4.4 MB/s    135 MB/s    761 MB/s

sas 15k x 4 no dedup:        71 MB/s        84 MB/s           47 MB/s     87 MB/s    113 MB/s
sas 15k x 4 +dedup+comp:    218 MB/s       410 MB/s          171 MB/s    148 MB/s    578 MB/s
sas 15k x 4 +dedup+ssd:     172 MB/s       205 MB/s           95 MB/s    141 MB/s    546 MB/s


conclusion:
if you need performance: 
use fast sas drives
activate dedup and compress (if you have enough cpu power)
ssd read cache is not important in bonnie test

high capacity drives do very well in reading and seq. writing




Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance - napp-it + benchmarks

2010-02-18 Thread Tomas Ögren
On 18 February, 2010 - Günther sent me these 1,1K bytes:

> hello
> there is a new beta v. 0.220 of napp-it, the free webgui for nexenta(core) 3
> 
> new:
> -bonnie benchmarks included (see screenshot: http://www.napp-it.org/bench.png)
> -bug fixes
> 
> if you look at the benchmark screenshot:
> -pool daten: zfs3 of 7 x wd 2TB raid edition (WD2002FYPS), dedup and compress 
> enabled
> -pool z3ssdcache: zfs3 of 4 sas Seagate 15k/s (ST3146855SS)  edition, 
>   dedup and compress enabled + ssd read cache (supertalent ultradrive 64GB)
> 
> i was surprised about the seqential write/ rewrite result.
> the wd 2 TB drives performs very well only in sequential write of characters 
> but are horrible bad in blockwise write/ rewrite
> the 15k sas drives with ssd read cache performs 20 x better (10MB/s -> 200 
> MB/s)  

Most probably due to lack of ram to hold the dedup tables, which your
second version "fixes" with an l2arc.

Try the same test without dedup or same l2arc in both, instead of
comparing apples to canoes.

> 
> download:
> http://www.napp-it.org
> 
> howto setup
> http://www.napp-it.org/napp-it.pdf
> 
> 
> gea


/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance - napp-it + benchmarks

2010-02-18 Thread Günther
hello
there is a new beta v. 0.220 of napp-it, the free webgui for nexenta(core) 3

new:
-bonnie benchmarks included (see screenshot: http://www.napp-it.org/bench.png)
-bug fixes

if you look at the benchmark screenshot:
-pool daten: zfs3 of 7 x wd 2TB raid edition (WD2002FYPS), dedup and compress 
enabled
-pool z3ssdcache: zfs3 of 4 sas Seagate 15k/s (ST3146855SS)  edition, 
  dedup and compress enabled + ssd read cache (supertalent ultradrive 64GB)

i was surprised about the sequential write/rewrite result.
the wd 2 TB drives performs very well only in sequential write of characters 
but are horrible bad in blockwise write/ rewrite
the 15k sas drives with ssd read cache performs 20 x better (10MB/s -> 200 
MB/s)  

download:
http://www.napp-it.org

howto setup
http://www.napp-it.org/napp-it.pdf


gea


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Nigel Smith
Hi Matt
Are you seeing low speeds on writes only or on both read AND write?

Are you seeing low speed just with iSCSI or also with NFS or CIFS?

> I've tried updating to COMSTAR 
> (although I'm not certain that I'm actually using it)

To check, do this:

  # svcs -a | grep iscsi

If 'svc:/system/iscsitgt:default' is online,
you are using the old & mature 'user mode' iscsi target.

If 'svc:/network/iscsi/target:default' is online,
then you are using the new 'kernel mode' comstar iscsi target.

For another good way to monitor disk i/o, try:

  # iostat -xndz 1

  http://docs.sun.com/app/docs/doc/819-2240/iostat-1m?a=view

Don't just assume that your Ethernet & IP & TCP layer
are performing to the optimum - check it.

I often use 'iperf' or 'netperf' to do this:

  http://blogs.sun.com/observatory/entry/netperf

(Iperf is available by installing the SUNWiperf package.
A package for netperf is in the contrib repository.)
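
A quick sanity check with iperf looks something like this (30 seconds,
reporting every 5; adjust to taste):

  (on the OpenSolaris box)
  # iperf -s

  (on the client)
  # iperf -c <server-ip> -t 30 -i 5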

The last time I checked, the default values used
in the OpenSolaris TCP stack are not optimum
for Gigabit speed, and need to be adjusted.
Here is some advice, I found with Google, but
there are others:

  
http://serverfault.com/questions/13190/what-are-good-speeds-for-iscsi-and-nfs-over-1gb-ethernet

BTW, what sort of network card are you using,
as this can make a difference.

Regards
Nigel Smith


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Markus Kovero
> No one has said if they're using dks, rdsk, or file-backed COMSTAR LUNs yet.
> I'm using file-backed COMSTAR LUNs, with ZIL currently disabled.
> I can get between 100-200MB/sec, depending on random/sequential and block 
> sizes.
> 
> Using dsk/rdsk, I was not able to see that level of performance at all.
> 
> -- 
> Brent Jones
> br...@servuhome.net

Hi, I find comstar performance very low if using zvols under dsk, somehow using 
them under rdsk and letting comstar handle the cache makes performance really 
good (disks/nics become limiting factor).

Yours
Markus Kovero


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Brent Jones
On Wed, Feb 17, 2010 at 11:03 PM, Matt  wrote:
> No SSD Log device yet.  I also tried disabling the ZIL, with no effect on 
> performance.
>
> Also - what's the best way to test local performance?  I'm _somewhat_ dumb as 
> far as opensolaris goes, so if you could provide me with an exact command 
> line for testing my current setup (exactly as it appears above) I'd love to 
> report the local I/O readings.

No one has said if they're using dsk, rdsk, or file-backed COMSTAR LUNs yet.
I'm using file-backed COMSTAR LUNs, with ZIL currently disabled.
I can get between 100-200MB/sec, depending on random/sequential and block sizes.

Using dsk/rdsk, I was not able to see that level of performance at all.

-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-17 Thread Matt
Just out of curiosity - what Supermicro chassis did you get?  I've got the 
following items shipping to me right now, with SSD drives and 2TB main drives 
coming as soon as the system boots and performs normally (using 8 extra 500GB 
Barracuda ES.2 drives as test drives).


http://www.acmemicro.com/estore/merchant.ihtml?pid=5440&lastcatid=53&step=4
http://www.newegg.com/Product/Product.aspx?Item=N82E16820139043
http://www.acmemicro.com/estore/merchant.ihtml?pid=4518&step=4
http://www.acmemicro.com/estore/merchant.ihtml?pid=6708&step=4
http://www.newegg.com/Product/Product.aspx?Item=N82E16819117187
http://www.newegg.com/Product/Product.aspx?Item=N82E16835203002


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-17 Thread Matt
No SSD Log device yet.  I also tried disabling the ZIL, with no effect on 
performance.

Also - what's the best way to test local performance?  I'm _somewhat_ dumb as 
far as opensolaris goes, so if you could provide me with an exact command line 
for testing my current setup (exactly as it appears above) I'd love to report 
the local I/O readings.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-17 Thread Brent Jones
On Wed, Feb 17, 2010 at 10:42 PM, Matt  wrote:


>
> I've got a very similar rig to the OP showing up next week (plus an 
> infiniband card) I'd love to get this performing up to GB Ethernet speeds, 
> otherwise I may have to abandon the iSCSI project if I can't get it to 
> perform.


Do you have an SSD log device? If not, try disabling the ZIL
temporarily to see if that helps. Your workload will likely benefit
from a log device.

-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-17 Thread Matt
Just wanted to add that I'm in the exact same boat - I'm connecting from a 
Windows system and getting just horrid iSCSI transfer speeds.

I've tried updating to COMSTAR (although I'm not certain that I'm actually 
using it) to no avail, and I tried updating to the latest DEV version of 
OpenSolaris.  All that resulted from updating to the latest DEV version was a 
completely broken system that I couldn't access the command line on.
Fortunately i was able to roll back to the previous version and keep tinkering.

Anyone have any ideas as to what could really be causing this slowdown?

I've got 5-500GB Seagate Barracuda ES.2 drives that I'm using for my zpools, 
and I've done the following.

1 - zpool create data mirror c0t0d0 c0t1d0
2 - zfs create -s -V 600g data/iscsitarget
3 - sbdadm create-lu /dev/zvol/rdsk/data/iscsitarget
4 - stmfadm add-view xx

So I've got a 500GB RAID1 zpool, and I've created a 600GB sparse volume on top 
of it, shared it via iSCSI, and connected to it.  Everything works stellar up 
until I copy files to it, then I get just sluggishness.

I start to copy a file from my windows 7 system to the iSCSI target, then pull 
up IOSTAT using this command : zpool iostat -v data 10

It shows me this : 

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data          895M   463G      0    666      0  7.93M
  mirror      895M   463G      0    666      0  7.93M
    c0t0d0       -      -      0    269      0  7.91M
    c0t1d0       -      -      0    272      0  7.93M
----------  -----  -----  -----  -----  -----  -----

So I figure, since ZFS is pretty sweet, how about I add some additional drives. 
 That should bump up my performance.

I execute this : 

zpool add data mirror c1t0d0 c1t1d0

It adds it to my zpool, and I run IOSTAT again, while the copy is still running.

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data         1.17G   927G      0    738  1.58K  8.87M
  mirror     1.17G   463G      0    390  1.58K  4.61M
    c0t0d0       -      -      0    172  1.58K  4.61M
    c0t1d0       -      -      0    175      0  4.61M
  mirror     42.5K   464G      0    348      0  4.27M
    c1t0d0       -      -      0    156      0  4.27M
    c1t1d0       -      -      0    159      0  4.27M
----------  -----  -----  -----  -----  -----  -----


I get a whopping extra 1MB/sec by adding two drives.  It fluctuates a lot, 
sometimes dropping down to 4MB/sec, sometimes rocketing all the way up to 
20MB/sec, but nothing consistent.

Basically, my transfer rates are the same no matter how many drives I add to 
the zpool.

Is there anything I am missing on this?

BTW - "test" server specs

AMD dual core 6000+
2GB RAM
Onboard Sata controller
Onboard Ethernet (gigabit)

I've got a very similar rig to the OP showing up next week (plus an infiniband 
card) I'd love to get this performing up to GB Ethernet speeds, otherwise I may 
have to abandon the iSCSI project if I can't get it to perform.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-16 Thread Eric D. Mudama

On Tue, Feb 16 at  9:44, Brian E. Imhoff wrote:

But, at the end of the day, this is quite a bomb: "A single raidz2
vdev has about as many IOs per second as a single disk, which could
really hurt iSCSI performance."

If I have to break 24 disks up in to multiple vdevs to get the
expected performance might be a deal breaker.  To keep raidz2
redundancy, I would have to lose..almost half of the available
storage to get reasonable IO speeds.


ZFS is quite flexible.  You can put multiple vdevs in a pool, and dial
your performance/redundancy just about wherever you want them.

24 disks could be:

12x mirrored vdevs (best random IO, 50% capacity, any 1 failure absorbed, up to 
12 w/ limits)
6x 4-disk raidz vdevs (75% capacity, any 1 failure absorbed, up to 6 with 
limits)
4x 6-disk raidz vdevs (~83% capacity, any 1 failure absorbed, up to 4 with 
limits)
4x 6-disk raidz2 vdevs (~66% capacity, any 2 failures absorbed, up to 8 with 
limits)
1x 24-disk raidz2 vdev (~92% capacity, any 2 failures absorbed, worst random IO 
perf)
etc.

I think the 4x 6-disk raidz2 vdev setup is quite commonly used with 24
disks available, but each application is different.  We use mirrored
vdevs at work, with a separate box as a "live" backup using raidz of
larger SATA drives.
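
For illustration, the 4x 6-disk raidz2 layout mentioned above is just four
raidz2 groups in a single pool; the pool and device names below are made up:

  # zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz2 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 \
      raidz2 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c3t0d0 c3t1d0 \
      raidz2 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0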

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-16 Thread Richard Elling
On Feb 16, 2010, at 9:44 AM, Brian E. Imhoff wrote:

> Some more back story.  I initially started with Solaris 10 u8, and was 
> getting 40ish MB/s reads, and 65-70MB/s writes, which was still a far cry 
> from the performance I was getting with OpenFiler.  I decided to try 
> Opensolaris 2009.06, thinking that since it was more "state of the art & up 
> to date" then main Solaris. Perhaps there would be some performance tweaks or 
> bug fixes which might bring performance closer to what I saw with OpenFiler.  
>  But, then on an untouched clean install of OpenSolaris 2009.06, ran into 
> something...else...apparently causing this far far far worse performance.

You thought a release dated 2009.06 was further along than a release dated
2009.10? :-)   CR 6794730 was fixed in April, 2009, after the freeze for the 
2009.06
release, but before the freeze for 2009.10.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794730

The schedule is published here, so you can see that there is a freeze now for
the 2010.03 OpenSolaris release.
http://hub.opensolaris.org/bin/view/Community+Group+on/schedule

As they say in comedy, timing is everything :-(

> But, at the end of the day, this is quite a bomb:  "A single raidz2 vdev has 
> about as many IOs per second as a single disk, which could really hurt iSCSI 
> performance."  

The context for this statement is for small, random reads.  40 MB/sec of 8KB 
reads is 5,000 IOPS, or about 50 HDDs worth of small random reads @ 100 
IOPS/disk,
or one decent SSD.
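
Spelling that arithmetic out:

  40 MB/s / 8 KB per read      = ~5,000 reads/s
  5,000 / ~100 IOPS per disk   = ~50 disks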

> If I have to break 24 disks up in to multiple vdevs to get the expected 
> performance might be a deal breaker.  To keep raidz2 redundancy, I would have 
> to lose..almost half of the available storage to get reasonable IO speeds.

Are your requirements for bandwidth or IOPS?

> Now knowing about vdev IO limitations, I believe the speeds I saw with 
> Solaris 10u8 are inline with those limitations, and instead of fighting with 
> whatever issue I have with this clean install of OpenSolaris, I reverted back 
> to 10u8.  I guess I'll just have to see if the speeds that Solaris ISCSI 
> w/ZFS is capable of, is workable for what I want to do, and what the size 
> sacrifice/performace acceptability point is at.

In Solaris 10 you are stuck with the legacy iSCSI target code. In OpenSolaris, 
you
have the option of using COMSTAR which performs and scales better, as Roch
describes here:
http://blogs.sun.com/roch/entry/iscsi_unleashed
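
(For anyone trying the switch, the rough COMSTAR sequence on a recent build
is something like the following; the pool/zvol name is a placeholder and the
service names have moved around between builds, so check the docs for yours:

  # svcadm enable stmf
  # svcadm enable -r svc:/network/iscsi/target:default
  # itadm create-target
  # sbdadm create-lu /dev/zvol/rdsk/tank/myvol
  # stmfadm add-view <lu-guid>

where <lu-guid> comes from the 'sbdadm create-lu' / 'sbdadm list-lu' output.)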

> Thanks for all the responses and help.  First time posting here, and this 
> looks like an excellent community.

We try hard, and welcome the challenges :-)
 -- richard


ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)





Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-16 Thread Brian E. Imhoff
Some more back story.  I initially started with Solaris 10 u8, and was getting 
40ish MB/s reads, and 65-70MB/s writes, which was still a far cry from the 
performance I was getting with OpenFiler.  I decided to try OpenSolaris 
2009.06, thinking that since it was more "state of the art & up to date" than 
mainline Solaris, perhaps there would be some performance tweaks or bug fixes 
which might bring performance closer to what I saw with OpenFiler.  But then, 
on an untouched clean install of OpenSolaris 2009.06, I ran into 
something...else...apparently causing this far, far, far worse performance.

But, at the end of the day, this is quite a bomb:  "A single raidz2 vdev has 
about as many IOs per second as a single disk, which could really hurt iSCSI 
performance."  

If I have to break 24 disks up in to multiple vdevs to get the expected 
performance might be a deal breaker.  To keep raidz2 redundancy, I would have 
to lose..almost half of the available storage to get reasonable IO speeds.

Now knowing about vdev IO limitations, I believe the speeds I saw with Solaris 
10u8 are inline with those limitations, and instead of fighting with whatever 
issue I have with this clean install of OpenSolaris, I reverted back to 10u8.  
I guess I'll just have to see if the speeds that Solaris iSCSI w/ZFS is capable 
of are workable for what I want to do, and what the size sacrifice/performance 
acceptability point is at.

Thanks for all the responses and help.  First time posting here, and this looks 
like an excellent community.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-16 Thread Richard Elling
On Feb 15, 2010, at 11:34 PM, Ragnar Sundblad wrote:
> 
> On 15 feb 2010, at 23.33, Bob Beverage wrote:
> 
>>> On Wed, Feb 10, 2010 at 10:06 PM, Brian E. Imhoff
>>>  wrote:
>>> I've seen exactly the same thing. Basically, terrible
>>> transfer rates
>>> with Windows
>>> and the server sitting there completely idle.
>> 
>> I am also seeing this behaviour.  It started somewhere around snv111 but I 
>> am not sure exactly when.  I used to get 30-40MB/s transfers over cifs but 
>> at some point that dropped to roughly 7.5MB/s.
> 
> Wasn't zvol changed a while ago from asynchronous to
> synchronous? Could that be it?

Yes.

> I don't understand that change at all - of course a zvol with or
> without iscsi to access it should behave exactly as a (not broken)
> disk, strictly obeying the protocol for write cache. cache flush etc.
> Having it entirely synchronous is in many cases almost as useless
> as having it asynchronous.

There are two changes at work here, and OpenSolaris 2009.06 is
in the middle of them -- and therefore is at the least optimal spot.
You have the choice of moving to a later build, after b113, which
has the proper fix.

> Just as much as zfs itself should demands this from it's disks, as it
> does, I believe it should provide this itself when used as storage
> for others. To me it seems that the zvol+iscsi functionality seems not
> ready for production and needs more work. If anyone has any better
> explanation, please share it with me!

The fix is in Solaris 10 10/09 and the OpenStorage software.  For some
reason, this fix is not available in the OpenSolaris supported bug fixes.
Perhaps someone from Oracle can shed light on that (non)decision?
So until next month, you will need to use an OpenSolaris dev release
after b113.
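
For what it's worth, on builds where the zvol is exported through COMSTAR the
write-cache behaviour can be inspected and toggled per LU.  A rough sketch
(the "wcd" property name is from memory, so verify against stmfadm(1M) on
your build):
  stmfadm list-lu -v                        # look for "Writeback Cache Disable"
  stmfadm modify-lu -p wcd=false <lu-guid>  # let the LU expose a volatile write cache
The <lu-guid> is whatever GUID stmfadm assigned when the LU was created.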

> I guess a good slog could help a bit, especially if you have a bursty
> write load.

Yes.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)





Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-15 Thread Ragnar Sundblad

On 15 feb 2010, at 23.33, Bob Beverage wrote:

>> On Wed, Feb 10, 2010 at 10:06 PM, Brian E. Imhoff
>>  wrote:
>> I've seen exactly the same thing. Basically, terrible
>> transfer rates
>> with Windows
>> and the server sitting there completely idle.
> 
> I am also seeing this behaviour.  It started somewhere around snv111 but I am 
> not sure exactly when.  I used to get 30-40MB/s transfers over cifs but at 
> some point that dropped to roughly 7.5MB/s.

Wasn't zvol changed a while ago from asynchronous to
synchronous? Could that be it?

I don't understand that change at all - of course a zvol with or
without iSCSI to access it should behave exactly as a (not broken)
disk, strictly obeying the protocol for write cache, cache flush, etc.
Having it entirely synchronous is in many cases almost as useless
as having it asynchronous.

Just as ZFS itself demands this from its disks, I believe it should
provide the same guarantees itself when used as storage for others. To me
it seems that the zvol+iSCSI functionality is not ready for production and
needs more work. If anyone has any better explanation, please share it
with me!

I guess a good slog could help a bit, especially if you have a bursty
write load.

/ragge



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-15 Thread Bob Beverage
> On Wed, Feb 10, 2010 at 10:06 PM, Brian E. Imhoff
>  wrote:
> I've seen exactly the same thing. Basically, terrible
> transfer rates
> with Windows
> and the server sitting there completely idle.

I am also seeing this behaviour.  It started somewhere around snv111 but I am 
not sure exactly when.  I used to get 30-40MB/s transfers over cifs but at some 
point that dropped to roughly 7.5MB/s.
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-15 Thread Peter Tribble
On Wed, Feb 10, 2010 at 10:06 PM, Brian E. Imhoff  wrote:
> I am in the proof-of-concept phase of building a large ZFS/Solaris based SAN 
> box, and am experiencing absolutely poor / unusable performance.
...
>
> From here, I discover the iscsi target on our Windows server 2008 R2 File 
> server, and see the disk is attached in Disk Management.  I initialize the 
> 10TB disk fine, and begin to quick format it.  Here is where I begin to see 
> the poor performance issue.   The Quick Format took about 45 minutes. And 
> once the disk is fully mounted, I get maybe 2-5 MB/s average to this disk.

Did you actually make any progress on this?

I've seen exactly the same thing. Basically, terrible transfer rates with
Windows and the server sitting there completely idle. We had support cases
open with both Sun and Microsoft, which got nowhere.

This seems to me to be more a case of working out where the impedance
mismatch is rather than a straightforward performance issue. In my case
I could saturate the network from a Solaris client, but only reach maybe 2% of
it from a Windows box. Yes, tweaking Nagle got us to almost 3%. Still nowhere
near enough to make replacing our FC SAN with X4540s an attractive
proposition.

(And I see that most of the other replies simply asserted that your zfs
configuration was bad, without either having experienced this scenario
or worked out that the actual delivered performance was an order of
magnitude or two short of what even an admittedly sub-optimal configuration
ought to have delivered.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Kjetil Torgrim Homme
[please don't top-post, please remove CC's, please trim quotes.  it's
 really tedious to clean up your post to make it readable.]

Marc Nicholas  writes:
> Brent Jones  wrote:
>> Marc Nicholas  wrote:
>>> Kjetil Torgrim Homme  wrote:
 his problem is "lazy" ZFS, notice how it gathers up data for 15
 seconds before flushing the data to disk.  tweaking the flush
 interval down might help.
>>>
>>> How does lowering the flush interval help? If he can't ingress data
>>> fast enough, faster flushing is a Bad Thibg(tm).

if network traffic is blocked during the flush, you can experience
back-off at both the TCP and the iSCSI level.

 what are the other values?  ie., number of ops and actual amount of
 data read/written.

this remained unanswered.

>> ZIL performance issues? Is writecache enabled on the LUNs?
> This is a Windows box, not a DB that flushes every write.

have you checked whether the iSCSI traffic is synchronous or not?  I don't
use Windows, but other reports on the list have indicated that at least
the NTFS format operation *is* synchronous.  use zilstat to see.
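
for example (this assumes Richard Elling's zilstat script is installed
somewhere in PATH; the invocation is from memory, adjust as needed):

  zilstat 1 10     # ten 1-second samples

non-zero byte counts during the copy or the quick format mean the writes are
going through the ZIL, i.e. they are synchronous.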

> The drives are capable of over 2000 IOPS (albeit with high latency as
> its NCQ that gets you there) which would mean, even with sync flushes,
> 8-9MB/sec.

2000 IOPS is the aggregate, but the disks are set up as *one* RAID-Z2!
NCQ doesn't help much, since the write operations issued by ZFS are
already ordered correctly.

the OP may also want to try tweaking metaslab_df_free_pct, this helped
linear write performance on our Linux clients a lot:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6869229
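
a sketch of the mechanism only (the useful value depends on the workload, so
take it from the bug discussion and your own testing, not from me):

  echo metaslab_df_free_pct/D | mdb -k   # read the current value from the kernel

and to change it persistently, something like this in /etc/system (comments
in that file start with "*"; takes effect after a reboot):

  * illustration of the syntax only, not a recommended value
  set zfs:metaslab_df_free_pct = 35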

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Brent Jones
On Wed, Feb 10, 2010 at 4:05 PM, Brent Jones  wrote:
> On Wed, Feb 10, 2010 at 3:12 PM, Marc Nicholas  wrote:
>> How does lowering the flush interval help? If he can't ingress data
>> fast enough, faster flushing is a Bad Thing(tm).
>>
>> -marc
>>
>> On 2/10/10, Kjetil Torgrim Homme  wrote:
>>> Bob Friesenhahn  writes:
 On Wed, 10 Feb 2010, Frank Cusack wrote:

 The other three commonly mentioned issues are:

  - Disable the Nagle algorithm on the Windows clients.
>>>
>>> for iSCSI?  shouldn't be necessary.
>>>
  - Set the volume block size so that it matches the client filesystem
    block size (default is 128K!).
>>>
>>> default for a zvol is 8 KiB.
>>>
  - Check for an abnormally slow disk drive using 'iostat -xe'.
>>>
>>> his problem is "lazy" ZFS, notice how it gathers up data for 15 seconds
>>> before flushing the data to disk.  tweaking the flush interval down
>>> might help.
>>>
> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to
> the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
> seconds of 100, and repeats.
>>>
>>> what are the other values?  ie., number of ops and actual amount of data
>>> read/written.
>>>
>>> --
>>> Kjetil T. Homme
>>> Redpill Linpro AS - Changing the game
>>>
>>>
>>
>> --
>> Sent from my mobile device
>>
>
> ZIL performance issues? Is writecache enabled on the LUNs?
>
> --
> Brent Jones
> br...@servuhome.net
>

Also, are you using rdsk based iSCSI LUNs, or file-based LUNs?

-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Marc Nicholas
This is a Windows box, not a DB that flushes every write.

The drives are capable of over 2000 IOPS (albeit with high latency, as
it's NCQ that gets you there), which would mean, even with sync flushes,
8-9 MB/sec.

-marc

On 2/10/10, Brent Jones  wrote:
> On Wed, Feb 10, 2010 at 3:12 PM, Marc Nicholas  wrote:
>> How does lowering the flush interval help? If he can't ingress data
>> fast enough, faster flushing is a Bad Thing(tm).
>>
>> -marc
>>
>> On 2/10/10, Kjetil Torgrim Homme  wrote:
>>> Bob Friesenhahn  writes:
 On Wed, 10 Feb 2010, Frank Cusack wrote:

 The other three commonly mentioned issues are:

  - Disable the Nagle algorithm on the Windows clients.
>>>
>>> for iSCSI?  shouldn't be necessary.
>>>
  - Set the volume block size so that it matches the client filesystem
    block size (default is 128K!).
>>>
>>> default for a zvol is 8 KiB.
>>>
  - Check for an abnormally slow disk drive using 'iostat -xe'.
>>>
>>> his problem is "lazy" ZFS, notice how it gathers up data for 15 seconds
>>> before flushing the data to disk.  tweaking the flush interval down
>>> might help.
>>>
> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to
> the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
> seconds of 100, and repeats.
>>>
>>> what are the other values?  ie., number of ops and actual amount of data
>>> read/written.
>>>
>>> --
>>> Kjetil T. Homme
>>> Redpill Linpro AS - Changing the game
>>>
>>>
>>
>> --
>> Sent from my mobile device
>>
>
> ZIL performance issues? Is writecache enabled on the LUNs?
>
> --
> Brent Jones
> br...@servuhome.net
>

-- 
Sent from my mobile device


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Brent Jones
On Wed, Feb 10, 2010 at 3:12 PM, Marc Nicholas  wrote:
> How does lowering the flush interval help? If he can't ingress data
> fast enough, faster flushing is a Bad Thing(tm).
>
> -marc
>
> On 2/10/10, Kjetil Torgrim Homme  wrote:
>> Bob Friesenhahn  writes:
>>> On Wed, 10 Feb 2010, Frank Cusack wrote:
>>>
>>> The other three commonly mentioned issues are:
>>>
>>>  - Disable the Nagle algorithm on the Windows clients.
>>
>> for iSCSI?  shouldn't be necessary.
>>
>>>  - Set the volume block size so that it matches the client filesystem
>>>    block size (default is 128K!).
>>
>> default for a zvol is 8 KiB.
>>
>>>  - Check for an abnormally slow disk drive using 'iostat -xe'.
>>
>> his problem is "lazy" ZFS, notice how it gathers up data for 15 seconds
>> before flushing the data to disk.  tweaking the flush interval down
>> might help.
>>
 An "iostat -xndz 1" readout of the "%b% coloum during a file copy to
 the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
 seconds of 100, and repeats.
>>
>> what are the other values?  ie., number of ops and actual amount of data
>> read/written.
>>
>> --
>> Kjetil T. Homme
>> Redpill Linpro AS - Changing the game
>>
>>
>
> --
> Sent from my mobile device
>

ZIL performance issues? Is writecache enabled on the LUNs?

-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Marc Nicholas
How does lowering the flush interval help? If he can't ingress data
fast enough, faster flushing is a Bad Thing(tm).

-marc

On 2/10/10, Kjetil Torgrim Homme  wrote:
> Bob Friesenhahn  writes:
>> On Wed, 10 Feb 2010, Frank Cusack wrote:
>>
>> The other three commonly mentioned issues are:
>>
>>  - Disable the Nagle algorithm on the Windows clients.
>
> for iSCSI?  shouldn't be necessary.
>
>>  - Set the volume block size so that it matches the client filesystem
>>block size (default is 128K!).
>
> default for a zvol is 8 KiB.
>
>>  - Check for an abnormally slow disk drive using 'iostat -xe'.
>
> his problem is "lazy" ZFS, notice how it gathers up data for 15 seconds
> before flushing the data to disk.  tweaking the flush interval down
> might help.
>
>>> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to
>>> the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
>>> seconds of 100, and repeats.
>
> what are the other values?  ie., number of ops and actual amount of data
> read/written.
>
> --
> Kjetil T. Homme
> Redpill Linpro AS - Changing the game
>
>

-- 
Sent from my mobile device


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Kjetil Torgrim Homme
Bob Friesenhahn  writes:
> On Wed, 10 Feb 2010, Frank Cusack wrote:
>
> The other three commonly mentioned issues are:
>
>  - Disable the Nagle algorithm on the Windows clients.

for iSCSI?  shouldn't be necessary.

>  - Set the volume block size so that it matches the client filesystem
>block size (default is 128K!).

default for a zvol is 8 KiB.
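
it has to be set when the zvol is created, e.g. (names taken from the
original post; whether a small block like 4 KiB actually helps depends on
the workload, larger blocks tend to be better for big sequential writes):

  zfs create -V 10TB -o volblocksize=4K -o shareiscsi=on tank/volumes/fsrv1data

volblocksize cannot be changed afterwards.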

>  - Check for an abnormally slow disk drive using 'iostat -xe'.

his problem is "lazy" ZFS, notice how it gathers up data for 15 seconds
before flushing the data to disk.  tweaking the flush interval down
might help.
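
on this vintage the flush interval is a kernel tunable; a rough illustration
for /etc/system, assuming the build still uses zfs_txg_timeout (I believe the
default was 30 seconds in that timeframe; verify against your source):

  * sync a transaction group at least every 5 seconds instead of every 30
  set zfs:zfs_txg_timeout = 5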

>> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to
>> the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
>> seconds of 100, and repeats.

what are the other values?  ie., number of ops and actual amount of data
read/written.
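
for instance (pool name from the original post):

  zpool iostat -v tank 1    # per-vdev operation counts and bandwidth, 1-second samples

together with the full iostat -xnz output, that would show whether the disks
are actually saturated during the bursts or just idle in between.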

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Marc Nicholas
Definitely use COMSTAR as Tim says.

At home I'm using 4*WD Caviar Blacks on an AMD Phenom x4 @ 1.Ghz and
only 2GB of RAM. I'm running snv_132. No HBA - onboard SB700 SATA
ports.

I can, with IOmeter, saturate GigE from my WinXP laptop via iSCSI.

Can you toss the RAID controller aside and use motherboard SATA ports
with just a few drives? That could help highlight whether it's the RAID
controller or not, and even one drive has better throughput than you're
seeing.

Cache, ZIL, and vdev tweaks are great - but you're not seeing any of
those bottlenecks, I can assure you.

-marc

On 2/10/10, Tim Cook  wrote:
> On Wed, Feb 10, 2010 at 4:06 PM, Brian E. Imhoff
> wrote:
>
>> I am in the proof-of-concept phase of building a large ZFS/Solaris based
>> SAN box, and am experiencing absolutely poor / unusable performance.
>>
>> Where to begin...
>>
>> The Hardware setup:
>> Supermicro 4U 24 Drive Bay Chassis
>> Supermicro X8DT3 Server Motherboard
>> 2x Xeon E5520 Nehalem 2.26 Quad Core CPUs
>> 4GB Memory
>> Intel EXPI9404PT 4 port 1000GB Server Network Card (used for ISCSI traffic
>> only)
>> Adaptec 52445 28 Port SATA/SAS Raid Controller connected to
>> 24x Western Digital WD1002FBYS 1TB Enterprise drives.
>>
>> I have configured the 24 drives as single simple volumes in the Adeptec
>> RAID BIOS , and are presenting them to the OS as such.
>>
>> I then, Create a zpool, using raidz2, using all 24 drives, 1 as a
>> hotspare:
>> zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare c1t23d00
>>
>> Then create a volume store:
>> zfs create -o canmount=off tank/volumes
>>
>> Then create a 10 TB volume to be presented to our file server:
>> zfs create -V 10TB -o shareiscsi=on tank/volumes/fsrv1data
>>
>> From here, I discover the iscsi target on our Windows server 2008 R2 File
>> server, and see the disk is attached in Disk Management.  I initialize the
>> 10TB disk fine, and begin to quick format it.  Here is where I begin to
>> see
>> the poor performance issue.   The Quick Format took about 45 minutes. And
>> once the disk is fully mounted, I get maybe 2-5 MB/s average to this disk.
>>
>> I have no clue what I could be doing wrong.  To my knowledge, I followed
>> the documentation for setting this up correctly, though I have not looked
>> at
>> any tuning guides beyond the first line saying you shouldn't need to do
>> any
>> of this as the people who picked these defaults know more about it then
>> you.
>>
>> Jumbo Frames are enabled on both sides of the iscsi path, as well as on
>> the
>> switch, and rx/tx buffers increased to 2048 on both sides as well.  I know
>> this is not a hardware / iscsi network issue.  As another test, I
>> installed
>> Openfiler in a similar configuration (using hardware raid) on this box,
>> and
>> was getting 350-450 MB/S from our fileserver,
>>
>> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to the
>> LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2 seconds
>> of
>> 100, and repeats.
>>
>> Is there anything I need to do to get this usable?  Or any additional
>> information I can provide to help solve this problem?  As nice as
>> Openfiler
>> is, it doesn't have ZFS, which is necessary to achieve our final goal.
>>
>>
>>
> You're extremely light on ram for a system with 24TB of storage and two
> E5520's.  I don't think it's the entire source of your issue, but I'd
> strongly suggest considering doubling what you have as a starting point.
>
> What version of opensolaris are you using?  Have you considered using
> COMSTAR as your iSCSI target?
>
> --Tim
>

-- 
Sent from my mobile device


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Bob Friesenhahn

On Wed, 10 Feb 2010, Frank Cusack wrote:


On 2/10/10 2:06 PM -0800 Brian E. Imhoff wrote:

I then, Create a zpool, using raidz2, using all 24 drives, 1 as a
hotspare: zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare
c1t23d00


Well there's one problem anyway.  That's going to be horribly slow no
matter what.


The other three commonly mentioned issues are:

 - Disable the Nagle algorithm on the Windows clients.

 - Set the volume block size so that it matches the client filesystem
   block size (default is 128K!).

 - Check for an abnormally slow disk drive using 'iostat -xe'.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Tim Cook
On Wed, Feb 10, 2010 at 4:06 PM, Brian E. Imhoff wrote:

> I am in the proof-of-concept phase of building a large ZFS/Solaris based
> SAN box, and am experiencing absolutely poor / unusable performance.
>
> Where to begin...
>
> The Hardware setup:
> Supermicro 4U 24 Drive Bay Chassis
> Supermicro X8DT3 Server Motherboard
> 2x Xeon E5520 Nehalem 2.26 Quad Core CPUs
> 4GB Memory
> Intel EXPI9404PT 4 port 1000GB Server Network Card (used for ISCSI traffic
> only)
> Adaptec 52445 28 Port SATA/SAS Raid Controller connected to
> 24x Western Digital WD1002FBYS 1TB Enterprise drives.
>
> I have configured the 24 drives as single simple volumes in the Adeptec
> RAID BIOS , and are presenting them to the OS as such.
>
> I then, Create a zpool, using raidz2, using all 24 drives, 1 as a hotspare:
> zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare c1t23d00
>
> Then create a volume store:
> zfs create -o canmount=off tank/volumes
>
> Then create a 10 TB volume to be presented to our file server:
> zfs create -V 10TB -o shareiscsi=on tank/volumes/fsrv1data
>
> From here, I discover the iscsi target on our Windows server 2008 R2 File
> server, and see the disk is attached in Disk Management.  I initialize the
> 10TB disk fine, and begin to quick format it.  Here is where I begin to see
> the poor performance issue.   The Quick Format took about 45 minutes. And
> once the disk is fully mounted, I get maybe 2-5 MB/s average to this disk.
>
> I have no clue what I could be doing wrong.  To my knowledge, I followed
> the documentation for setting this up correctly, though I have not looked at
> any tuning guides beyond the first line saying you shouldn't need to do any
> of this as the people who picked these defaults know more about it then you.
>
> Jumbo Frames are enabled on both sides of the iscsi path, as well as on the
> switch, and rx/tx buffers increased to 2048 on both sides as well.  I know
> this is not a hardware / iscsi network issue.  As another test, I installed
> Openfiler in a similar configuration (using hardware raid) on this box, and
> was getting 350-450 MB/S from our fileserver,
>
> An "iostat -xndz 1" readout of the "%b% coloum during a file copy to the
> LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2 seconds of
> 100, and repeats.
>
> Is there anything I need to do to get this usable?  Or any additional
> information I can provide to help solve this problem?  As nice as Openfiler
> is, it doesn't have ZFS, which is necessary to achieve our final goal.
>
>
>
You're extremely light on RAM for a system with 24TB of storage and two
E5520's.  I don't think it's the entire source of your issue, but I'd
strongly suggest considering doubling what you have as a starting point.

What version of opensolaris are you using?  Have you considered using
COMSTAR as your iSCSI target?
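
If you want to try COMSTAR, the setup is roughly along these lines (a sketch
from memory, so double-check the COMSTAR docs for your build; the LU GUID is
whatever stmfadm create-lu prints, and you would drop shareiscsi=on from the
zvol):

svcadm enable stmf
svcadm enable -r svc:/network/iscsi/target:default
itadm create-target
stmfadm create-lu /dev/zvol/rdsk/tank/volumes/fsrv1data
stmfadm add-view <lu-guid>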

--Tim


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread David Dyer-Bennet

On Wed, February 10, 2010 16:28, Will Murnane wrote:
> On Wed, Feb 10, 2010 at 17:06, Brian E. Imhoff 
> wrote:
>> I am in the proof-of-concept phase of building a large ZFS/Solaris based
>> SAN box, and am experiencing absolutely poor / unusable performance.
>>
>> I then, Create a zpool, using raidz2, using all 24 drives, 1 as a
>> hotspare:
>> zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare c1t23d00
> Create several smaller raidz2 vdevs, and consider adding a log device
> and/or cache devices.  A single raidz2 vdev has about as many IOs per
> second as a single disk, which could really hurt iSCSI performance.
>  zpool create tank raidz2 c1t0d0 c1t1d0 ... \
>   raidz2 c1t5d0 c1t6d0 ... \
>   etc
> You might try, say, four 5-wide stripes with a spare, a mirrored log
> device, and a cache device.  More memory wouldn't hurt anything,
> either.

That's useful general advice for increasing I/O, I think, but he clearly
has something other than a "general" problem.  Did you read the numbers he
gave on his iSCSI performance?  That can't be explained just by
overly-large RAIDZ groups, I don't think.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Frank Cusack

On 2/10/10 2:06 PM -0800 Brian E. Imhoff wrote:

I then, Create a zpool, using raidz2, using all 24 drives, 1 as a
hotspare: zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare
c1t23d00


Well there's one problem anyway.  That's going to be horribly slow no
matter what.


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Will Murnane
On Wed, Feb 10, 2010 at 17:06, Brian E. Imhoff  wrote:
> I am in the proof-of-concept phase of building a large ZFS/Solaris based SAN 
> box, and am experiencing absolutely poor / unusable performance.
>
> I then, Create a zpool, using raidz2, using all 24 drives, 1 as a hotspare:
> zpool create tank raidz2 c1t0d0 c1t1d0 [] c1t22d0 spare c1t23d00
Create several smaller raidz2 vdevs, and consider adding a log device
and/or cache devices.  A single raidz2 vdev has about as many IOs per
second as a single disk, which could really hurt iSCSI performance.
 zpool create tank raidz2 c1t0d0 c1t1d0 ... \
  raidz2 c1t5d0 c1t6d0 ... \
  etc
You might try, say, four 5-wide stripes with a spare, a mirrored log
device, and a cache device.  More memory wouldn't hurt anything,
either.

Will


[zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Brian E. Imhoff
I am in the proof-of-concept phase of building a large ZFS/Solaris based SAN 
box, and am experiencing absolutely poor / unusable performance.  

Where to begin...

The Hardware setup:
Supermicro 4U 24 Drive Bay Chassis
Supermicro X8DT3 Server Motherboard
2x Xeon E5520 Nehalem 2.26 Quad Core CPUs
4GB Memory
Intel EXPI9404PT 4-port gigabit server network card (used for iSCSI traffic only)
Adaptec 52445 28 Port SATA/SAS Raid Controller connected to 
24x Western Digital WD1002FBYS 1TB Enterprise drives.

I have configured the 24 drives as single simple volumes in the Adaptec RAID 
BIOS, and am presenting them to the OS as such.  

I then create a zpool using raidz2, using all 24 drives with 1 as a hot spare:
zpool create tank raidz2 c1t0d0 c1t1d0 [...] c1t22d0 spare c1t23d0

Then create a volume store:
zfs create -o canmount=off tank/volumes

Then create a 10 TB volume to be presented to our file server:
zfs create -V 10TB -o shareiscsi=on tank/volumes/fsrv1data

From here, I discover the iSCSI target on our Windows Server 2008 R2 file 
server, and see the disk is attached in Disk Management.  I initialize the 
10TB disk fine, and begin to quick format it.  Here is where I begin to see 
the poor performance issue.  The Quick Format took about 45 minutes, and once 
the disk is fully mounted, I get maybe 2-5 MB/s average to this disk.  

I have no clue what I could be doing wrong.  To my knowledge, I followed the 
documentation for setting this up correctly, though I have not looked at any 
tuning guides beyond the first line saying you shouldn't need to do any of this 
as the people who picked these defaults know more about it than you.

Jumbo frames are enabled on both sides of the iSCSI path, as well as on the 
switch, and rx/tx buffers are increased to 2048 on both sides as well.  I know 
this is not a hardware / iSCSI network issue.  As another test, I installed 
OpenFiler in a similar configuration (using hardware RAID) on this box, and was 
getting 350-450 MB/s from our file server.

An "iostat -xndz 1" readout of the "%b% coloum during a file copy to the LUN 
shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2 seconds of 100, 
and repeats.

Is there anything I need to do to get this usable?  Or any additional 
information I can provide to help solve this problem?  As nice as Openfiler is, 
it doesn't have ZFS, which is necessary to achieve our final goal.
-- 
This message posted from opensolaris.org