Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-22 Thread Robert Milkowski
Hello Richard, Wednesday, October 15, 2008, 6:39:49 PM, you wrote: RE Archie Cowan wrote: I just stumbled upon this thread somehow and thought I'd share my zfs over iscsi experience. We recently abandoned a similar configuration with several pairs of x4500s exporting zvols as iscsi

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-22 Thread Richard Elling
Robert Milkowski wrote: Hello Richard, Wednesday, October 15, 2008, 6:39:49 PM, you wrote: RE Archie Cowan wrote: I just stumbled upon this thread somehow and thought I'd share my zfs over iscsi experience. We recently abandoned a similar configuration with several pairs of x4500s

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Gary Mills
On Thu, Oct 16, 2008 at 03:50:19PM +0800, Gray Carper wrote: Sidenote: Today we made eight network/iSCSI related tweaks that, in aggregate, have resulted in dramatic performance improvements (some I just hadn't gotten around to yet, others suggested by Sun's Mertol Ozyoney)...

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Jim Dunham
Gary, Sidenote: Today we made eight network/iSCSI related tweaks that, in aggregate, have resulted in dramatic performance improvements (some I just hadn't gotten around to yet, others suggested by Sun's Mertol Ozyoney)... - disabling the Nagle algorithm on the head node -

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-20 Thread Gray Carper
Hey, Jim! Thanks so much for the excellent assist on this - much better than I could have ever answered it! I thought I'd add a little bit on the other four... - raising ddi_msix_alloc_limit to 8 For PCI cards that use up to 8 interrupts, which our 10GbE adapters do. The previous value of 2
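For anyone who wants to try the same sort of tweaks, two of them translate fairly directly into commands on a Solaris-based head node. This is only a rough sketch assuming a stock snv-era system; the right values depend on your NICs and workload:

  # disable the Nagle algorithm for TCP (takes effect immediately, does not persist across reboot)
  ndd -set /dev/tcp tcp_naglim_def 1

  # allow a device up to 8 MSI-X interrupts; add this line to /etc/system and reboot
  set ddi_msix_alloc_limit=8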

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-17 Thread Ross
Some of that is very worrying, Miles - do you have bug IDs for any of those problems? I'm guessing the problem of the device being reported ok after the reboot could be this one: http://bugs.opensolaris.org/view_bug.do?bug_id=6582549 And could the errors after the reboot be one of these?

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Ross
Well obviously recovery scenarios need testing, but I still don't see it being that bad. My thinking on this is: 1. Loss of a server is very much the worst case scenario. Disk errors are much more likely, and with raid-z2 pools on the individual servers this should not pose a problem. I
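As a rough sketch of the per-server redundancy Ross describes (the disk names below are purely hypothetical; a real Thumper layout would spread vdev members across its controllers):

  # one raidz2 vdev on each storage server; any two disks in the vdev can fail without data loss
  zpool create localpool raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0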

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Gray Carper
Howdy! Very valuable advice here (and from Bob, who made similar comments - thanks, Bob!). I think, then, we'll generally stick to 128K recordsizes. In the case of databases, we'll stray as appropriate, and we may also stray with the HPC compute cluster if we can demonstrate that it is worth

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Ross
Miles makes a good point here, you really need to look at how this copes with various failure modes. Based on my experience, iSCSI is something that may cause you problems. When I tested this kind of setup last year I found that the entire pool hung for 3 minutes any time an iSCSI volume went

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Gray Carper
Oops - one thing I meant to mention: We only plan to cross-site replicate data for those folks who require it. The HPC data crunching would have no use for it, so that filesystem wouldn't be replicated. In reality, we only expect a select few users, with relatively small filesystems, to actually

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Miles Nordin
r == Ross [EMAIL PROTECTED] writes: r 1. Loss of a server is very much the worst case scenario. r Disk errors are much more likely, and with raid-z2 pools on r the individual servers yeah, it kind of sucks that the slow resilvering speed enforces this two-tier scheme. Also if

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Marion Hakanson
[EMAIL PROTECTED] said: It's interesting how the speed and optimisation of these maintenance activities limit pool size. It's not just full scrubs. If the filesystem is subject to corruption, you need a backup. If the filesystem takes two months to back up / restore, then you need really
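One way to put numbers on those maintenance times for your own pool is simply to time a scrub (the pool name is a placeholder):

  # kick off a scrub, then check its progress from the status output
  zpool scrub tank
  zpool status tank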

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Erast Benson
pNFS is NFS-centric of course, and it is not yet stable, is it? btw, what is the ETA for pNFS putback? On Thu, 2008-10-16 at 12:20 -0700, Marion Hakanson wrote: [EMAIL PROTECTED] said: It's interesting how the speed and optimisation of these maintenance activities limit pool size. It's

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Nicolas Williams
On Thu, Oct 16, 2008 at 12:20:36PM -0700, Marion Hakanson wrote: I'll chime in here with feeling uncomfortable with such a huge ZFS pool, and also with my discomfort of the ZFS-over-ISCSI-on-ZFS approach. There just seem to be too many moving parts depending on each other, any one of which

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Marion Hakanson
[EMAIL PROTECTED] said: In general, such tasks would be better served by T5220 (or the new T5440 :-) and J4500s. This would change the data paths from: client --net-- T5220 --net-- X4500 --SATA-- disks to client --net-- T5440 --SAS-- disks With the J4500 you get the same storage

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Miles Nordin
nw == Nicolas Williams [EMAIL PROTECTED] writes: nw But does it work well enough? It may be faster than NFS if You're talking about different things. Gray is using NFS period between the storage cluster and the compute cluster, no iSCSI. Gray's (``does it work well enough''): iSCSI

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Nicolas Williams
On Thu, Oct 16, 2008 at 04:30:28PM -0400, Miles Nordin wrote: nw == Nicolas Williams [EMAIL PROTECTED] writes: nw But does it work well enough? It may be faster than NFS if You're talking about different things. Gray is using NFS period between the storage cluster and the compute

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Miles Nordin
nw == Nicolas Williams [EMAIL PROTECTED] writes: mh == Marion Hakanson [EMAIL PROTECTED] writes: nw I was replying to Marion's [...] nw ZFS-over-iSCSI could certainly perform better than NFS, better than what, ZFS-over-'mkfile'-files-on-NFS? No one was suggesting that. Do you mean

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread David Magda
On Oct 16, 2008, at 15:20, Marion Hakanson wrote: For the stated usage of the original poster, I think I would aim toward turning each of the Thumpers into an NFS server, configure the head node as a pNFS/NFSv4.1 It's a shame that Lustre isn't available on Solaris yet either.

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-16 Thread Marion Hakanson
[EMAIL PROTECTED] said: but Marion's is not really possible at all, and won't be for a while with other groups' choice of storage-consumer platform, so it'd have to be GlusterFS or some other goofy fringe FUSEy thing or not-very-general crude in-house hack. Well, of course the magnitude of

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Bob Friesenhahn
On Wed, 15 Oct 2008, Gray Carper wrote: be good to set different recordsize parameters for each one. Do you have any suggestions on good starting sizes for each? I'd imagine filesharing might benefit from a relatively small record size (64K?), image-based backup targets might like a pretty
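Since recordsize is a per-filesystem property, each use case can get its own value without touching the rest of the pool. A sketch with hypothetical dataset names (the sizes are illustrations, not recommendations):

  # image-based backup targets see large sequential I/O, so keep the large default
  zfs set recordsize=128K tank/backups

  # a database filesystem is usually set to match the database block size (8K is just an example)
  zfs set recordsize=8K tank/db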

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Archie Cowan
I just stumbled upon this thread somehow and thought I'd share my zfs over iscsi experience. We recently abandoned a similar configuration with several pairs of x4500s exporting zvols as iscsi targets and mirroring them for high availability with T5220s. Initially, our performance was also

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Gray Carper
Howdy, Brent! Thanks for your interest! We're pretty enthused about this project over here and I'd be happy to share some details with you (and anyone else who cares to peek). In this post I'll try to hit the major configuration bullet-points, but I can also throw you command-line level specifics

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Richard Elling
Archie Cowan wrote: I just stumbled upon this thread somehow and thought I'd share my zfs over iscsi experience. We recently abandoned a similar configuration with several pairs of x4500s exporting zvols as iscsi targets and mirroring them for high availability with T5220s. In

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Akhilesh Mritunjai
Hi Gray, You've got a nice setup going there, a few comments: 1. Do not tune ZFS without a proven test-case to show otherwise, except... 2. For databases. Tune recordsize for that particular FS to match the DB recordsize. A few questions... * How are you divvying up the space? * How are you taking

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Ross
Am I right in thinking your top level zpool is a raid-z pool consisting of six 28TB iSCSI volumes? If so that's a very nice setup, it's what we'd be doing if we had that kind of cash :-)
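For reference, a head-node pool like the one Ross describes would be created roughly like this, where the device names stand in for whatever names the six iSCSI LUNs get after discovery:

  # one raidz vdev across the six iSCSI-backed devices; any single device can drop out without data loss
  zpool create bigpool raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0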

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Miles Nordin
gc == Gray Carper [EMAIL PROTECTED] writes: gc 5. The NAS head node has wrangled up all six of the iSCSI gc targets are you using raidz on the head node? It sounds like simple striping, which is probably dangerous with the current code. This kind of sucks because with simple striping
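The quickest way to answer Miles' question is to look at the vdev tree (pool name is a placeholder): if the six iSCSI devices sit directly under the pool with no raidz or mirror line above them, it is a plain stripe.

  # devices listed directly under the pool name = simple striping; a raidz1/mirror line above them = redundancy
  zpool status bigpool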

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Miles Nordin
r == Ross [EMAIL PROTECTED] writes: r Am I right in thinking your top level zpool is a raid-z pool r consisting of six 28TB iSCSI volumes? If so that's a very r nice setup, not if it scrubs at 400GB/day, and 'zfs send' is uselessly slow. Also I am thinking the J4500 Richard
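For context, the 'zfs send' path Miles mentions is the usual snapshot replication pipeline, roughly as below (hostnames and dataset names are placeholders):

  # full send of a snapshot to a remote pool over ssh
  zfs snapshot tank/data@weekly1
  zfs send tank/data@weekly1 | ssh backuphost zfs receive -F backuppool/data

  # later passes send only the blocks changed since the previous snapshot
  zfs snapshot tank/data@weekly2
  zfs send -i tank/data@weekly1 tank/data@weekly2 | ssh backuphost zfs receive backuppool/data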

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Bob Friesenhahn
On Wed, 15 Oct 2008, Marcelo Leal wrote: Are you talking about what he had in the logic of the configuration at top level, or are you saying his top level pool is a raidz? I would think his top level zpool is a raid0... ZFS does not support RAID0 (simple striping). Bob

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Tomas Ögren
On 15 October, 2008 - Bob Friesenhahn sent me these 0,6K bytes: On Wed, 15 Oct 2008, Marcelo Leal wrote: Are you talking about what he had in the logic of the configuration at top level, or are you saying his top level pool is a raidz? I would think his top level zpool is a raid0...

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Marcelo Leal
So, there is no raid10 in a solaris/zfs setup? I'm talking about no redundancy...

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Bob Friesenhahn
On Wed, 15 Oct 2008, Tomas Ögren wrote: ZFS does not support RAID0 (simple striping). zpool create mypool disk1 disk2 disk3 Sure it does. This is load-share, not RAID0. Also, to answer the other fellow, since ZFS does not support RAID0, it also does not support RAID 1+0 (10). :-) With
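To make the answer to Marcelo's question concrete: the ZFS analogue of RAID 10 is a pool built from several mirror vdevs, across which ZFS then load-shares writes (disk names are hypothetical):

  # dynamic striping over two 2-way mirrors - the closest ZFS gets to RAID 1+0
  zpool create mpool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0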

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-15 Thread Richard Elling
Bob Friesenhahn wrote: On Wed, 15 Oct 2008, Tomas Ögren wrote: ZFS does not support RAID0 (simple striping). zpool create mypool disk1 disk2 disk3 Sure it does. This is load-share, not RAID0. Also, to answer the other fellow, since ZFS does not support RAID0, it also does not support

[zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Gray Carper
Hey, all! We've recently used six x4500 Thumpers, all publishing ~28TB iSCSI targets over ip-multipathed 10Gb Ethernet, to build a ~150TB ZFS pool on an x4200 head node. In trying to discover optimal ZFS pool construction settings, we've run a number of iozone tests, so I thought I'd share
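For readers trying to picture the plumbing, the per-Thumper half of a setup like this would look roughly as follows with the old built-in (pre-COMSTAR) iSCSI target support; the sizes, pool names, and address are placeholders:

  # on each x4500: carve a zvol out of the local pool and export it as an iSCSI target
  zfs create -V 28T localpool/lun0
  zfs set shareiscsi=on localpool/lun0

  # on the x4200 head node: enable SendTargets discovery and point the initiator at the Thumper
  iscsiadm modify discovery --sendtargets enable
  iscsiadm add discovery-address 192.168.10.11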

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Gray Carper
Howdy! Sounds good. We'll upgrade to 1.1 (b101) as soon as it is released, re-run our battery of tests, and see where we stand. Thanks! -Gray On Tue, Oct 14, 2008 at 8:47 PM, James C. McPherson [EMAIL PROTECTED] wrote: Gray Carper wrote: Hello again! (And hellos to Erast, who has been a

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Gray Carper
Hey there, James! We're actually running NexentaStor v1.0.8, which is based on b85. We haven't done any tuning ourselves, but I suppose it is possible that Nexenta did. If there's something specific you have in mind, I'd be happy to look for it. Thanks! -Gray On Tue, Oct 14, 2008 at 8:10 PM,

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread James C. McPherson
Gray Carper wrote: Hey there, James! We're actually running NexentaStor v1.0.8, which is based on b85. We haven't done any tuning ourselves, but I suppose it is possible that Nexenta did. If there's something specific you'd like me to look for, I'd be happy to. Hi Gray, So build 85

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Akhilesh Mritunjai
Just a random spectator here, but I think the artifacts you're seeing here are not due to file size, but rather to record size. What is the ZFS record size? On a personal note, I wouldn't do non-concurrent (?) benchmarks. They are at best useless and at worst misleading for ZFS - Akhilesh.
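To answer Akhilesh's question on your own system, the property is easy to inspect (the dataset name is a placeholder):

  # show the current recordsize for a filesystem; 128K is the default
  zfs get recordsize tank/data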

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Bob Friesenhahn
On Tue, 14 Oct 2008, Gray Carper wrote: So, how concerned should we be about the low scores here and there? Any suggestions on how to improve our configuration? And how excited should we be about the 8GB tests? ; The level of concern should depend on how you expect your storage pool to

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Erast Benson
James, all serious ZFS bug fixes have been back-ported to b85, as well as the marvell and other sata drivers. Not everything is possible to back-port of course, but I would say all critical things are there. This includes ZFS ARC optimization patches, for example. On Tue, 2008-10-14 at 22:33 +1000, James C.

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Brent Jones
On Tue, Oct 14, 2008 at 12:31 AM, Gray Carper [EMAIL PROTECTED] wrote: Hey, all! We've recently used six x4500 Thumpers, all publishing ~28TB iSCSI targets over ip-multipathed 10Gb Ethernet, to build a ~150TB ZFS pool on an x4200 head node. In trying to discover optimal ZFS pool

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread James C. McPherson
Gray Carper wrote: Hey, all! We've recently used six x4500 Thumpers, all publishing ~28TB iSCSI targets over ip-multipathed 10Gb Ethernet, to build a ~150TB ZFS pool on an x4200 head node. In trying to discover optimal ZFS pool construction settings, we've run a number of iozone tests, so I

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread James C. McPherson
Gray Carper wrote: Hello again! (And hellos to Erast, who has been a huge help to me many, many times! :) As I understand it, Nexenta 1.1 should be released in a matter of weeks and it'll be based on build 101. We are waiting for that with bated breath, since it includes some very

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Gray Carper
Hello again! (And hellos to Erast, who has been a huge help to me many, many times! :) As I understand it, Nexenta 1.1 should be released in a matter of weeks and it'll be based on build 101. We are waiting for that with bated breath, since it includes some very important Active Directory

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread James C. McPherson
Erast Benson wrote: James, all serious ZFS bug fixes have been back-ported to b85, as well as the marvell and other sata drivers. Not everything is possible to back-port of course, but I would say all critical things are there. This includes ZFS ARC optimization patches, for example. Excellent! James --

Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Gray Carper
Hey there, Bob! Looks like you and Akhilesh (thanks, Akhilesh!) are driving at a similar, very valid point. I'm currently using the default recordsize (128K) on all of the ZFS pools (both those on the iSCSI target nodes and the aggregate pool on the head node). I should've mentioned something about