Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Jeroen Roodhart
Hi list,

 If you're running solaris proper, you better mirror your
  ZIL log device.
...
 I plan to get to test this as well, won't be until
 late next week though.

Running OSOL nv130. Powered off the machine, removed the F20 and powered back on. 
The machine boots OK and comes up normally, with the following message in 'zpool 
status':
...
pool: mypool
 state: FAULTED
status: An intent log record could not be read.
Waiting for adminstrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      FAULTED      0     0     0  bad intent log
...

Nice! Running a later version of ZFS seems to lessen the need for 
ZIL-mirroring...
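
(For completeness: the way out of this state is exactly what the action line above 
suggests, i.e. discard the lost intent log records with

zpool clear mypool

at the cost of whatever synchronous writes were still sitting only on the slog, 
of course.)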

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Jeroen Roodhart
Hi Roch,

 Can  you try 4 concurrent tar to four different ZFS
 filesystems (same pool). 

Hmmm, you're on to something here:

http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf

In short: when using two exported file systems, total time goes down to around 
4 minutes (IOPS maxes out at around 5500 when adding all four vmods together). When 
using four file systems, total time goes down to around 3 minutes 30 seconds (IOPS 
maxing out at about 9500).

I figured it is either NFS or a per-filesystem data structure in the ZFS/ZIL 
interface. To rule out NFS I tried exporting two directories using default 
NFS shares (via /etc/dfs/dfstab entries). To my surprise this seems to bypass 
the ZIL altogether (dropping to 100 IOPS, which is what our RAIDZ2 
configuration delivers on its own). So clearly ZFS sharenfs is more than a nice 
front end for NFS configuration :).
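
(For reference, the two ways of sharing I compared boil down to something like the 
following; pool/filesystem names are just placeholders:)

# sharing through ZFS itself
zfs create mypool/fs1
zfs set sharenfs=rw mypool/fs1

# versus a legacy share via an /etc/dfs/dfstab entry such as
share -F nfs -o rw /mypool/fs2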

But back to your suggestion: You clearly had a hypothesis behind your question. 
Care to elaborate?

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Jeroen Roodhart
Hi Al,

 Have you tried the DDRdrive from Christopher George
 cgeo...@ddrdrive.com?
 Looks to me like a much better fit for your application than the F20?
 
 It would not hurt to check it out.  Looks to me like
 you need a product with low *latency* - and a RAM based cache
 would be a much better performer than any solution based solely on
 flash.
 
 Let us know (on the list) how this works out for you.

Well, I did look at it, but at that time there was no Solaris support yet. Right 
now it seems there is only a beta driver? I seem to remember that if you want 
reliable fallback to NVRAM, you need a UPS feeding the card. I could be very 
wrong there, but the product documentation isn't very clear on this (at least 
to me ;) )

Also, we'd kind of like to have a SnOracle supported option. 

But yeah, on paper it does seem it could be an attractive solution...

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
Hi Casper,

 :-)

Nice to see that your jet still reaches as far as it used to :-)

 I'm happy to see that it is now the default and I hope this will cause the
 Linux NFS client implementation to be faster for conforming NFS servers.

Interesting thing is that apparently the defaults on Solaris and Linux are chosen 
such that one can't signal the desired behaviour to the other. At least we 
didn't manage to get a Linux client to asynchronously mount a Solaris (ZFS 
backed) NFS export...

Anyway, we seem to be getting off topic here :-)

The thread was started to get insight into the behaviour of the F20 as ZIL. _My_ 
particular interest would be to be able to answer why performance doesn't seem 
to scale up when adding vmods...

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Jeroen Roodhart
 It doesn't have to be F20.  You could use the Intel
 X25 for example.  

The MLC-based disks are bound to be too slow (we tested with an OCZ Vertex 
Turbo). So you're stuck with the X25-E (which Sun stopped supporting for some 
reason). I believe most normal SSDs do have some sort of cache and usually no 
supercap or other backup power solution, so be wary of that.

Having said all this, the new Sandforce based SSDs look promising...
 
 If you're running solaris proper, you better mirror your
 ZIL log device.  

Absolutely true, I forgot this 'cause we're running OSOL nv130... (we 
constantly seem to need features that haven't landed in Solaris proper :) ).

 If you're running opensolaris ... I don't know if that's
 important. 

At least I can confirm the ability to add and remove ZIL devices on the fly 
with OSOL of a sufficiently recent build.
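
(Something along these lines; the device names are of course placeholders for 
whatever your slog devices are called:)

# attach a mirrored slog to an existing pool
zpool add mypool log mirror c3t0d0 c3t1d0

# or a single device, which can later be removed again on the fly
zpool add mypool log c3t0d0
zpool remove mypool c3t0d0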

 I'll  probably test it, just to be sure, but I might never
 get around to it
 because I don't have a justifiable business reason to
 build the opensolaris
 machine just for this one little test.

I plan to get to test this as well, won't be until late next week though.

With kind regards,

Jeroen



Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
 Oh, one more comment. If you don't mirror your ZIL, and your unmirrored SSD
 goes bad, you lose your whole pool. Or at least suffer data corruption.

Hmmm, I thought that in that case ZFS reverts to the regular on-disk ZIL?

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
 The write cache is _not_ being disabled. The write cache is being marked
 as non-volatile.

Of course you're right :) Please filter my postings with a sed 's/write 
cache/write cache flush/g' ;)

 BTW, why is a Sun/Oracle branded product not properly respecting the NV
 bit in the cache flush command? This seems remarkably broken, and leads
 to the amazingly bad advice given on the wiki referenced above.

I suspect it has something to do with emulating disk semantics over PCIe. 
Anyway, this did have us stumped in the beginning; performance wasn't better 
than with an OCZ Vertex Turbo ;)

By the way, the URL to the reference is part of the official F20 product 
documentation (that's how we found it in the first place)...

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
Hi Karsten,

 But is this mode of operation *really* safe?

As far as I can tell it is. 

-The F20 uses some form of power backup that should provide power to the 
interface card long enough to get the cache onto solid state in case of power 
failure. 

-Recollecting from earlier threads here: in case the card fails (but not the 
host), there should be enough data residing in memory for ZFS to safely switch 
to the regular on-disk ZIL.

-According to my contacts at Sun, the F20 is a viable replacement solution for 
the X25-E. 

-Switching write cache flushing off seems to be officially recommended on the Sun 
performance wiki (translated to more sane defaults).

If I'm wrong here I'd like to know too, 'cause this is probably the way we're 
taking it into production. :)

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Jeroen Roodhart
Hi Richard,

 For this case, what is the average latency to the F20?

I'm not giving the average since I only performed a single run here (still need 
to get autopilot set up :) ). However, here is a graph of iostat IOPS/svc_t 
sampled in 10-second intervals during a run of untarring an Eclipse tarball 40 
times from two hosts. I'm using 1 vmod here.

http://www.science.uva.nl/~jeroen/zil_1slog_e1000_iostat_iops_svc_t_10sec_interval.pdf

Maximum svc_t is around 2.7ms averaged over 10s.

Still wondering why this won't scale out, though. We don't seem to be CPU bound, 
unless ZFS limits itself to a maximum of 30% CPU time?
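
(The graph was derived from plain extended iostat output, something along the 
lines of

iostat -xn 10

i.e. per-device reads/writes per second and svc_t, sampled every 10 seconds.)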

With kind regards,

Jeroen


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-30 Thread Jeroen Roodhart
 If you are going to trick the system into thinking a volatile cache is
 nonvolatile, you might as well disable the ZIL -- the data corruption potential
 is the same.

I'm sorry? I believe the F20 has a supercap or the like? The advice on:

http://wikis.sun.com/display/Performance/Tuning+ZFS+for+the+F5100#TuningZFSfortheF5100-ZFSF5100

is to disable write cache flushing altogether. We opted not to do _that_ though... :)

Are you sure that disabling write cache flushing on the F20 is a bad thing to do?
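
(For completeness: as far as I can tell the tuning that wiki describes boils down 
to the global cache-flush tunable, i.e. an /etc/system entry like

set zfs:zfs_nocacheflush = 1

which of course affects every device in the system, not just the F20.)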

With kind regards,

Jeroen


Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-25 Thread Jeroen Roodhart
Hi Freddie, list,

 Option 4 is to re-do your pool, using fewer disks per raidz2 vdev,
 giving more vdevs to the pool, and thus increasing the IOps for the
 whole pool.

 14 disks in a single raidz2 vdev is going to give horrible IO,
 regardless of how fast the individual disks are.

 Redoing it with 6-disk raidz2 vdevs, or even 8-drive raidz2 vdevs
 will give you much better throughput.

We are aware that the configuration is possibly suboptimal. However,
before we had the SSDs, we did test earlier with 6x7 Z2 and even 2-way
mirror setups. These gave better IOPS, but not a significant enough
improvement (going from 14x3 to 6x7 I would expect roughly a bit more
than double the performance). In the end it is indeed a choice
between performance, space and security. Our hope is that the SSD
slogs serialise the data flow enough to make this work. But you
have a fair point and we will also look into the combination of SSDs
and pool configurations.
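
(For the archives, the layout Freddie suggests would be created along these
lines; pool and disk names are obviously placeholders:)

zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
  raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

Each additional raidz2 vdev adds roughly one disk's worth of random IOPS to the
pool, which is where the expected factor of roughly two comes from.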

Also, possibly the Vertex Turbo SSDs aren't as good latency-wise as I
expected. Maybe the Sun SSDs will do a lot better. We will find this
out when they arrive (due somewhere in February).

With kind regards,

Jeroen



Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-24 Thread Jeroen Roodhart
Mattias Pantzare wrote:

 The ZIL is _not_ optional as the log is in UFS.

Right, thanks (also to Richard and Daniel) for the explanation. I was
afraid this was too good to be true; nice to see it stated this clearly
though.

That would leave us with three options:

1) Deal with it and accept performance as it is.
2) Find a way to speed things up further for this workload
3) Stop trying to use ZFS for this workload

Option 1 is not going to be feasible, so we're left with 2 and 3.

We will have to do some more benchmarks in the new year. Maybe trying
different NFS wsizes will result in different figures. Also we'll
look at UFS on the Thor, although I am not looking forward to handling
large amounts of data on anything other than ZFS. Spoiled for life,
probably :)

In the meantime, if any of you have time to look at our iozone
data and spot glaring mistakes, we would definitely appreciate your
comments.

Thanks for your help,

With kind regards,

Jeroen



Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-24 Thread Jeroen Roodhart
Jeroen Roodhart wrote:

 Questions: 1. Client wsize?

 We usually set these to 342768 but this was tested with CentOS
 defaults: 8192 (we're doing this over NFSv3)

I stand corrected here. Looking at /proc/mounts I see we are in fact
using different values:

...
10.0.0.1:/mypool/test_FS /nfs nfs
rw,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=10.0.0.1
0 0

So wsize was 1048576 during the iozone tests. That'll teach me to rely
on manuals :) So
repeating these tests with different wsizes seems to be a smart thing
to do.
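
(Pinning these explicitly on the client would be something along the lines of the
following; the 32k values are just an example:)

mount -t nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 10.0.0.1:/mypool/test_FS /nfs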

With kind regards,

Jeroen



Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-24 Thread Jeroen Roodhart
Hi Richard,

Richard Elling wrote:
 How about posting the data somewhere we can see it?

As stated in an earlier posting it should be accessible at:

http://init.science.uva.nl/~jeroen/solaris11_iozone_nfs2zfs

Happy holidays!

~Jeroen



Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using SSD's as slog devices (ZIL)

2009-12-23 Thread Jeroen Roodhart
Hi Richard, ZFS-discuss.

 Date: Wed, 23 Dec 2009 09:49:18 -0800
 From: Richard Elling richard.ell...@gmail.com
 To: Auke Folkerts folke...@science.uva.nl
 Cc: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using
 SSD's as slog devices (ZIL)

 Some questions below...

 On Dec 23, 2009, at 8:27 AM, Auke Folkerts wrote:



Filling in for Auke here,

  The raw data as well as the graphs that I created are available on
  request, should people be interested.

 Yes, can you post somewhere?

I've put the results here, tests are run under nv129:

http://www.science.uva.nl/~jeroen/solaris11_iozone_nfs2zfs

Original measurements (with iozone headers) are in:

http://www.science.uva.nl/~jeroen/solaris11_iozone_nfs2zfs/originals/



 Questions:
 1. Client wsize?

We usually set these to 342768, but this was tested with CentOS
defaults: 8192 (we're doing this over NFSv3).
 2. Client NFS version?

NFSv3 (earlier tests show about 15% improvement using v4, but we still
use v3 in production).


 3. logbias settings?

logbias=throughput for the runs labeled 'throughput'; otherwise defaults.
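
(That is, for those runs we set the property on the exported filesystem, roughly:

zfs set logbias=throughput mypool/test_FS

and left it at the default of 'latency' otherwise.)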


 4. Did you test with a Solaris NFS client?  If not, why not?

We didn't, because our production environment consists of Solaris
servers and Linux/MS Windows clients.


 UFS is a totally different issue, sync writes are always sync'ed.

 I don't work for Sun, but it would be unusual for a company to accept
 willful negligence as a policy.  Ambulance chasing lawyers love that
 kind of thing.

The Thor replaces a geriatric Enterprise system running Solaris 8 over
UFS. For these workloads it beats the pants off our current setup,
and somehow the 'but you're safer now' argument doesn't go over very
well :)

We are under the impression that a setup that serves NFS over UFS has
the same assurance level as a setup using ZFS without the ZIL. Is this
impression false?

If it isn't, then offering a tradeoff between 'the same assurance level as
you are used to, with better performance' and 'a better assurance level,
but with a significant performance hit for random I/O' doesn't seem too wrong
to me. In the first case you still have the ZFS guarantees once data
is on disk...
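
(To be explicit: by 'ZFS without ZIL' I mean running with the classic ZIL-disable
tunable, i.e. an /etc/system entry along the lines of

set zfs:zil_disable = 1

which is global to the host rather than per filesystem, so it would apply to
everything served from the Thor.)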

Thanks in advance for your insights,

With kind regards,

Jeroen



Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-19 Thread Jeroen Roodhart
 How did your migration to ESXi go? Are you using it on the same hardware or
 did you just switch that server to an NFS server and run the VMs on another
 box?

The latter; we run these VMs over NFS anyway and had ESXi boxes under test 
already, and we were already separating data exports from VM exports. We use an 
in-house developed configuration management/bare-metal system which allows us 
to install new machines pretty easily. In this case we just provisioned the 
ESXi VMs to new VM exports on the Thor whilst re-using the data exports as 
they were...

Works pretty well, although the Sun x1027A 10G NICs aren't yet supported under 
ESXi 4...


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-12 Thread Jeroen Roodhart
 I'm running nv126 XvM right now. I haven't tried it
 without XvM.

Without XvM we do not see these issues. We're running the VMs through NFS now 
(using ESXi)...


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-11-03 Thread Jeroen Roodhart
We see the same issue on an x4540 Thor system with 500G disks:

lots of:
...
Nov  3 16:41:46 uva.nl scsi: [ID 107833 kern.warning] WARNING: 
/p...@3c,0/pci10de,3...@f/pci1000,1...@0 (mpt5):
Nov  3 16:41:46 encore.science.uva.nl   Disconnected command timeout for Target 7
...

This system is running nv125 XvM. It seems to occur more when we are using VMs. 
This of course causes very long interruptions on the VMs as well...