Re: [zfs-discuss] SPARC SATA, please.

2009-06-24 Thread Justin Stringfellow

Richard Elling wrote:

Miles Nordin wrote:

ave == Andre van Eyssen an...@purplecow.org writes:
et == Erik Trimble erik.trim...@sun.com writes:
ea == Erik Ableson eable...@mac.com writes:
edm == Eric D. Mudama edmud...@bounceswoosh.org writes:



   ave The LSI SAS controllers with SATA ports work nicely with
   ave SPARC.

I think what you mean is ``some LSI SAS controllers work nicely with
SPARC''.  It would help if you tell exactly which one you're using.

I thought the LSI 1068 do not work with SPARC (mfi driver, x86 only).
  


Sun has been using the LSI 1068[E] and its cousin, 1064[E] in
SPARC machines for many years.  In fact, I can't think of a
SPARC machine in the current product line that does not use
either 1068 or 1064 (I'm sure someone will correct me, though ;-)
-- richard


Might be worth having a look at the T1000 to see what's in there. We used to 
ship those with SATA drives in.

cheers,
--justin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Purpose of zfs_acl_split_ace

2009-06-24 Thread Nils Goroll

Hi,

In nfs-discuss, Andrew Watkins has brought up the question of why an inheritable 
ACE is split into two ACEs when a descendant directory is created.


Ref: 
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zfs_acl.c#1506


I must admit that I had observed this behavior many times, but never asked 
myself why ACE inheritance is implemented like this.


The best explanation I can come up with is that chmod calls on the mode bits 
should not change inheritable ACEs, and splitting inheritable (non inherit-only) 
ACEs is an easy way to achieve this.
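
A quick way to watch the split happen (a sketch only; the path, user and the
exact ACE rendering are placeholders and vary by build):

# grant an inheritable ACE on a directory, then create a child directory
chmod A+user:fred:read_data/write_data:file_inherit/dir_inherit:allow /tank/dir
mkdir /tank/dir/sub
ls -dv /tank/dir/sub    # the inherited entry shows up as two ACEs: one
                        # effective ACE and one inherit-only ACE that keeps
                        # the inheritance flags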


Does this interpretation match the original intention, or are there any other or 
better reasons? Is there a reason why inheritable ACEs are always split, even if 
the particular chmod call would not require splitting them?


Thank you, Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Hi all,

I have a ZFS mirror of two 500GB disks and I'd like to up these to 1TB disks; how 
can I do this?  I must break the mirror, as I don't have enough controller ports 
on my system board.  My current mirror looks like this:

r...@beleg-ia:/share/media# zpool status share
  pool: share
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        share       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c5d1s0  ONLINE       0     0     0

errors: No known data errors

If I detach c5d1s0, add a 1TB drive, attach that, wait for it to resilver, then 
detach c5d0s0 and add another 1TB drive and attach that to the zpool, will that 
up the storage of the pool?

Thanks very much,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O throughput?

2009-06-24 Thread Chookiex
Thank you for your reply.
I had read the blog. The most interesting thing is: WHY is there no performance 
improvement when any compression is set?
The compressed read I/O is smaller than the uncompressed data, and decompression 
is faster than compression.
So if lzjb writes are better than uncompressed writes, shouldn't lzjb reads be 
better than writes?

Do the ARC or L2ARC play any tricks here?

Thanks




From: David Pacheco david.pach...@sun.com
To: Chookiex hexcoo...@yahoo.com
Cc: zfs-discuss@opensolaris.org
Sent: Wednesday, June 24, 2009 4:53:37 AM
Subject: Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS 
I/O throughput?

Chookiex wrote:
 Hi all.
 
 Because the property compression could decrease the file size, and the file 
 IO will be decreased also.
 So, would it increase the ZFS I/O throughput with compression?
 
 for example:
 I turn on gzip-9,on a server with 2*4core Xeon, 8GB RAM.
 It could compress my files with compressratio 2.5x+. could it be?
 or I turn on lzjb, about 1.5x with the same files..

It's possible, but it depends on a lot of factors, including what your 
bottleneck is to begin with, how compressible your data is, and how hard you 
want the system to work compressing it. With gzip-9, I'd be shocked if you saw 
bandwidth improved. It seems more common with lzjb:

http://blogs.sun.com/dap/entry/zfs_compression

(skip down to the results)
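
If you want to experiment, compression is a per-dataset property, so it is easy
to compare settings on a scratch filesystem (a minimal sketch; dataset names are
placeholders):

zfs create tank/comptest
zfs set compression=lzjb tank/comptest
# copy in a representative sample of your data, then check what you got:
zfs get compression,compressratio tank/comptest
# gzip-9 only affects blocks written after the property change:
zfs set compression=gzip-9 tank/comptest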

-- Dave

 
 could it be? Is there anyone have a idea?
 
 thanks 
 
 
 


-- David Pacheco, Sun Microsystems Fishworks.    http://blogs.sun.com/dap/



  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread dick hoogendijk
On Wed, 24 Jun 2009 03:14:52 PDT
Ben no-re...@opensolaris.org wrote:

 If I detach c5d1s0, add a 1TB drive, attach that, wait for it to
 resilver, then detach c5d0s0 and add another 1TB drive and attach
 that to the zpool, will that up the storage of the pool?

That will do the trick perfectly. I just did the same last week ;-)
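
For reference, the whole sequence looks roughly like this (a sketch reusing the
device names from the zpool status above; make sure each resilver has finished
before the next detach):

zpool detach share c5d1s0            # drop one half of the mirror
# physically swap in the first 1TB drive, then:
zpool attach share c5d0s0 c5d1s0     # re-mirror onto the new disk
zpool status share                   # wait until the resilver completes
zpool detach share c5d0s0            # repeat for the second disk
zpool attach share c5d1s0 c5d0s0

The extra space only appears once both sides are the larger size (and, as noted
later in this thread, builds from snv_116 on also need the autoexpand property
or 'zpool online -e').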

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / OpenSolaris 2009.06 release
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status -x output

2009-06-24 Thread Tomasz Kłoczko
Hi,

At the company where I'm working we use the zpool status -x command output
in monitoring scripts to check the health of all ZFS pools. Everything is OK
except on a few systems where the zpool status -x output is exactly the same
as zpool status. I'm not sure, but it looks like this behavior is not OS-version
specific (I observe it on one of the latest OpenSolaris builds, but also on
some previous ones and on two boxes with Solaris 10).
I found that in all these cases in command output I see some additional
notes like:

status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.

If the zpool status -x output is not the single-line "all pools are healthy"
message merely because a pool uses an old on-disk format, IMO this behavior is
incorrect, because in the zpool(1M) description of the -x option I see:

     -x    Only display status for pools that are exhibiting
           errors or are otherwise unavailable.

In this case there are no errors in the pools and all resources are still
available.

Comments? Should I open a case for this?

Tomasz


--
Wydział Zarządzania i Ekonomii 
Politechnika Gdańska
http://www.zie.pg.gda.pl/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Turn off the time slider on some zpools

2009-06-24 Thread Mykola Maslov



How to turn off the timeslider snapshots on certain file systems?


http://wikis.sun.com/display/OpenSolarisInfo/How+to+Manage+the+Automatic+ZFS+Snapshot+Service


Thank you, very handy stuff!

BTW - will ZFS automatically delete snapshots when I run low on disk space?
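
For reference, the per-filesystem switch described on that page is just a user
property, and whole schedules are SMF services (a sketch; the dataset name is a
placeholder):

zfs set com.sun:auto-snapshot=false tank/scratch
svcadm disable svc:/system/filesystem/zfs/auto-snapshot:frequent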


--
With respect,
Nik Maslov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Migration: 1 x 160GB IDE boot drive --- 2 x 30GB SATA SSDs

2009-06-24 Thread Simon Breden
Hi,

I have OpenSolaris 2009.06 currently installed on a 160 GB IDE drive.
I want to replace this with a 2-way mirror 30 GB SATA SSD boot setup.

I found these 2 threads which seem to answer some questions I had, but I still 
have some questions.
http://opensolaris.org/jive/thread.jspa?messageID=386577
http://opensolaris.org/jive/thread.jspa?threadID=104656

FIRST QUESTION:
Although it seems possible to add a drive to form a mirror for the ZFS boot 
pool 'rpool', the main problem I see is that in my case, I would be attempting 
to form a mirror using a smaller drive (30GB) than the initial 160GB drive.
Is there an easy solution to this problem, or would it be simpler to just do a 
reinstall of OpenSolaris 2009.06 onto 2 brand new 30GB SSDs? I have the option 
of the fresh install, as I haven't invested much time in configuring this 
OS2009.06 boot environment yet.

SECOND QUESTION:
I also want the possibility to have multiple boot environments within 
OpenSolaris 2009.06 to allow easy rollback to a working boot environment in 
case of an IPS update problem. I presume this will not cause any additional 
complications?


THIRD QUESTION:
This is for a home fileserver so I don't want to spend too much, but does 
anyone see any problem with having the OS installed on MLC SSDs, which are 
cheaper than SLC SSDs? I'm thinking here specifically about wearing out the SSDs 
if the OS does too many writes to them.

I agree SSDs are a bit overkill, and using standard spinning metal would be 
cheaper, but the case is vibrating like crazy as I ran out of drive slots and 
had to use non-grommeted attachments for the boot drive. But the SSDs should be 
silent and should certainly speed up boot and shutdown times dramatically :)

Thanks,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Thomas Maier-Komor
dick hoogendijk schrieb:
 On Wed, 24 Jun 2009 03:14:52 PDT
 Ben no-re...@opensolaris.org wrote:
 
 If I detach c5d1s0, add a 1TB drive, attach that, wait for it to
 resilver, then detach c5d0s0 and add another 1TB drive and attach
 that to the zpool, will that up the storage of the pool?
 
 That will do the trick perfectly. I just did the same last week ;-)
 

Doesn't detaching leave the detached disk unassociated with any pool? I
think it might be better to import the pool with only one half of the
mirror without detaching the disk, and then do a zpool replace. In this
case, if something goes wrong during the resilver, you still have the
other half of the mirror to bring your pool back up again. If you detach
the disk up front, this won't be possible.

Just an idea...

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Thomas, 

Could you post an example of what you mean (ie commands in the order to use 
them)?  I've not played with ZFS that much and I don't want to muck my system 
up (I have data backed up, but am more concerned about getting myself in a mess 
and having to reinstall, thus losing my configurations).

Many thanks for both of your replies,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Philippe Schwarz

Hi,
I'm getting involved in a pre-production test and want to be sure of the
means I'll have to use.

Take 2 SunFire x4150s and 1 Cisco 3750 Gb switch.

1 private VLAN on the Gb ports of the switch.

1 x4150 is going to be the ESX4 (aka vSphere) server (1 hardware mirror of
146G and 32G RAM), booting ESX 4 from the local disk.


The other is going to be used as a poor man's SAN:
8 x 146G SAS 15k
8 GB RAM
Solaris 10

The first 2 disks are a hardware mirror of 146 GB with Sol10 and a UFS
filesystem on it. The other 6 will be used as a raidz2 ZFS pool of 535G,
with compression and shareiscsi=on.
I'm going to CHAP-protect it soon...


I'm going to put two ZFS volumes (zvols) on it:
zfs create -V 250G SAN/ESX1
zfs create -V 250G SAN/ESX2

And use them for VMFS.
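
Spelled out end to end, that layout is roughly the following (a sketch only;
the controller/target numbers for the six data disks are placeholders):

zpool create SAN raidz2 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
zfs set compression=on SAN
zfs create -V 250G SAN/ESX1
zfs create -V 250G SAN/ESX2
zfs set shareiscsi=on SAN/ESX1
zfs set shareiscsi=on SAN/ESX2
iscsitadm list target -v     # confirm both targets are exported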

Oh, by the way, I have no VMotion plugin.

In my tests ESX4 seems to work fine with this, but I haven't really
stressed it yet ;-)

Therefore, I don't know if 1 Gb full duplex per port will be enough, and I
don't know whether I have to set up some sort of redundant access from ESX
to the SAN, etc.

Is my configuration OK? It's only a preprod install; I'm able to break
almost everything if necessary.


Thanks for all your answers.

Yours faithfully.


--
Regards.
- Lycée Alfred Nobel, Clichy-sous-Bois http://www.lyceenobel.org
KeyID 0x46EA1D16 FingerPrint 997B164F4F606A61E7B1FC61961A821646EA1D16

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Turn off the time slider on some zpools

2009-06-24 Thread Harry Putnam
cindy.swearin...@sun.com writes:

 Hi Harry,

 Are you attempting this change when logged in as yourself or
 as root?

my user

 The top section of this procedure describes how to add yourself
 to zfssnap role. Otherwise, if you are doing this step as a
 non-root user, it probably won't work.

my user is in role zfssnap.  And in role `root'

$ roles
postgres,root,zfssnap

I'm not sure how a user can access the GUI tool without being logged
in as themselves, since the tool is on the System drop-down menu.

Do you mean root has to log into X?

But anyway, the command lines shown on that page are much handier.
I just wondered if the GUI tool was working as expected.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Ben
Many thanks Thomas, 

I have a test machine so I shall try it on that before I try it on my main 
system.

Thanks very much once again,
Ben
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Purpose of zfs_acl_split_ace

2009-06-24 Thread Mark Shellenbaum

Nils Goroll wrote:

Hi,

I just noticed that Mark Shellenbaum has replied to the same question in 
the thread "ACL not being inherited correctly" on zfs-discuss.


Sorry for the noise.

Out of curiosity, I would still be interested in answers to this question:

  Is there a reason why inheritable ACEs are always split, even if the
  particular chmod call would not require splitting them?

For instance, a mode bit change would never influence 
n...@owner/@group/@everyone ACEs and even for the 
@owner/@group/@everyone, one could check if the mode bits are actually 
changed by the chmod call.


Any group entry could have its permissions modified in some situations 
(e.g. the group has greater permissions than the owner).  It's true that a user 
entry wouldn't necessarily need it, but in order to keep the algorithm 
simpler we just always do the split.


It would be simple enough to exclude user entries from splitting.  Feel 
free to open a bug on this.




Does this make any sense?

Thank you,

Nils


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Victor Latushkin

On 24.06.09 17:10, Thomas Maier-Komor wrote:

Ben schrieb:
Thomas, 


Could you post an example of what you mean (ie commands in the order to use 
them)?  I've not played with ZFS that much and I don't want to muck my system 
up (I have data backed up, but am more concerned about getting myself in a mess 
and having to reinstall, thus losing my configurations).

Many thanks for both of your replies,
Ben


I'm not an expert on this, and I haven't tried it, so beware:


1) If the pool you want to expand is not the root pool:

$ zpool export mypool
# now replace one of the disks with a new disk
$ zpool import mypool
# zpool status will show that mypool is in degraded state because of a
missing disk
$ zpool replace mypool replaceddisk
# now the pool will start resilvering

# Once it is done with resilvering:
$ zpool detach mypool otherdisk
#  now physically replace otherdisk
$ zpool replace mypool otherdisk


The last command would fail, as otherdisk would no longer be part of mypool.

Though you can always play with files first (or with VirtualBox etc):

# preparation

mkdir -p /var/tmp/disks/removed
mkfile -n 64m /var/tmp/disks/disk0
mkfile -n 64m /var/tmp/disks/disk1
mkfile -n 128m /var/tmp/disks/bigdisk0
mkfile -n 128m /var/tmp/disks/bigdisk1
zpool create test mirror /var/tmp/disks/disk0 /var/tmp/disks/disk1
zpool list test

# let's start by making sure there's no latent errors:

zpool scrub test
while zpool status -v test | grep % ; do sleep 1; done
zpool status -v test

zpool export test
mv /var/tmp/disks/disk0 /var/tmp/disks/removed/disk0

# you don't need '-d /path' with real disks
zpool import -d /var/tmp/disks test
zpool status -v test

# insert new disk
mv /var/tmp/disks/bigdisk0 /var/tmp/disks/disk0
zpool replace test /var/tmp/disks/disk0

while zpool status -v test | grep % ; do sleep 1; done
zpool status -v test

# make sure that resilvering is complete
zpool detach test /var/tmp/disks/disk1
mv /var/tmp/disks/disk1 /var/tmp/disks/removed/disk1

# insert new disk
mv /var/tmp/disks/bigdisk1 /var/tmp/disks/disk1
zpool attach test /var/tmp/disks/disk0 /var/tmp/disks/disk1
while zpool status -v test | grep % ; do sleep 1; done
zpool status -v test
zpool list test


hth,
victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread David Magda
On Wed, June 24, 2009 08:42, Philippe Schwarz wrote:

 In my tests ESX4 seems to work fine with this, but i haven't already
 stressed it ;-)

 Therefore, i don't know if the 1Gb FDuplex per port will be enough, i
 don't know either i'have to put sort of redundant access form ESX to
 SAN,etc

 Is my configuration OK ? It's only a preprod install, i'm able to break
 almost everything if it's necessary.

At least in 3.x, VMware had a limitation of only being able to use one
connection per iSCSI target (even if there were multiple LUNs on it):

http://mail.opensolaris.org/pipermail/zfs-discuss/2009-June/028731.html

Not sure if that's changed in 4.x, so if you're going to have more than
one LUN, then having more than one target may be advantageous. See also:

http://www.vmware.com/files/pdf/iSCSI_design_deploy.pdf

You may want to go to the VMware lists / forums to see what the people
there say as well.

Out of curiosity, any reason why you went with iSCSI and not NFS? There seems
to be some debate on which is better under which circumstances.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -x output

2009-06-24 Thread Richard Elling

It might be easier to look for the pool status thusly
   zpool get health poolname
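
In a monitoring script that could look something like this (a minimal sketch;
the pool name and alert recipient are placeholders):

#!/bin/sh
state=`zpool get health tank | awk 'NR==2 {print $3}'`
if [ "$state" != "ONLINE" ]; then
        echo "pool tank is $state" | mailx -s "zpool alert: tank" root
fi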

-- richard

Tomasz Kłoczko wrote:

Hi,

At the company where I'm working we use the zpool status -x command output
in monitoring scripts to check the health of all ZFS pools. Everything is OK
except on a few systems where the zpool status -x output is exactly the same
as zpool status. I'm not sure, but it looks like this behavior is not OS-version
specific (I observe it on one of the latest OpenSolaris builds, but also on
some previous ones and on two boxes with Solaris 10).
I found that in all these cases in command output I see some additional
notes like:

status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.

If the zpool status -x output is not the single-line "all pools are healthy"
message merely because a pool uses an old on-disk format, IMO this behavior is
incorrect, because in the zpool(1M) description of the -x option I see:

     -x    Only display status for pools that are exhibiting
           errors or are otherwise unavailable.

In this case there are no errors in the pools and all resources are still
available.

Comments? Should I open a case for this?

Tomasz


--
Wydział Zarządzania i Ekonomii 
Politechnika Gdańska

http://www.zie.pg.gda.pl/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread milosz
 2 first disks Hardware mirror of 146Go with Sol10  UFS filesystem on it.
 The next 6 others will be used as a raidz2 ZFS volume of 535G,
 compression and shareiscsi=on.
 I'm going to CHAP protect it soon...

you're not going to get the random read and write performance you need
for a vm backend out of any kind of parity raid.  just go with 3 sets
of mirrors.  unless you're ok with subpar performance (and if you
think you are, you should really reconsider).  also you might get
significant mileage out of putting an ssd in and using it for zil.

here's a good post from roch's blog about parity vs mirrored setups:

http://blogs.sun.com/roch/entry/when_to_and_not_to
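
On the slog suggestion: adding a separate log device is a one-liner (a sketch
assuming a spare SSD visible as c2t0d0; note that on builds of this era a log
device cannot be removed from the pool again):

zpool add SAN log c2t0d0
zpool status SAN        # the SSD shows up under a separate "logs" section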
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Scott Meilicke
See this thread for information on load testing for vmware:
http://communities.vmware.com/thread/73745?tstart=0&start=0

Within the thread there are instructions for using iometer to load test your 
storage. You should test out your solution before going live, and compare what 
you get with what you need. Just because striping 3 mirrors *will* give you 
more performance than raidz2 doesn't always mean that is the best solution. 
Choose the best solution for your use case.

You should have at least two NICs per connection to storage and LAN (4 total in 
this simple example), for redundancy if nothing else. Performance wise, vsphere 
can now have multiple SW iSCSI connections to a single LUN. 

My testing showed compression increased iSCSI performance by 1.7x, so I like 
compression. But again, these are my tests in my situation. Your results may 
differ from mine.

Regarding ZIL usage, from what I have read you will only see benefits if you 
are using NFS-backed storage, but there it can be significant. Remove the ZIL 
for testing to see the max benefit you could get. Don't do this in production!

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ethan Erchinger
  http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0
 
 Yes, this does sound very similar.  It looks to me like data from read
 files is clogging the ARC so that there is no more room for more
 writes when ZFS periodically goes to commit unwritten data.  

I'm wondering if changing txg_time to a lower value might help.
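
If you want to experiment, the tunable can be inspected and poked at runtime
with mdb (a sketch only; the symbol name is taken from Roch's write-throttle
post and may differ between builds, so check it against your kernel first):

echo 'txg_time/D' | mdb -k         # read the current value, in seconds
echo 'txg_time/W 0t2' | mdb -kw    # try 2 seconds; not persistent across reboots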
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Erik Ableson
Bottom line with virtual machines is that your IO will be random by  
definition since it all goes into the same pipe. If you want to be  
able to scale, go with RAID 1 vdevs. And don't skimp on the memory.


Our current experience hasn't shown a need for an SSD for the ZIL but  
it might be useful for L2ARC (using iSCSI for VMs, NFS for templates  
and iso images)


Cordialement,

Erik Ableson

+33.6.80.83.58.28
Sent from my iPhone

On 24 June 2009, at 18:56, milosz mew...@gmail.com wrote:

Within the thread there are instructions for using iometer to load  
test your storage. You should test out your solution before going  
live, and compare what you get with what you need. Just because  
striping 3 mirrors *will* give you more performance than raidz2  
doesn't always mean that is the best solution. Choose the best  
solution for your use case.


multiple vm disks that have any kind of load on them will bury a raidz
or raidz2.  out of a 6x raidz2 you are going to get the iops and
random seek latency of a single drive (realistically the random seek
will probably be slightly worse, actually).  how could that be
adequate for a virtual machine backend?  if you set up a raidz2 with
6x15k drives, for the majority of use cases, you are pretty much
throwing your money away.  you are going to roll your own san, buy a
bunch of 15k drives, use 2-3u of rackspace and four (or more)
switchports, and what you're getting out of it is essentially a 500gb
15k drive with a high mttdl and a really huge theoretical transfer
speed for sequential operations (which you won't be able to saturate
anyway because you're delivering over gige)?  for this particular
setup i can't really think of a situation where that would make sense.

Regarding ZIL usage, from what I have read you will only see  
benefits if you are using NFS backed storage, but that it can be  
significant.


link?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Ethan Erchinger wrote:


http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0


Yes, this does sound very similar.  It looks to me like data from read
files is clogging the ARC so that there is no more room for more
writes when ZFS periodically goes to commit unwritten data.


I'm wondering if changing txg_time to a lower value might help.


There is no doubt that having ZFS sync the written data more often 
would help.  However, it should not be necessary to tune the OS for 
such a common task as batch processing a bunch of files.


A more appropriate solution is for ZFS to notice that more than XXX 
megabytes are uncommitted, so maybe it should wake up and go write 
some data.  It is useful for ZFS to defer data writes in case the same 
file is updated many times.  In the case where the same file is 
updated many times, the total uncommitted data is still limited by the 
amount of data which is re-written and so the 30 second cycle is fine. 
In my case the amount of uncommitted data is limited by available RAM 
and how fast my application is able to produce new data to write.


The problem is very much related to how fast the data is output.  If 
the new data is created at a slower rate (output files are smaller) 
then the problem just goes away.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Narrow escape!

2009-06-24 Thread Ross
Ok, this is getting weird.  I just ran a zpool clear, and now it says:

# zpool clear zfspool
# zpool status
  pool: zfspool
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed after 6h35m with 0 errors on Wed Jun 24 02:46:58 2009
config:

        NAME        STATE     READ WRITE CKSUM
        zfspool     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0  107G repaired
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0  688K repaired
            c1t4d0  ONLINE       0     0     0  774K repaired
            c1t5d0  ONLINE       0     0     0

errors: No known data errors
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread George Wilson

Ben wrote:

Hi all,

I have a ZFS mirror of two 500GB disks, I'd like to up these to 1TB disks, how 
can I do this?  I must break the mirror as I don't have enough controller on my 
system board.  My current mirror looks like this:

r...@beleg-ia:/share/media# zpool status share
  pool: share
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        share       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0
            c5d1s0  ONLINE       0     0     0

errors: No known data errors

If I detach c5d1s0, add a 1TB drive, attach that, wait for it to resilver, then 
detach c5d0s0 and add another 1TB drive and attach that to the zpool, will that 
up the storage of the pool?

Thanks very much,
Ben
  


The following changes, which went into snv_116, change this behavior:

PSARC 2008/353 zpool autoexpand property
6475340 when lun expands, zfs should expand too
6563887 in-place replacement allows for smaller devices
6606879 should be able to grow pool without a reboot or export/import
6844090 zfs should be able to mirror to a smaller disk

With this change we introduced a new property ('autoexpand') which you must 
enable if you want devices to automatically grow (this includes replacing them 
with larger ones). You can alternatively use the '-e' (expand) option to 'zpool 
online' to grow individual drives even if 'autoexpand' is disabled. The reason 
we made this change was so that all device expansion would be managed in the 
same way. I'll try to blog about this soon but for now be aware that post 
snv_116 the typical method of growing pools by replacing devices will require 
at least one additional step.
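
For example, once on such a build and after the larger disks are in place,
either of these (a sketch reusing Ben's pool and device names) triggers the
expansion:

zpool set autoexpand=on share
zpool online -e share c5d0s0    # per-device expansion if autoexpand is off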

Thanks,
George


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O throughput?

2009-06-24 Thread David Pacheco

Chookiex wrote:

Thank you for your reply.
I had read the blog. The most interesting thing is WHY is there no 
performance improve when it set any compression?


There are many potential reasons, so I'd first try to identify what your 
current bandwidth limiter is. If you're running out of CPU on your 
current workload, for example, adding compression is not going to help 
performance. If this is over a network, you could be saturating the 
link. Or you might not have enough threads to drive the system to bandwidth.


Compression will only help performance if you've got plenty of CPU and 
other resources but you're out of disk bandwidth. But even if that's the 
case, it's possible that compression doesn't save enough space that you 
actually decrease the number of disk I/Os that need to be done.


The compressed read I/O is less than uncompressed data,  and decompress 
is faster than compress.


Out of curiosity, what's the compression ratio?

-- Dave

so if lzjb write is better than non-compressed, the lzjb read would be 
better than write?
 
Is the ARC or L2ARC do any tricks?
 
Thanks



*From:* David Pacheco david.pach...@sun.com
*To:* Chookiex hexcoo...@yahoo.com
*Cc:* zfs-discuss@opensolaris.org
*Sent:* Wednesday, June 24, 2009 4:53:37 AM
*Subject:* Re: [zfs-discuss] Is the PROPERTY compression will increase 
the ZFS I/O throughput?


Chookiex wrote:
  Hi all.
 
  Because the property compression could decrease the file size, and 
the file IO will be decreased also.

  So, would it increase the ZFS I/O throughput with compression?
 
  for example:
  I turn on gzip-9,on a server with 2*4core Xeon, 8GB RAM.
  It could compress my files with compressratio 2.5x+. could it be?
  or I turn on lzjb, about 1.5x with the same files.

It's possible, but it depends on a lot of factors, including what your 
bottleneck is to begin with, how compressible your data is, and how hard 
you want the system to work compressing it. With gzip-9, I'd be shocked 
if you saw bandwidth improved. It seems more common with lzjb:


http://blogs.sun.com/dap/entry/zfs_compression

(skip down to the results)

-- Dave

 
  could it be? Is there anyone have a idea?
 
  thanks
 
  
 


-- David Pacheco, Sun Microsystems Fishworks.http://blogs.sun.com/dap/




--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-24 Thread Jacob Ritorto
	I think this is the board that shipped in the original T2000 machines 
before they began putting the sas/sata onboard:  LSISAS3080X-R


Can anyone verify this?



Justin Stringfellow wrote:

Richard Elling wrote:

Miles Nordin wrote:

ave == Andre van Eyssen an...@purplecow.org writes:
et == Erik Trimble erik.trim...@sun.com writes:
ea == Erik Ableson eable...@mac.com writes:
edm == Eric D. Mudama edmud...@bounceswoosh.org writes:



   ave The LSI SAS controllers with SATA ports work nicely with
   ave SPARC.

I think what you mean is ``some LSI SAS controllers work nicely with
SPARC''.  It would help if you tell exactly which one you're using.

I thought the LSI 1068 do not work with SPARC (mfi driver, x86 only).
  


Sun has been using the LSI 1068[E] and its cousin, 1064[E] in
SPARC machines for many years.  In fact, I can't think of a
SPARC machine in the current product line that does not use
either 1068 or 1064 (I'm sure someone will correct me, though ;-)
-- richard


Might be worth having a look at the T1000 to see what's in there. We 
used to ship those with SATA drives in.


cheers,
--justin


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on 32 bit?

2009-06-24 Thread roland
Dennis is correct in that there are significant areas where 32-bit
systems will remain the norm for some time to come. 

Think of the hundreds of thousands of VMware ESX/Workstation/Player/Server 
installations on non-VT-capable CPUs - even if the CPU has 64-bit capability, a 
VM cannot run in 64-bit mode if the CPU is missing VT support. And VT hasn't been 
available for that long; there are even recent CPUs which don't have VT 
support.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPARC SATA, please.

2009-06-24 Thread Miles Nordin
 jr == Jacob Ritorto jacob.rito...@gmail.com writes:

jr I think this is the board that shipped in the original
jr T2000 machines before they began putting the sas/sata onboard:
jr LSISAS3080X-R

jr Can anyone verify this?

can't verify but FWIW i fucked it up:

 I thought the LSI 1068 do not work with SPARC (mfi driver,
 x86 only).

 ^ me.  this is wrong.

mega_sas, the open source driver for 1078/PERC, is x86-only.  
  http://mail.opensolaris.org/pipermail/zfs-discuss/2009-March/027338.html

and mpt is the 1068 driver, proprietary, works on x86 and SPARC.

mfi is some other (abandoned?) random third-party open-source driver
for some of these cards that no one's mentioned using yet, at 
  https://svn.itee.uq.edu.au/repo/mfi/

then there is also itmpt, the third-party-downloadable closed-source
driver from LSI Logic, dunno much about it but someone here used it.

sorry.

There's also been talk of two tools, MegaCli and lsiutil, which are
both binary only and exist for both Linux and Solaris, and I think are
used only with the 1078 cards but maybe not.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Scott Lawson



Thomas Maier-Komor wrote:

Ben schrieb:
  
Thomas, 


Could you post an example of what you mean (ie commands in the order to use 
them)?  I've not played with ZFS that much and I don't want to muck my system 
up (I have data backed up, but am more concerned about getting myself in a mess 
and having to reinstall, thus losing my configurations).

Many thanks for both of your replies,
Ben



I'm not an expert on this, and I haven't tried it, so beware:


1) If the pool you want to expand is not the root pool:

$ zpool export mypool
# now replace one of the disks with a new disk
$ zpool import mypool
# zpool status will show that mypool is in degraded state because of a
missing disk
$ zpool replace mypool replaceddisk
# now the pool will start resilvering

# Once it is done with resilvering:
$ zpool detach mypool otherdisk
#  now physically replace otherdisk
$ zpool replace mypool otherdisk

  
This will all work well. But I have a couple of suggestions for you as 
well.


If you are using mirrored vdevs then you can also grow the vdev by making it a
3- or 4-way mirror. This way you don't lose resiliency in your vdev while you
are migrating to larger disks. Of course, you have to be able to attach the
extra device to your system, either via a spare drive bay in a storage
enclosure, or SAN or iSCSI based LUNs.

When you have a lot of data and the business requires you to minimize any risk
as much as possible, this is a good idea. The pool was only offline for 14
seconds to gain the extra space, and at all times there were *always* two
devices in my mirror vdev.

Here is a cut and paste of this process from just the other day, with a live
production server where the maintenance window was only 5 minutes. This pool
was increased from 300 to 500 GB on LUNs from two disparate datacentres.

2009-06-17.13:57:05 zpool attach blackboard 
c4t600C0FF00924686710D4CF02d0 c4t600C0FF00082CA2312B99E05d0


2009-06-17.18:12:14 zpool detach blackboard 
c4t600C0FF00080797CC7A87F02d0


2009-06-17.18:12:57 zpool attach blackboard 
c4t600C0FF00924686710D4CF02d0 c4t600C0FF00086136F22B65F05d0


2009-06-17.20:02:00 zpool detach blackboard 
c4t600C0FF00924686710D4CF02d0


2009-06-18.05:58:52 zpool export blackboard

2009-06-18.05:59:06 zpool import blackboard

For home users this is probably overkill, but I thought I would mention it for
more enterprise-type people who are maybe familiar with DiskSuite and not ZFS
as much.


2) if you are working on the root pool, just skip export/import part and
boot with only one half of the mirror. Don't forget to run installgrub
after replacing a disk.

HTH,
Thomas


--
___


Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services Private Bag 94006 Manukau
City Auckland New Zealand

Phone  : +64 09 968 7611
Fax: +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz

http://www.manukau.ac.nz




perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'

 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-24 Thread Orvar Korvar
Hey sbreden! :o)

No, I haven't tried to tinker with my drives. They have been functioning all the 
time. I suspect (can not remember) that each SATA slot on the card has a number 
attached to it? Can anyone confirm this? If I am right, OpenSolaris will say 
something like "disc 6 is broken" and on the card there is a number 6? Then 
you can identify the disc?

I thought of exchanging my PCI card for a PCIe card variant instead, to reach 
higher speeds. PCI-X is legacy. The problem with PCIe cards is that soon SSD 
drives will be common. A ZFS raid with SSDs would need maybe PCIe x16 or so to 
reach max bandwidth. Today's PCIe cards are all PCIe x4 or something. I would 
need a PCIe x16 card to make it future-proof for the SSD discs. Maybe the best 
bet would be to attach an SSD disc directly to a PCIe slot, to reach max 
transfer speed? Or wait for SATA 3? I don't know. I want to wait until SSD raids 
are tested out. Then I will buy an appropriate card capable of SSD raids. Maybe 
SSD discs should never be used in conjunction with a card, and always connect 
directly to the SATA port? Until I know more on this, my PCI card will be fine. 
150MB/sec is ok for my personal needs. 

(My ZFS raid is connected to my desktop PC.  I don't have a server that is on 
24/7 using power. I want to save power. Save the earth! :o)  All my 5 ZFS raid 
discs are connected to one Molex. That Molex has a power switch. So I just turn 
on the ZFS raid and copy all the files I need to my system disc (which is 500GB) 
and then immediately reboot and turn off the ZFS raid. This way I only have one 
disc active, which I use as a cache. When my data are ready, I copy them to the 
ZFS raid and then shut down the power to the ZFS raid discs.)





However, I have a question. Which speed will I get with this solution? I have 2 
SSD discs in a PCI slot = 150MB/sec. Now I add 1 SSD disc into a SATA slot and 
another SSD disc into another SATA slot. Then I have
5 discs in PCI = 150MB/sec
1 disc in SATA = 300MB/sec (I assume SATA reaches 300MB/sec?)
1 disc in SATA = 300MB/sec.

I connect all 7 discs into one ZFS raid. Which speed will I get? Will I get 
150 + 300 + 300MB/sec? Or will the PCI slot strangle the SATA ports? Or will 
the fastest speed win and I will only get 300MB/sec?
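
A rough back-of-the-envelope, assuming a conventional 32-bit/33 MHz PCI slot
(about the 150MB/sec ceiling quoted above, shared by everything on that bus)
and that all seven discs end up in a single vdev:

  5 discs behind the PCI card:  ~150 MB/s / 5  =  ~30 MB/s each
  2 discs on SATA ports:        ~300 MB/s link each, but one disc sustains less
  a single vdev is paced by its slowest members, so expect very roughly
  7 x ~30 MB/s = ~210 MB/s aggregate, not 150 + 300 + 300

So the PCI slot throttles the whole raid, not just the discs behind it; the
exact numbers depend on the discs and on how the vdevs are laid out.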
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ross
Wouldn't it make sense for the timing technique to be used if the data is 
coming in at a rate slower than the underlying disk storage?

But then if the data starts to come at a faster rate, ZFS needs to start 
streaming to disk as quickly as it can, and instead of re-ordering writes in 
blocks, it should just do the best it can with whatever is currently in memory. 
 And when that mode activates, inbound data should be throttled to match the 
current throughput to disk.

That preserves the efficient write ordering that ZFS was originally designed 
for, but means a more graceful degradation under load, with the system tending 
towards a steady state of throughput that matches what you would expect from 
other filesystems on those physical disks.

Of course, I have no idea how difficult this is technically.  But the idea 
seems reasonable to me.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Ian Collins

Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Marcelo Leal wrote:


Hello Bob,
I think that is related with my post about zio_taskq_threads and TXG 
sync :

( http://www.opensolaris.org/jive/thread.jspa?threadID=105703tstart=0 )
Roch did say that this is on top of the performance problems, and in 
the same email i did talk about the change from 5s to 30s, what i 
think makes this problem worst, if this txg sync interval be fixed.


The problem is that basing disk writes on a simple timeout and 
available memory does not work.  It is easy for an application to 
write considerable amounts of new data in 30 seconds, or even 5 
seconds.  If the application blocks while the data is being comitted, 
then the application is not performing any useful function during that 
time.


Current ZFS write behavior make it not very useful for the creative 
media industries even though otherwise it should be a perfect fit 
since hundreds of terrabytes of working disk (or even petabytes) are 
normal for this industry.  For example, when data is captured to disk 
from film via a datacine (real time = 24 files/second and 6MB to 50MB 
per file), or captured to disk from a high-definition video camera, 
there is little margin for error and blocking on writes will result in 
missed frames or other malfunction.  Current ZFS write behavior is 
based on timing and the amount of system memory and it does not seem 
that throwing more storage hardware at the problem solves anything at 
all.


I wonder whether a filesystem property "streamed" might be appropriate?  This 
could act as a hint to ZFS that the data is sequential and should be streamed 
direct to disk.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Philippe Schwarz

milosz wrote:
 Within the thread there are instructions for using iometer to load test your 
 storage. You should test out your solution before going live, and compare 
 what you get with what you need. Just because striping 3 mirrors *will* give 
 you more performance than raidz2 doesn't always mean that is the best 
 solution. Choose the best solution for your use case.
 
 multiple vm disks that have any kind of load on them will bury a raidz
 or raidz2.  out of a 6x raidz2 you are going to get the iops and
 random seek latency of a single drive (realistically the random seek
 will probably be slightly worse, actually).  how could that be
 adequate for a virtual machine backend?  if you set up a raidz2 with
 6x15k drives, for the majority of use cases, you are pretty much
 throwing your money away.  you are going to roll your own san, buy a
 bunch of 15k drives, use 2-3u of rackspace and four (or more)
 switchports, and what you're getting out of it is essentially a 500gb
 15k drive with a high mttdl and a really huge theoretical transfer
 speed for sequential operations (which you won't be able to saturate
 anyway because you're delivering over gige)?  for this particular
 setup i can't really think of a situation where that would make sense.
Ouch !
Pretty direct answer. That's very interesting however.

Let me focus on a few more points:

- The hardware can't really be extended any more. No budget ;-(

- The VMs will be mostly low-IO systems:
  -- WS2003 with Trend OfficeScan, WSUS (for 300 XP clients) and RDP
  -- Solaris 10 with SRSS 4.2 (Sun Ray server)

(File and DB servers won't move to VM+SAN in the near future.)

I thought - but could be wrong - that those systems could tolerate
high-latency IO.

 what you're getting out of it is essentially a 500gb
 15k drive with a high mttdl

That's what I wanted: a rock-solid disk area, despite
not-as-good-as-I'd-like random IO.

I'll give it a try with sequential transfer.

However, thanks for your answer.

 
 Regarding ZIL usage, from what I have read you will only see benefits if you 
 are using NFS backed storage, but that it can be significant.
 
 link?


--
Regards.
- Lycée Alfred Nobel, Clichy-sous-Bois http://www.lyceenobel.org
KeyID 0x46EA1D16 FingerPrint 997B164F4F606A61E7B1FC61961A821646EA1D16

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread Philippe Schwarz

David Magda wrote:
 On Wed, June 24, 2009 08:42, Philippe Schwarz wrote:
 
 In my tests ESX4 seems to work fine with this, but i haven't already
 stressed it ;-)

 Therefore, i don't know if the 1Gb FDuplex per port will be enough, i
 don't know either i'have to put sort of redundant access form ESX to
 SAN,etc

 Is my configuration OK ? It's only a preprod install, i'm able to break
 almost everything if it's necessary.
 
 At least in 3.x, VMware had a limitation of only being able to use one
 connection per iSCSI target (even if there were multiple LUNs on it):
 
 http://mail.opensolaris.org/pipermail/zfs-discuss/2009-June/028731.html
 
 Not sure if that's changed in 4.x, so if you're going to have more than
 one LUN, then having more than one target may be advantageous. See also:
 
 http://www.vmware.com/files/pdf/iSCSI_design_deploy.pdf
 
 You may want to go to the VMware lists / forums to see what the people
 there say as well.
 


 Out of curiosity, any reason why you went with iSCSI and not NFS? There seems
 to be some debate on which is better under which circumstances.
iSCSI instead of NFS? Because of the overwhelming difference in transfer rate
between them - at least, that's what I read.

And setting up an iSCSI target is so simple that I didn't even look for
another solution.

Thanks for your answer.


--
Regards.
- Lycée Alfred Nobel, Clichy-sous-Bois http://www.lyceenobel.org
KeyID 0x46EA1D16 FingerPrint 997B164F4F606A61E7B1FC61961A821646EA1D16

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Marcelo Leal wrote:

I think that is the purpose of the current implementation: 
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle But seems 
like is not that easy... as i did understand what Roch said, seems 
like the cause is not always a hardy writer.


I see this:

The new code keeps track of the amount of data accepted in a TXG and 
the time it takes to sync. It dynamically adjusts that amount so that 
each TXG sync takes about 5 seconds (txg_time variable). It also 
clamps the limit to no more than 1/8th of physical memory.


It is interesting that it was decided that a TXG sync should take 5 
seconds by default.  That does seem to be about what I am seeing here. 
There is no mention of the devastation to the I/O channel which occurs 
if the kernel writes 5 seconds worth of data (e.g. 2GB) as fast as 
possible on a system using mirroring (2GB becomes 4GB of writes).  If 
it writes 5 seconds of data as fast as possible, then it seems that 
this blocks any opportunity to read more data so that application 
processing can continue during the TXG sync.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Thu, 25 Jun 2009, Ian Collins wrote:


I wonder whether a filesystem property streamed might be appropriate?  This 
could act as hint to ZFS that the data is sequential and should be streamed 
direct to disk.


ZFS does not seem to offer an ability to stream direct to disk other 
than perhaps via the special raw mode known to database developers.


It seems that current ZFS behavior works as designed.  The write 
transaction time is currently tuned for 5 seconds, so it writes 
data intensely for 5 seconds while either starving the readers 
and/or blocking the writers.  Notice that by the end of the TXG write, 
zpool iostat is reporting zero reads:


% zpool iostat Sun_2540 1
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
Sun_2540 456G  1.18T 14  0  1.86M  0
Sun_2540 456G  1.18T  0 19  0  1.47M
Sun_2540 456G  1.18T  0  3.11K  0   385M
Sun_2540 456G  1.18T  0  3.00K  0   385M
Sun_2540 456G  1.18T  0  3.34K  0   387M
Sun_2540 456G  1.18T  0  3.01K  0   386M
Sun_2540 458G  1.18T 19  1.87K  30.2K   220M
Sun_2540 458G  1.18T  0  0  0  0
Sun_2540 458G  1.18T275  0  34.4M  0
Sun_2540 458G  1.18T448  0  56.1M  0
Sun_2540 458G  1.18T468  0  58.5M  0
Sun_2540 458G  1.18T425  0  53.2M  0
Sun_2540 458G  1.18T402  0  50.4M  0
Sun_2540 458G  1.18T364  0  45.5M  0
Sun_2540 458G  1.18T339  0  42.4M  0
Sun_2540 458G  1.18T376  0  47.0M  0
Sun_2540 458G  1.18T307  0  38.5M  0
Sun_2540 458G  1.18T380  0  47.5M  0
Sun_2540 458G  1.18T148  1.35K  18.3M   117M
Sun_2540 458G  1.18T 20  3.01K  2.60M   385M
Sun_2540 458G  1.18T 15  3.00K  1.98M   384M
Sun_2540 458G  1.18T  4  3.03K   634K   388M
Sun_2540 458G  1.18T  0  3.01K  0   386M
Sun_2540 460G  1.18T142792  15.8M  82.7M
Sun_2540 460G  1.18T375  0  46.9M  0

Here is an interesting discussion thread on another list that I had 
not seen before:


http://opensolaris.org/jive/thread.jspa?messageID=347212

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread milosz
 - - the VM will be mostly few IO systems :
 - -- WS2003 with Trend Officescan, WSUS (for 300 XP) and RDP
 - -- Solaris10 with SRSS 4.2 (Sunray server)

 (File and DB servers won't move in a nearby future to VM+SAN)

 I thought -but could be wrong- that those systems could afford a high
 latency IOs data rate.

might be fine most of the time...  rdp in particular is vulnerable to
io spiking and disk latency.  depends on how many users you have on
that rdp vm.  also wsus is surprisingly (or not, given it's a
microsoft production) resource-hungry.

if those servers are on physical boxes right now i'd do some perfmon
caps and add up the iops.

what you're getting out of it is essentially a 500gb
 15k drive with a high mttdl

 That's what i wanted, a rock-solid disk area, despite a
 not-as-good-as-i'd-like random IO.

fair enough.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-24 Thread Eric D. Mudama

On Wed, Jun 24 at 15:38, Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Orvar Korvar wrote:

I thought of exchanging my PCI card with a PCIe card variant instead  
to reach higher speeds. PCI-X is legacy. The problem with PCIe cards  
is that soon SSD drives will be common. A ZFS raid with SSD would need 
maybe PCIe x 16 or so, to reach max band width. The PCIe cards are all 
PCIe x 4 or something of today. I need a PCIe x 16 card to make it 
future proof for the SSD discs. Maybe the best bet would be to attach a 
SSD disc directly to a PCIe slot, to reach max transfer speed? Or wait 
for SATA 3? I dont know. I want to wait until SSD


I don't think this is valid thinking because it assumes that write rates 
for SSDs are higher than for traditional hard drives.  This assumption is 
not often correct.  Maybe someday.


SSDs offer much lower write latencies (no head seek!) but their bulk  
sequential data transfer properties are not yet better than hard drives.


The main purpose for using SSDs with ZFS is to reduce latencies for  
synchronous writes required by network file service and databases.


In the "available 5 months ago" category, the Intel X25-E will write
sequentially at ~170MB/s according to the datasheets.  That is faster
than most, if not all, rotating media today.

--eric



--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-24 Thread David Magda


On Jun 24, 2009, at 16:54, Philippe Schwarz wrote:

Out of curiosity, any reason why went with iSCSI and not NFS? There  
seems

to be some debate on which is better under which circumstances.

iSCSI instead of NFS ?
Because of the overwhelming difference in transfer rate between
them, In fact, that's what i read.


That would depend on I/O pattern, wouldn't it? If you have mostly  
random I/O then it's unlikely you'd saturate a GigE as you're not  
streaming. Well, this is with 3.x. I don't have any experience with  
4.x so I guess it's best to test. Everyone's going to have to build up  
all their knowledge from scratch with the new software. :)


http://tinyurl.com/d8urpx
http://vmetc.com/2009/05/01/reasons-for-using-nfs-with-vmware-virtual-infrastructure/

Cloning Windows images (assuming one VMDK per FS) would be a  
possibility as well. Either way, you may want to tweak some of the TCP  
settings for best results:


http://serverfault.com/questions/13190
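On the Solaris side that mostly means raising the TCP buffer ceilings.
A rough sketch (values are illustrative, not a recommendation for any
particular setup):

# ndd -set /dev/tcp tcp_max_buf 4194304
# ndd -set /dev/tcp tcp_xmit_hiwat 1048576
# ndd -set /dev/tcp tcp_recv_hiwat 1048576

Note that ndd settings don't survive a reboot, so persist anything that
helps in a startup script.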

And setting up an iSCSI target is so simple that I didn't even search
for another solution.


# zfs set sharenfs=on mypool/myfs1

http://docs.sun.com/app/docs/doc/819-5461/gamnd
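For comparison, the zvol/iSCSI route is only a couple of commands as
well. Roughly this (a sketch, assuming the legacy in-kernel target via
the shareiscsi property rather than COMSTAR, and a made-up volume name):

# zfs create -V 100G mypool/myvol1
# zfs set shareiscsi=on mypool/myvol1

so ease of setup is pretty much a wash either way.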

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Richard Elling

Bob Friesenhahn wrote:

On Wed, 24 Jun 2009, Marcelo Leal wrote:

I think that is the purpose of the current implementation:
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
But it seems it is not that easy... from what I understood of what Roch
said, the cause is not always a heavy writer.


I see this:

The new code keeps track of the amount of data accepted in a TXG and 
the time it takes to sync. It dynamically adjusts that amount so that 
each TXG sync takes about 5 seconds (txg_time variable). It also 
clamps the limit to no more than 1/8th of physical memory.


hmmm... methinks there is a chance that the 1/8th rule might not work so well
for machines with lots of RAM and slow I/O.  I'm also reasonably sure that
that sort of machine is not what Sun would typically build for performance lab
testing, as a rule.  Hopefully Roch will comment when it is morning in Europe.
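For anyone who wants to experiment in the meantime, the per-TXG write
limit can be inspected and pinned by hand. Treat this as a sketch only
(variable name and sizes from memory, so check your kernel before
poking it):

# echo 'zfs_write_limit_override/E' | mdb -k         (0 means auto-sizing)
# echo 'zfs_write_limit_override/Z 0x20000000' | mdb -kw   (pin it at 512MB)

or persistently via /etc/system:

set zfs:zfs_write_limit_override = 0x20000000

Pinning the override takes the 1/8th-of-RAM heuristic out of the picture
for testing.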

-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Eric D. Mudama wrote:


The main purpose for using SSDs with ZFS is to reduce latencies for 
synchronous writes required by network file service and databases.


In the available 5 months ago category, the Intel X25-E will write
sequentially at ~170MB/s according to the datasheets.  That is faster
than most, if not all rotating media today.


Sounds good.  Is that after the whole device has been re-written a 
few times, or just when you first use it?  How many of these devices do 
you own and use?


Seagate Cheetah drives can now support a sustained data rate of 
204MB/second.  That is with 600GB capacity rather than 64GB and at a 
similar price point (i.e. 10X less cost per GB).  Or you can just 
RAID-0 a few cheaper rotating rust drives and achieve a huge 
sequential data rate.


I see that the Intel X25-E claims a sequential read performance of 
250 MB/s.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Richard Elling wrote:


The new code keeps track of the amount of data accepted in a TXG and the 
time it takes to sync. It dynamically adjusts that amount so that each TXG 
sync takes about 5 seconds (txg_time variable). It also clamps the limit to 
no more than 1/8th of physical memory.


hmmm... methinks there is a chance that the 1/8th rule might not work so well
for machines with lots of RAM and slow I/O.  I'm also reasonably sure that
that sort of machine is not what Sun would typically build for performance lab
testing, as a rule.  Hopefully Roch will comment when it is morning in Europe.


Slow I/O is relative.  If I install more memory does that make my I/O 
even slower?


I did some more testing.  I put the input data on a different drive 
and sent application output to the ZFS pool.  I no longer noticed any 
stalls in the execution even though the large ZFS flushes are taking 
place.  This proves that my application is seeing stalled reads rather 
than stalled writes.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Turn off the time slider on some zpools

2009-06-24 Thread Cindy . Swearingen

Hi Mykola,

Yes, if you are speaking of the automatic TimeSlider snapshots,
older snapshots are rotated out automatically. I think the threshold
is around 80% of pool capacity.
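If the goal is just to exclude particular file systems rather than stop
the whole service, setting the user property that the service checks
should also work (this is from memory, so please verify against the
wiki page quoted below):

# zfs set com.sun:auto-snapshot=false mypool/myfs

and the per-schedule variants (com.sun:auto-snapshot:frequent and
friends) work the same way.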

Cheers,

Cindy

Mykola Maslov wrote:



How to turn off the timeslider snapshots on certain file systems?



http://wikis.sun.com/display/OpenSolarisInfo/How+to+Manage+the+Automatic+ZFS+Snapshot+Service 




Thank you, very handy stuff!

BTW - will ZFS automatically delete snapshots when I run low on disk 
space?




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-24 Thread Lejun Zhu
 On Wed, 24 Jun 2009, Richard Elling wrote:

  The new code keeps track of the amount of data accepted in a TXG and
  the time it takes to sync. It dynamically adjusts that amount so that
  each TXG sync takes about 5 seconds (txg_time variable). It also
  clamps the limit to no more than 1/8th of physical memory.

  hmmm... methinks there is a chance that the 1/8th rule might not work
  so well for machines with lots of RAM and slow I/O.  I'm also
  reasonably sure that that sort of machine is not what Sun would
  typically build for performance lab testing, as a rule.  Hopefully
  Roch will comment when it is morning in Europe.

 Slow I/O is relative.  If I install more memory does that make my I/O
 even slower?

 I did some more testing.  I put the input data on a different drive and
 sent application output to the ZFS pool.  I no longer noticed any
 stalls in the execution even though the large ZFS flushes are taking
 place.  This proves that my application is seeing stalled reads rather
 than stalled writes.

There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is that reducing the queue depth sometimes makes reads perform better.
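If anyone wants to try that on a live system, the knob in question
should be zfs_vdev_max_pending. Treat this as a sketch and check the
default in your build first:

# echo 'zfs_vdev_max_pending/D' | mdb -k        (print the current value)
# echo 'zfs_vdev_max_pending/W0t10' | mdb -kw   (drop the per-vdev queue to 10)

or via /etc/system to make it stick across reboots:

set zfs:zfs_vdev_max_pending = 10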

 
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] BugID formally known as 6746456

2009-06-24 Thread Rob Healey
Does anyone know if the problems related to the panics dismissed as duplicates of 
6746456 ever resulted in Solaris 10 patches? It sounds like they were actually 
solved in OpenSolaris, but S10 still panics predictably when Linux NFS 
clients try to change a nobody UID/GID on a ZFS-exported filesystem.

Specifically, the NFS-induced panics related to the nobody ID not mapping 
correctly; more precisely, attempts to change user/group ID nobody causing 
S10u7 to blow chunks in the zfs_fuid_table_load ASSERT in zfs_fuid.c?

While the workaround of changing the IDs on the server is possible, it pretty 
much torpedoes management's view of Solaris' stability and sends fileserver 
duty back to Linux... :( Without this being patched, anybody could create a 
nobody-owned file and put the system into endless boot loops.

I'm hoping further work on this issue was done on the S10 side of the house and 
there is a stealthy patch ID that can fix the issue.

Thanks,

-Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migration: 1 x 160GB IDE boot drive --- 2 x 30GB SATA SSDs

2009-06-24 Thread Fajar A. Nugraha
On Wed, Jun 24, 2009 at 6:32 PM, Simon Bredenno-re...@opensolaris.org wrote:
 FIRST QUESTION:
 Although, it seems possible to add a drive to form a mirror for the ZFS boot 
 pool 'rpool', the main problem I see is that in my case, I would be 
 attempting to form a mirror using a smaller drive (30GB) than the initial 
 160GB drive.
 Is there an easy solution to this problem, or would it be simpler to just do 
 a reinstall of OpenSolaris 2009.06 onto 2 brand new 30GB SSDs? I have the 
 option of the fresh install, as I haven't invested much time in configuring 
 this OS2009.06 boot environment yet.

Depends on how you define easy. Because the new drive is smaller, you
can't use zpool replace.
Some people will find this easy enough:
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#ZFS_Root_Pool_Recovery
Others might find it too complicated and opt for a reinstall plus
zpool attach.
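If you go the reinstall route, attaching the second SSD afterwards is
roughly like this (device names are made up here, and the new disk needs
an SMI label with an s0 slice covering it first):

# zpool attach rpool c7d0s0 c8d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c8d0s0

and wait for the resilver to finish before trusting the mirror.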


 SECOND QUESTION:
 I also want the possibility to have multiple boot environments within 
 OpenSolaris 2009.06 to allow easy rollback to a working boot environment in 
 case of an IPS update problem. I presume this will not cause any additional 
 complications?

Correct.

 THIRD QUESTION:
 This is for a home fileserver so I don't want to spend too much, but does 
 anyone see any problem with having the OS installed on MLC SSDs, which are 
 cheaper than SLC SSDs. I'm thinking here specifically about wearing out the 
 SSD if the OS does too many writes to the SSDs.

ZFS is SSD-friendly due to its copy-on-write nature. Having a mirror
also provides an additional level of protection.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss