Re: [zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-20 Thread Renil Thomas
Were you able to get more insight into this problem?
U7 did not encounter such problems.


Re: [zfs-discuss] Liveupgrade'd to U8 and now can't boot previous U6 BE :(

2009-10-20 Thread Philip Brown
Quote: cindys 
3. Boot failure from a previous BE if either #1 or #2 failure occurs.

#1 or #2 were not relevant in my case.  I just found I could not boot into the old U7 
BE. I am happy with the workaround shinsui points out, so this is purely for 
your information.

Quote: renil82
U7 did not encounter such problems.

My problem occurred going from U7 to U8 with Live Upgrade. 
Again, this is only for information purposes, as the workaround is sufficient.


Re: [zfs-discuss] ZFS mirror resilver process

2009-10-20 Thread Rasmus Fauske

Hi,

Now I have tried to restart the resilvering by detaching c9t7d0 and then 
attaching it again to the mirror. The resilvering starts, but now, after 
almost 24 hours, it is still going.
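
For reference, the restart sequence amounts to something like the following sketch, 
using the pool and device names shown below, with zpool status/iostat to watch progress:

# zpool detach tank-nfs c9t7d0
# zpool attach tank-nfs c9t6d0 c9t7d0
# zpool status -xv tank-nfs
# zpool iostat -v tank-nfs 10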


From the iostat it still shows data flowing:
tank-nfs      446G  2,28T    112      8  13,5M  35,9K
  mirror      145G   783G    107      2  13,4M  12,0K
    c9t6d0       -      -    106      2  13,3M  12,0K
    c9t7d0       -      -      0    110      0  13,4M

$ zpool status -xv
  pool: tank-nfs
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 23h23m, 100,00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank-nfs    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c9t6d0  ONLINE       0     0     0
            c9t7d0  ONLINE       0     0     0  1,02T resilvered

This time, as you can see, there are 0 checksum errors during the resilvering.

Is it something with the build I am using (118)?

--
Rasmus Fauske

Markus Kovero skrev:

We've noticed this behaviour when there's a problem with RAM (plenty of checksum 
errors), and in those cases I doubt the resilver will ever finish. You can use 
iostat to monitor whether anything is happening on the disks that should be 
resilvered; if not, I'd say the data it wants to resilver has somehow gone bad due 
to broken RAM, or who knows.
(This is actually the resilver and checksumming working as they should: no data 
that is not valid should be written.)

Yours
Markus Kovero


  




[zfs-discuss] iscsi share on different subnet?

2009-10-20 Thread Kent Watsen


I have a ZFS/Xen server for my home network.  The box itself has two 
physical NICs.  I want Dom0 to be on my management network and the 
guest domains to be on the dmz and private networks.  The private 
network is where all my home computers are, and I would like to export 
iSCSI volumes directly to them - without having to create a firewall 
rule to grant them access to the management network.  After some 
searching, I have yet to find a way to specify the subnet an iSCSI 
target is visible to - is there any way to do that?


Another idea, I suppose, would be to have one of the guest domains mount 
the volume and then export it itself, but this would be less performant 
and more complicated...


Thanks,
Kent



Re: [zfs-discuss] iscsi share on different subnet?

2009-10-20 Thread Darren J Moffat

Kent Watsen wrote:


I have ZFS/Xen server for my home network.  The box itself has two 
physical NICs.   I want Dom0 to be on my management network and the 
guest domains to be on the dmz and private networks.  The private 
network is where all my home computers are and would like to export 
iscsi volumes directly to them - without having to create a firewall 
rule to grant them access to the management network.  After some 
searching, I have yet to find a way to specify the subnet an iSCSI 
target is visible to - is there any way to do that?


Given this is to do with COMSTAR iSCSI (or the old userland iscsi target 
daemon) and not ZFS you are more likely to get an answer on 
storage-disc...@opensolaris.org


Using stmfadm(1M) you can configure which addresses the target is exposed 
to and set what type of authentication you want to use.  Given what you 
have described, you probably want to configure one or more host groups 
with stmfadm(1M).  I'm not a COMSTAR expert, so I suggest asking on 
storage-discuss if you need more help than that.
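
A minimal sketch of that kind of restriction with COMSTAR (the names, addresses and 
the initiator IQN below are hypothetical examples, not from the original post):

(bind the iSCSI target portal to the private-network address only)
# itadm create-tpg private-tpg 192.168.10.1:3260
# itadm create-target -t private-tpg

(expose the LU only to a host group containing the home machines' initiators)
# stmfadm create-hg homelan
# stmfadm add-hg-member -g homelan iqn.1986-03.com.example:client1
# stmfadm add-view -h homelan <LU GUID from 'stmfadm list-lu'>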


--
Darren J Moffat


[zfs-discuss] Performance of ZFS and UFS inside local/global zone

2009-10-20 Thread Andre Boegelsack
Dear all,

I was interested in the performance difference between filesystem operations 
inside a local and a global zone. Therefore I utilized filebench and made several 
performance tests with the OLTP script for filebench. Here are some of my 
results:

- In the global zone (filebench operating on UFS): 281,850.2 IOPS/sec
- In the local zone (filebench operating on UFS): 181,716 IOPS/sec

So there is a huge difference between the local and global zone when operating 
on UFS.

After I saw the huge difference I wondered if I would see such a huge difference 
when using ZFS. Here are the results:

- In the global zone (filebench operating on ZFS): 1,710,268.1 IOPS/sec
- In the local zone (filebench operating on ZFS): 449,332.6 IOPS/sec

I was a little bit surprised to see a big difference again. Besides: ZFS 
outperforms UFS - but this is already known.

So in a first analysis I suspected the loop-back device driver (lofs) to cause the 
performance degradation. I repeated the tests without the loop-back driver by 
mounting the UFS device directly inside the zone and could not reproduce the 
performance degradation.

This leads me to the assumption that the loop-back driver causes the 
performance degradation - but how can I make sure it is the loop-back driver 
and not anything else? Does anyone have an idea how to explore this phenomenon?

Regards
André


Re: [zfs-discuss] Performance of ZFS and UFS inside local/global zone

2009-10-20 Thread Casper . Dik


Very easy:

- make a directory
- mount it using lofs

run filebench on both directories.
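
A minimal sketch of that comparison (the paths are examples, and the filebench 
invocation assumes the oltp personality used earlier in the thread):

# mkdir -p /export/perftest /mnt/lofstest
# mount -F lofs /export/perftest /mnt/lofstest
# filebench
filebench> load oltp
filebench> set $dir=/export/perftest
filebench> run 60
(note the reported IOPS, then repeat with: set $dir=/mnt/lofstest)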

It seems like we need to make lofs faster.

Casper



Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread jay
I'm no expert, but if I were in the same situation I would definitely keep the 
integrity check on.  Especially since you're only running RAID-5, the sooner you 
know there is a problem the better.  Even if ZFS cannot fix it for you, it can 
still be a useful tool.  Basically, a few errors may not be worth fixing 
manually, but if lots of errors start happening you're better off knowing before 
a full drive failure.  Now, in certain situations the extra overhead may not be 
worth the extra reliability.  But that's a much more complex discussion, which 
would need a lot more information.

--Original Message--
From: Espen Martinsen
Sender: zfs-discuss-boun...@opensolaris.org
To: zfs Discuss
Subject: [zfs-discuss] Zpool without any redundancy
Sent: Oct 20, 2009 12:49 AM

Hi,
  This might be a stupid question, but I can't figure it out.

  Let's say I've chosen to live with a zpool without redundancy, 
  (SAN disks, has actually raid5 in disk-cabinet)

m...@mybox:~# zpool status BACKUP
  pool: BACKUP
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        BACKUP                     ONLINE       0     0     0
          c0t200400A0B829BC13d0    ONLINE       0     0     0
          c0t200400A0B829BC13d1    ONLINE       0     0     0
          c0t200400A0B829BC13d2    ONLINE       0     0     0

errors: No known data errors


The question:
Would it be a good idea to turn OFF the 'checksum' property of the ZFS
filesystems?

I know the manual says it is not recommended to turn off integrity checking of 
user data, but what will happen if the algorithm actually finds an error?  I would 
not have any way to fix it, except to delete/overwrite the data. (Will I be able 
to point out which files are involved?)


Yours

Espen Martinsen





Re: [zfs-discuss] Antwort: Re: Performance of ZFS and UFS inside local/global zone

2009-10-20 Thread Casper . Dik


I did that.

Isn't that sufficient proof?

Perhaps run both tests in the global zone?

Casper



Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Mark J Musante

On Mon, 19 Oct 2009, Espen Martinsen wrote:

Let's say I've chosen to live with a zpool without redundancy, (SAN 
disks, has actually raid5 in disk-cabinet)


What benefit are you hoping zfs will provide in this situation?  Examine 
your situation carefully and determine what filesystem works best for you. 
There are many reasons to use ZFS, but if your configuration isn't set up 
to take advantage of those reasons, then there's a disconnect somewhere.


The question: Would it be a good idea to torn OFF the 'checksum' 
property of the ZFS filesystems?


No.  It is never a good idea to turn off checksumming.  Why run ZFS at 
all, then?  Without checksums, it can't detect bad data.  Without 
redundancy, it can't repair bad data.  At least if you have checksums on, 
you get to know which files are corrupt and need to be restored from 
backup.


Given the name of your pool, though (BACKUP), it seems to me that you'd 
want this to be as safe as possible.  In other words, both redundancy and 
checksums.  If you can export non-redundant disks from your cabinet, and 
let ZFS manage the redundancy, that seems like it would give you the best 
protection.


I know the manual says it is not recommended to turn off integrity of 
user-data, but what will happen if the algorithm actually finds one?  I 
would not have any way to fix that, except delete/overrite the data. 
(will I be able to point out what files are involved)


Yes, with checksumming on, zfs can tell you exactly which files are bad, 
even in a non-redundant pool.
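
For example, a sketch using the pool name from the original post (the file path in 
the output is illustrative only):

# zpool scrub BACKUP
# zpool status -v BACKUP
  ...
  errors: Permanent errors have been detected in the following files:
          /BACKUP/path/to/damaged-file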



Regards,
markm


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Robert Dupuy
A word of caution, be sure not to read a lot into the fact that the F20 is 
included in the Exadata Machine.

From what I've heard, the flash_cache feature of Oracle 11.2.0 that was enabled 
in beta is not working in the production release for anyone except the 
Exadata 2.

The question is, why did they need to give this machine an unfair software 
advantage?  Is it because of the poor performance they found with the F20?

Oracle bought Sun, they have reason to make such moves.

I have been talking to a Sun rep for weeks now, trying to get the latency specs 
on this F20 card, with no luck in getting that revealed so far.

However, you can look at Sun's other products like the F5100, which are very 
unimpressive and high latency.

I would not assume this Sun tech is in the same league as a Fusion-io ioDrive 
or a Ramsan-10.  They would not confirm whether it's a native PCIe solution, or 
whether the reason it comes on a SAS card is because it requires SAS.

So: test, test, test, and don't assume this card is competitive because it came 
out this year.  I am not sure it's even competitive with last year's ioDrive.

I told my Sun reseller that I merely needed it to be faster than the Intel 
X25-E in terms of latency, and they weren't able to demonstrate that, at least 
so far...lots of feet dragging, and I can only assume they want to sell as much 
as they can before the card's metrics become widely known.


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Tim Cook
On Tue, Oct 20, 2009 at 10:23 AM, Robert Dupuy rdu...@umpublishing.orgwrote:

 A word of caution, be sure not to read a lot into the fact that the F20 is
 included in the Exadata Machine.

 From what I've heard the flash_cache feature of 11.2.0 Oracle that was
 enabled in beta, is not working in the production release, for anyone except
 the Exadata 2.

 The question is, why did they need to give this machine an unfair software
 advantage?  Is it because of the poor performance they found with the F20?

 Oracle bought Sun, they have reason to make such moves.

 I have been talking to a Sun rep for weeks now, trying to get the latency
 specs on this F20 card, with no luck in getting that revealed so far.

 However, you can look at Sun's other products like the F5100, which are
 very unimpressive and high latency.

 I would not assume this Sun tech is in the same league as a Fusion-io
 ioDrive, or a Ramsan-10.  They would not confirm whether its a native PCIe
 solution, or if the reason it comes on a SAS card, is because it requires
 SAS.

 So, test, test, test, and don't assume this card is competitive because it
 came out this year, I am not sure its even competitive with last years
 ioDrive.

 I told my sun reseller that I merely needed it to be faster than the Intel
 X25-E in terms of latency, and they weren't able to demonstrate that, at
 least so far...lots of feet dragging, and I can only assume they want to
 sell as much as they can, before the cards metrics become widely known.
 --



That's an awful lot of assumptions with no factual basis for any of your
claims.

As for your bagging on the F5100... what exactly is your problem with its
latency?  Assuming you aren't using absurdly large block sizes, it would
appear to fly.  0.15ms is bad?
http://blogs.sun.com/BestPerf/entry/1_6_million_4k_iops

--Tim


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Robert Dupuy
My post is a caution to test the performance, and get your own results.

http://www.storagesearch.com/ssd.html

Please see the entry for October 12th.  

The results page you linked to shows that you can use an arbitrarily high 
number of threads, spread evenly across a large number of SAS channels, and get 
the results to scale.

These are Sun's ideal conditions, designed to sell the F5100.  Real-world 
performance is unimpressive.

The results need to be compared to other flash systems, not to traditional hard 
drives.  Most flash, including Sun's, trumps traditional hard drives.

I'm issuing a caution because I think it's a benefit.  Look at Sun's numbers for 
latency:

http://www.sun.com/storage/disk_systems/sss/f5100/specs.xml

.41ms

Fast compared to hard drives, but quite slow compared to competing SSD.

I've done testing with the X25-E (Intel 32GB 2.5 SATA form factor drives).

I'm cautious about the F20 precisely because I would think Sun would be 
anxious to prove it's faster than this competitor.

I have not said it's slower, only that this is unconfirmed, and so my recommendation 
is to confirm the performance of this card; do not assume.

Good advice.


Re: [zfs-discuss] zvol used apparently greater than volsize for sparse volume

2009-10-20 Thread Cindy Swearingen

Hi Stuart,

The reason why used is larger than the volsize is because we
aren't accounting for metadata, which is covered by this CR:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429996
6429996 zvols don't reserve enough space for requisite meta data

Metadata is usually only a small percentage.

Sparse-ness is not a factor here.  Sparse just means we ignore the
reservation so you can create a zvol bigger than what we'd normally
allow.
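
In other words, a sparse zvol is simply one created with -s, e.g. the hypothetical

# zfs create -s -V 10T tank/bigvol

which leaves the reservation at zero, so volsize can exceed the space actually 
available in the pool.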

Cindy

On 10/17/09 13:47, Stuart Anderson wrote:

What does it mean for the reported value of a zvol volsize to be
less than the product of used and compressratio?


For example,

# zfs get -p all home1/home1mm01
NAME             PROPERTY        VALUE         SOURCE
home1/home1mm01  type            volume        -
home1/home1mm01  creation        1254440045    -
home1/home1mm01  used            14902492672   -
home1/home1mm01  available       16240062464   -
home1/home1mm01  referenced      14902492672   -
home1/home1mm01  compressratio   11.20x        -
home1/home1mm01  reservation     0             default
home1/home1mm01  volsize         161061273600  -
home1/home1mm01  volblocksize    16384         -
home1/home1mm01  checksum        on            default
home1/home1mm01  compression     gzip-1        inherited from home1
home1/home1mm01  readonly        off           default
home1/home1mm01  shareiscsi      off           default
home1/home1mm01  copies          1             default
home1/home1mm01  refreservation  0             default



Yet used (14902492672) * compressratio (11.20) = 166907917926, which is 3.6% 
larger than volsize (161061273600).

Is this a bug or a feature for sparse volumes? If a feature, how
much larger than volsize/compressratio can the actual used
storage space grow? e.g., fixed amount overhead and/or
fixed percentage?

Thanks.

--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson





Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Prasad Unnikrishnan
 What benefit are you hoping zfs will provide in this
 situation?  Examine 
 your situation carefully and determine what
 filesystem works best for you. 
 There are many reasons to use ZFS, but if your
 configuration isn't set up 
 to take advantage of those reasons, then there's a
 disconnect somewhere.

Could you elaborate on this? Do you mean to say: don't use ZFS when you have a H/W 
RAID configuration? Or create both?


Re: [zfs-discuss] ZFS and quota/refqoutoa question

2009-10-20 Thread Matthew Ahrens

Peter Wilk wrote:

tank/apps will be mounted as /apps -- needs to be set to 10G
tank/apps/data1 will need to be mounted as /apps/data1 and needs to be set 
to 20G on its own.

The question is:
If refquota is being used to set the filesystem sizes on /apps and 
/apps/data1, then /apps/data1 will not be inheriting the quota from /apps. In 
other words, /apps/data1 will have the full 20G of usable disk space and /apps 
will have 10G of usable disk space. /apps/data1 will not be inheriting disk 
space from /apps.


That is correct.
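
A sketch of that layout, with the sizes from the question (dataset names as above):

# zfs create -o mountpoint=/apps tank/apps
# zfs set refquota=10G tank/apps
# zfs create tank/apps/data1        (inherits its mountpoint as /apps/data1)
# zfs set refquota=20G tank/apps/data1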

--matt


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Robert Dupuy wrote:


My post is a caution to test the performance, and get your own results.

http://www.storagesearch.com/ssd.html

Please see the entry for October 12th.


I see an editorial based on no experience and little data.

The result page you linked too, shows that you can use an 
arbitrarily high number of threads, spread evenly across a large 
number of SAS channels, and get the results to scale.


This is Sun's ideal conditions designed to sell the F5100.  Now, 
real world performance is unimpressive.


It is not clear to me that real world performance is unimpressive 
since then it is necessary to define what is meant by real world. 
Many real world environments are naturally heavily threaded.


I'm issuing a caution because I think its a benefit.  Look at Sun's 
numbers for latency


http://www.sun.com/storage/disk_systems/sss/f5100/specs.xml

.41ms

Fast compared to hard drives, but quite slow compared to competing SSD.


You are assuming that the competing SSD vendors measured latency the 
same way that Sun did.


Sun has always been conservative with their benchmarks and their 
specifications.


I've done testing with the X25-E (Intel 32GB 2.5 SATA form factor 
drives).


I'm cautious about the F20, precisely because I would think Sun 
would be anxious to prove its faster than this competitor.


Is the X25-E a competitor?  It never even crossed my mind that a device 
like the X25-E would be a competitor.  They don't satisfy the same 
requirements.  1K non-volatile write IOPS vs 84k non-volatile write 
IOPS.  Seems like night and day to me (and I am sure that Sun prices 
accordingly).


The only thing I agree with is the need to perform real world testing 
for the intended application.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] zvol used apparently greater than volsize for sparse volume

2009-10-20 Thread Stuart Anderson

Cindy,
	Thanks for the pointer. Until this is resolved, is there some documentation 
available that will let me calculate this by hand? I would like to know how large 
the 3-4% metadata overhead I am currently observing can potentially grow.
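
As a side note, the 3-4% figure can be recomputed from the parsable properties; a 
sketch, using the dataset name from the earlier output:

# used=$(zfs get -Hp -o value used home1/home1mm01)
# vol=$(zfs get -Hp -o value volsize home1/home1mm01)
# ratio=$(zfs get -H -o value compressratio home1/home1mm01 | sed 's/x$//')
# echo $used $ratio $vol | awk '{ printf "overhead = %.1f%%\n", ($1*$2/$3 - 1)*100 }'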

Thanks.


On Oct 20, 2009, at 8:57 AM, Cindy Swearingen wrote:


Hi Stuart,

The reason why used is larger than the volsize is because we
aren't accounting for metadata, which is covered by this CR:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429996
6429996 zvols don't reserve enough space for requisite meta data

Metadata is usually only a small percentage.

Sparse-ness is not a factor here.  Sparse just means we ignore the
reservation so you can create a zvol bigger than what we'd normally
allow.

Cindy

On 10/17/09 13:47, Stuart Anderson wrote:

What does it mean for the reported value of a zvol volsize to be
less than the product of used and compressratio?



--
Stuart Anderson  ander...@ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson





Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Robert Dupuy
I agree, that assuming that the F20 works well for your application, because 
its included in the Exadata 2, probably isn't logical.

Equally, assuming it doesn't work, isn't logical.

Yes, the X-25E is clearly a competitor.  It was once part of the Pillar Data 
Systems setup, and was disqualified based on reliability issues...in that 
sense it doesn't seem like a good competitor, but it is a competitor.

I'm not here to promote the X-25E; however, Sun does sell a rebadged X-25E in 
their own servers, and my particular salesman spec'd both an X-25E-based 
system and an F20-based system, so they were clearly pitched against each 
other.

As for my assuming things about Sun's .41ms benchmark methodology - really?  Am I, 
sir?  Hardly.

I start that as the basis of a discussion, because Sun published that number.
Seemed logical.

But I think we mostly agree, good idea to test.   My intention was just to save 
anyone from disappointment, should they purchase without testing.

I admit, I haven't posted here before, I registered precisely because google 
was showing this page as being a forum to discuss this card, and...just wanted 
to discuss it some.  

My apologies if I seemed too enthusiastic in my points, from the get go.


Re: [zfs-discuss] group and user quotas - a temporary hack?

2009-10-20 Thread Matthew Ahrens

Alastair Neil wrote:

However, the user or group quota is applied when a clone or a
snapshot is created from a file system that has a user or group quota.

Applied to a clone, I understand what that means; applied to a 
snapshot - not so clear. Does it mean enforced on the original dataset? 


Since snapshots can not be modified, you're right that applying user quotas 
to snapshots doesn't make any sense.  However, the user/group quota/used is 
recorded in the snapshot, and the values (including user quotas) are the 
values at the time the snapshot was taken.  Contrast that with other 
properties (eg, quota, exec), where the snapshot does not have its own value 
for those properties; it just applies the filesystem's current value.


--matt


Re: [zfs-discuss] Help! System panic when pool imported

2009-10-20 Thread Albert Chin
On Mon, Oct 19, 2009 at 09:02:20PM -0500, Albert Chin wrote:
 On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
  Thanks for reporting this.  I have fixed this bug (6822816) in build  
  127.
 
 Thanks. I just installed OpenSolaris Preview based on 125 and will
 attempt to apply the patch you made to this release and import the pool.

I did the above and the zpool import worked. Thanks!

  --matt
 
  Albert Chin wrote:
  Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
  snapshot a few days ago:
# zfs snapshot a...@b
# zfs clone a...@b tank/a
# zfs clone a...@b tank/b
 
  The system started panicing after I tried:
# zfs snapshot tank/b...@backup
 
  So, I destroyed tank/b:
# zfs destroy tank/b
  then tried to destroy tank/a
# zfs destroy tank/a
 
  Now, the system is in an endless panic loop, unable to import the pool
  at system startup or with zpool import. The panic dump is:
  panic[cpu1]/thread=ff0010246c60: assertion failed: 0 == 
  zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx) (0x0 == 
  0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
 
ff00102468d0 genunix:assfail3+c1 ()
ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
ff0010246b10 zfs:dsl_pool_sync+196 ()
ff0010246ba0 zfs:spa_sync+32a ()
ff0010246c40 zfs:txg_sync_thread+265 ()
ff0010246c50 unix:thread_start+8 ()
 
  We really need to import this pool. Is there a way around this? We do
  have snv_114 source on the system if we need to make changes to
  usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the zfs
  destroy transaction never completed and it is being replayed, causing
  the panic. This cycle continues endlessly.
 

 
 
 
 -- 
 albert chin (ch...@thewrittenword.com)
 

-- 
albert chin (ch...@thewrittenword.com)


Re: [zfs-discuss] ZFS user quota, userused updates?

2009-10-20 Thread Matthew Ahrens
The user/group used can be out of date by a few seconds, same as the used 
and referenced properties.  You can run sync(1M) to wait for these values 
to be updated.  However, that doesn't seem to be the problem you are 
encountering here.
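
For example, using the dataset from the output below:

# sync
# zfs userspace zpool1/sd01_mail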


Can you send me the output of:

zfs list zpool1/sd01_mail
zfs get all zpool1/sd01_mail
zfs userspace -t all zpool1/sd01_mail
ls -ls /export/sd01/mail
zdb -vvv zpool1/sd01_mail

--matt

Jorgen Lundman wrote:


Is there a way to force ZFS to update, or refresh it in some way, when the 
user quota/used value does not reflect what is actually the case? Are there known 
ways to make it go out of sync that we should avoid?


SunOS x4500-11.unix 5.10 Generic_141445-09 i86pc i386 i86pc
(Solaris 10 10/09 u8)


zpool1/sd01_mail   223M  15.6T   222M  /export/sd01/mail


# zfs userspace zpool1/sd01_mail
TYPE        NAME   USED  QUOTA
POSIX User  1029  54.0M   100M

# df -h .
Filesystem             size   used  avail capacity  Mounted on
zpool1/sd01_mail        16T   222M    16T     1%    /export/sd01/mail


# ls -lhn
total 19600
-rw---   1 1029 21001.7K Oct 20 12:03 
1256007793.V4700025I1770M252506.vmx06.unix:2,S
-rw---   1 1029 21001.7K Oct 20 12:04 
1256007873.V4700025I1772M63715.vmx06.unix:2,S
-rw---   1 1029 21001.6K Oct 20 12:05 
1256007926.V4700025I1773M949133.vmx06.unix:2,S
-rw---   1 1029 2100 76M Oct 20 12:23 
1256009005.V4700025I1791M762643.vmx06.unix:2,S
-rw---   1 1029 2100 54M Oct 20 12:36 
1256009769.V4700034I179eM739748.vmx05.unix:2,S

-rw--T   1 1029 21002.0M Oct 20 14:39 file

The 54M file appears to be accounted for, but the 76M one is not. I recently 
added a 2M file by chown to see if it was a local-disk vs. NFS problem. The 
previous ones had not been updated for 2 hours.



# zfs get useru...@1029 zpool1/sd01_mail
NAME  PROPERTY   VALUE  SOURCE
zpool1/sd01_mail  useru...@1029  54.0M  local


Any suggestions would be most welcome,

Lund






Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Robert Dupuy wrote:

I'm not here to promote the X-25E, however Sun does sell a rebadged 
X-25E in their own servers, and my particular salesman, spec'd both 
an X-25E based system, and an F20 based systemso they were 
clearly pitched against each other.


Sun salesmen don't always know what they are doing.  The F20 likely 
cost more than the rest of the system.  To date, no one here has 
posted any experience with the F20 so it must be assumed that to date 
it has only been used in the lab or under NDA.


People here dream of using it for the ZFS intent log but it is clear 
that this was not Sun's initial focus for the product.


As far as I'm assuming about Sun's .41ms benchmark methodology, 
really?  am I sir?  hardly.


Measuring and specifying access times for disk drives is much more 
straight-forward than for solid state devices.


I admit, I haven't posted here before, I registered precisely 
because google was showing this page as being a forum to discuss 
this card, and...just wanted to discuss it some.


This is a good place.  We just have to wait until some real world 
users get their hands on some F20s without any NDA in place so that we 
can hear about their experiences.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] group and user quotas - a temporary hack?

2009-10-20 Thread Matthew Ahrens

Alastair Neil wrote:



On Tue, Oct 20, 2009 at 12:12 PM, Matthew Ahrens matthew.ahr...@sun.com wrote:


Alastair Neil wrote:

   However, the user or group quota is applied when a clone or a
   snapshot is created from a file system that has a user or
group quota.

applied to a clone I understand what that means, applied to a
snapshot - not so clear does it mean enforced on the original
dataset?


Since snapshots can not be modified, you're right that applying
user quotas to snapshots doesn't make any sense.  However, the
user/group quota/used is recorded in the snapshot, and the values
(including user quotas) are the values at the time the snapshot was
taken.  Contrast that with other properties (eg, quota, exec), where
the snapshot does not have its own value for those properties; it
just applies the filesystem's current value.

--matt


So user quotas are not charged for blocks in the snapshot?


That's correct.  Snapshot blocks don't contribute to the filesystem's space 
referenced / refquota / userused / userquota.


--matt


[zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Gary Gogick
Heya all,

I'm working on testing ZFS with NFS, and I could use some guidance - read
speeds are a bit less than I expected.

Over a gig-e line, we're seeing ~30 MB/s reads on average - it doesn't seem to
matter if we're doing large numbers of small files or small numbers of large
files, the speed seems to top out there.  We've disabled pre-fetching, which
may be having some effect on read speeds, but it proved necessary due to severe
performance issues on database reads with it enabled.  (Reading from the DB
with pre-fetching enabled was taking 4-5 times as long as with it
disabled.)
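
For reference, the pre-fetch knob in question is usually the zfs_prefetch_disable
tunable, set for example in /etc/system (takes effect after a reboot); a sketch:

* /etc/system: disable ZFS file-level prefetch
set zfs:zfs_prefetch_disable = 1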

Write speed seems to be fine.  Testing is showing ~95 MB/s, which seems
pretty decent considering there's been no real network tuning done.

The NFS server we're testing is a Sun x4500, configured with a storage pool
consisting of 20x 2-disk mirrors, using separate SSD for logging.  It's
running the latest version of Nexenta Core.  (We've also got a second x4500
in with a raidZ2 config, running OpenSolaris proper, showing the same issues
with reads.)

We're using NFS v4 via TCP, serving various Linux clients (the majority are
CentOS 5.3).  Connectivity is presently provided by a single gigabit
ethernet link; entirely conventional configuration (no jumbo frames/etc).

Our workload is pretty read heavy; we're serving both website assets and
databases via NFS.  The majority of files being served are small (< 1MB).
The databases are MySQL/InnoDB, with the data in separate zfs filesystems
with a record size of 16k.  The website assets/etc. are in zfs filesystems
with the default record size.  On the database server side of things, we've
disabled InnoDB's double write buffer.

I'm wondering if there's any other tuning that'd be a good idea for ZFS in
this situation, or if there's some NFS tuning that should be done when
dealing specifically with ZFS.  Any advice would be greatly appreciated.

Thanks,

-- 
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Richard Elling

cross-posting to nfs-discuss

On Oct 20, 2009, at 10:35 AM, Gary Gogick wrote:


Heya all,

I'm working on testing ZFS with NFS, and I could use some guidance -  
read speeds are a bit less than I expected.


Over a gig-e line, we're seeing ~30 MB/s reads on average - doesn't  
seem to matter if we're doing large numbers of small files or small  
numbers of large files, the speed seems to top out there.  We've  
disabled pre-fetching, which may be having some affect on read  
speads, but proved necessary due to severe performance issues on  
database reads with it enabled.  (Reading from the DB with pre- 
fetching enabled was taking 4-5 times as long than with it disabled.)


What is the performance when reading locally (eliminate NFS from the  
equation)?
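
For example, a quick streaming-read check run on the server itself (the path and
file name are examples):

# dd if=/tank/somefs/largefile of=/dev/null bs=1024k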

 -- richard



Write speed seems to be fine.  Testing is showing ~95 MB/s, which  
seems pretty decent considering there's been no real network tuning  
done.


The NFS server we're testing is a Sun x4500, configured with a  
storage pool consisting of 20x 2-disk mirrors, using separate SSD  
for logging.  It's running the latest version of Nexenta Core.   
(We've also got a second x4500 in with a raidZ2 config, running  
OpenSolaris proper, showing the same issues with reads.)


We're using NFS v4 via TCP, serving various Linux clients (the  
majority are  CentOS 5.3).  Connectivity is presently provided by a  
single gigabit ethernet link; entirely conventional configuration  
(no jumbo frames/etc).


Our workload is pretty read heavy; we're serving both website assets  
and databases via NFS.  The majority of files being served are small  
(< 1MB).  The databases are MySQL/InnoDB, with the data in separate  
zfs filesystems with a record size of 16k.  The website assets/etc.  
are in zfs filesystems with the default record size.  On the  
database server side of things, we've disabled InnoDB's double write  
buffer.


I'm wondering if there's any other tuning that'd be a good idea for  
ZFS in this situation, or if there's some NFS tuning that should be  
done when dealing specifically with ZFS.  Any advice would be  
greatly appreciated.


Thanks,

--
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Matthias Appel

 People here dream of using it for the ZFS intent log but it is clear
 that this was not Sun's initial focus for the product.

At the moment I'm considering using a Gigabyte iRAM as ZIL device.
(see
http://cgi.ebay.com/Gigabyte-IRAM-I-Ram-GC-RAMDISK-SSD-4GB-PCI-card-SATA_W0QQitemZ120481489540QQcmdZViewItemQQptZPCC_Drives_Storage_Internal?hash=item1c0d41a284)

As I am using 2x Gbit Ethernet and 4 GB of RAM,
4 GB for the iRAM should be more than sufficient (0.5 times RAM and
10 s worth of IO).

I am aware that this RAM is non-ECC, so I plan to mirror the ZIL device.

Any considerations for this setup?  Will it work as I expect (speeding up
sync I/O, especially for NFS)?
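
The attachment itself would be a one-liner along these lines (pool and device names
are hypothetical):

# zpool add tank log mirror c3t0d0 c3t1d0
# zpool status tank        (the devices appear under a separate "logs" section)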



Re: [zfs-discuss] ZFS user quota, userused updates?

2009-10-20 Thread Tomas Ögren
On 20 October, 2009 - Matthew Ahrens sent me these 2,2K bytes:

 The user/group used can be out of date by a few seconds, same as the 
 used and referenced properties.  You can run sync(1M) to wait for 
 these values to be updated.  However, that doesn't seem to be the problem 
 you are encountering here.

 Can you send me the output of:

 zfs list zpool1/sd01_mail
 zfs get all zpool1/sd01_mail
 zfs userspace -t all zpool1/sd01_mail
 ls -ls /export/sd01/mail
 zdb -vvv zpool1/sd01_mail

On a related note, there is a way to still have quota used even after
all files are removed, S10u8/SPARC:

# zfs create rpool/quotatest
# zfs set userqu...@stric=5m rpool/quotatest
# zfs userspace -t all rpool/quotatest
TYPE NAME   USED  QUOTA
POSIX Group  root 3K   none
POSIX User   root 3K   none
POSIX User   stric 0 5M
# chmod a+rwt /rpool/quotatest

stric% cd /rpool/quotatest;tar jxvf /somewhere/gimp-2.2.10.tar.bz2
... wait and it will start getting Disc quota exceeded, might have to
help it by running 'sync' in another terminal
stric% sync
stric% rm -rf gimp-2.2.10
stric% sync
... now it's all empty.. but...

# zfs userspace -t all rpool/quotatest
TYPE NAME   USED  QUOTA
POSIX Group  root 3K   none
POSIX Group  tdb  3K   none
POSIX User   root 3K   none
POSIX User   stric3K 5M

Can be repeated for even more lost blocks, I seem to get between 3 and
5 kB each time. I tried this last night, and when I got back in the
morning, it had gone down to zero again. Haven't done any more verifying
than that.

It doesn't seem to trigger if I just write a big file with dd which gets
me into DQE, but unpacking a tarball seems to trigger it. My tests have
been as above.

Output from all of the above + zfs list, zfs get all, zfs userspace, ls
-l and zdb -vvv is at:
http://www.acc.umu.se/~stric/tmp/zfs-userquota.txt

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


[zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Matthias Appel
Hi,

at the moment I am running a pool consisting of 4 vdevs (Seagate Enterprise
SATA disks) assembled into 2 mirrors.

Now I want to add two more drives to extend the capacity to 1.5 times the
old capacity.
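
(For concreteness, the operation in question is a single command along the lines of
the sketch below; the pool and device names are hypothetical.)

# zpool add tank mirror c4t0d0 c4t1d0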

As these mirrors will be striped in the pool, I want to know what will
happen to the existing data of the pool.

Will it stay where it is, with only new data being written to the new
mirror, or will the existing data be spread over all 3 mirrors?

Will there be a benefit, resulting in more IOPS/bandwidth, or will there only
be more space?

(I hope I expressed my considerations understandably, despite English not
being my mother tongue.)


Regards,

Matthias





Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Marion Hakanson
mmusa...@east.sun.com said:
 What benefit are you hoping zfs will provide in this situation?  Examine
 your situation carefully and determine what filesystem works best for you.
 There are many reasons to use ZFS, but if your configuration isn't set up  to
 take advantage of those reasons, then there's a disconnect somewhere. 

How about if your config can only take advantage of _some_ of those reasons
to use ZFS?  There are plenty of benefits to using ZFS on a single bare hard
drive, and those benefits apply to using it on an expensive SAN array.  It's
up to each individual to decide if adding redundancy is worthwhile or not.

I'm not saying ZFS is perfect.  And, ZFS is indeed better when it can make
use of redundancy.  But ZFS has lost data even with such redundancy, so
having it does not confer magical protection from all disasters.

Anyway, here's a note describing our experience with this situation:

We've been using ZFS here on two hardware RAID fiberchannel arrays, with
no ZFS-level redundancy, starting September-2006 -- roughly 6TB of data,
checksums enabled, weekly scrubs, regular tape backups.  So far there has
been not one checksum error detected on these arrays.  We've had dumb SAN
connectivity losses, complete power failures on arrays, FC switches, and/or
file servers, and so on, but no loss of data.

Before ZFS, we used a combination of SAM-QFS and UFS filesystems on the
same arrays, and ZFS has proved much easier to manage, reducing data loss
due to human errors in volume and space management.  The checksum feature
makes filesystems without it into second-class offerings, in my opinion.

Is anyone else tired of seeing the word redundancy? (:-)

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Matthias Appel wrote:

As these mirrors will be striped in the pool I want to know what will
happen to the existing data  oft he pool.

Will it stay at its location and only new data will be written to the new
mirror or will the existing data be spread over all 3 mirrors?


The existing data will remain in its current location.  If the data is 
re-written, then it should be somewhat better distributed across the 
disks.



Will ther be a benefit, resulting in more IOPS/Bandwith or will there only
be more space?


You will see more IOPS/bandwidth, but if your existing disks are very 
full, then more traffic may be sent to the new disks, which results in 
less benefit.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS user quota, userused updates?

2009-10-20 Thread Matthew Ahrens

Tomas Ögren wrote:

On a related note, there is a way to still have quota used even after
all files are removed, S10u8/SPARC:


In this case there are two directories that have not actually been removed. 
They have been removed from the namespace, but they are still open, eg due to 
some process's working directory being in them.  This is confirmed by your 
zdb output, there are 2 directories on the delete queue.  You can force it to 
be flushed by unmounting and re-mounting your filesystem.


--matt



Re: [zfs-discuss] ZFS user quota, userused updates?

2009-10-20 Thread Tomas Ögren
On 20 October, 2009 - Matthew Ahrens sent me these 0,7K bytes:

 Tomas Ögren wrote:
 On a related note, there is a way to still have quota used even after
 all files are removed, S10u8/SPARC:

 In this case there are two directories that have not actually been 
 removed. They have been removed from the namespace, but they are still 
 open, eg due to some process's working directory being in them.

Only a few processes in total were involved in this dir.. cd into the
fs, untar the tarball, remove it all, cd out, run sync. Quota usage
still remains.

 This is confirmed by your zdb output, there are 2 directories on the
 delete queue. You can force it to be flushed by unmounting and
 re-mounting your filesystem.

.. which isn't such a good workaround for the busy home directory server
that I will be using this on shortly...

I have to say a big thank you for this userquota anyway, because I tried
the one fs per user way first, and it just didn't scale to our 3-4000
users, but I still want to use ZFS.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Matthias Appel

 Is anyone else tired of seeing the word redundancy? (:-)

Only in a perfect world (tm) ;-)

IMHO there is no such thing as too much redundancy.
In the real world the possibilities for redundancy are only limited by money,
be it online redundancy (mirror/RAIDZx), offline redundancy (tape
backups/off-site disk-based backups)
or infrastructure redundancy (MPIO, ...).


Regards,

Matthias



Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Matthias Appel wrote:


IMHO there is no such thing as too much redundancy.
In the real world the possibilities of redundancy are only limited by money,


Redundancy costs in terms of both time and money.  Redundant hardware 
which fails or feels upset requires time to administer and repair. 
This is why there is indeed such a thing as too much redundancy.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] Antwort: Re: Performance of ZFS and UFS inside local/global zone

2009-10-20 Thread André Bögelsack
Hi Casper,

I did that.

1. I created a directory and mounted the device in the global zone - ran 
filebench, then unmounted the device.
2. I created a directory and mounted the device in the local zone - ran 
filebench.
-- No difference.

It seems the loop-back driver causes the performance degradation - but how 
can I prove this thesis?

andré



From:
casper@sun.com
To:
Andre Boegelsack boege...@in.tum.de
Cc:
zfs-discuss@opensolaris.org
Date:
20.10.2009 15:23
Subject:
Re: [zfs-discuss] Performance of ZFS and UFS inside local/global zone
Sent by:
cas...@holland.sun.com





Very easy:

 - make a directory
 - mount it using lofs

run filebench on both directories.

It seems like that we need to make lofs faster.

Casper




Re: [zfs-discuss] ZFS user quota, userused updates?

2009-10-20 Thread Matthew Ahrens

Tomas Ögren wrote:

On 20 October, 2009 - Matthew Ahrens sent me these 0,7K bytes:


Tomas Ögren wrote:

On a related note, there is a way to still have quota used even after
all files are removed, S10u8/SPARC:
In this case there are two directories that have not actually been 
removed. They have been removed from the namespace, but they are still 
open, eg due to some process's working directory being in them.


Only a few processes in total were involved in this dir.. cd into the
fs, untar the tarball, remove it all, cd out, run sync. Quota usage
still remains.


This is confirmed by your zdb output, there are 2 directories on the
delete queue. You can force it to be flushed by unmounting and
re-mounting your filesystem.


.. which isn't such a good workaround for a busy home directory server
which I will use this in shortly...


Mark Shellenbaum provides some additional details, and a simpler workaround:

This is a well known problem with negative dnlc (Directory Name Lookup Cache) 
entries on the directory.  The problem affects both zfs and ufs, and is 
covered by bugs 6400251 and 6179228, which are being worked on.


You don't necessarily have to unmount the file system to get it to flush the 
dnlc and recover the space.  All you need to do is cd to the root directory 
of the file system and do a zfs umount dataset.  That will fail, but a side 
effect is that the vfs layer will have purged the dnlc for that file system.

That should cause the files to be deleted.
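
In command form, roughly (using the dataset from the earlier reproduction):

# cd /rpool/quotatest
# zfs umount rpool/quotatest      (fails with "Device busy", but purges the DNLC)
# zfs userspace -t all rpool/quotatest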

--matt


Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Matthias Appel


 Redundancy costs in terms of both time and money.  Redundant hardware
 which fails or feels upset requires time to administer and repair.
 This is why there is indeed such a thing as too much redundancy.


Yes, that's true, but all I wanted to say is: if there is an infinite amount of money,
there can be infinite redundancy.

All the things you mentioned can be accomplished by having enough money.

Home users will end up having a mirrored vdev in a server with non-ECC
RAM (IMHO ECC is also a type of redundancy).

Multi-billion enterprises have redundant datacenters with multiple storage
arrays (another kind of redundancy).

Missing time to administer the redundancy can also be compensated for with
enough money (hire more people).

Redundancy is no rocket science...there is enough knowledge out there to set
up another level of redundancy (I am not speaking of clustering, which can
indeed be limited).

There is no too-much-redundancy...only a reasonable amount for your type of
business needs.



Re: [zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Matthias Appel
 You will see more IOPS/bandwith, but if your existing disks are very
 full, then more traffic may be sent to the new disks, which results in
 less benefit.


OK, that means that, over time, data will be distributed across all mirrors?
(Assuming all blocks get written again at some point.)

I think a useful extension to ZFS would be a background task which
redistributes all used blocks across all vdevs.

I don't know if this can be done within ZFS, but it would end up
reading one block of the pool after another, assuming that by rewriting it, it
will be spread across the mirrors.



Re: [zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Bruno Sousa

Hi,

Something like 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855425 ?


Bruno


Matthias Appel wrote:

You will see more IOPS/bandwith, but if your existing disks are very
full, then more traffic may be sent to the new disks, which results in
less benefit.




OK, that means, over time, data will be distributed across all mirrors?
(assuming all blocks will be written once)

I think a useful extension to ZFS would be a background task which
distributes all used blocks across all vdefs.

I don't know if this can be established with ZFS but it will end up with
reading one block of the pool after another assuming that by writing it, it
will be spread across the mirrors.



Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Richard Elling

On Oct 20, 2009, at 8:23 AM, Robert Dupuy wrote:
A word of caution, be sure not to read a lot into the fact that the  
F20 is included in the Exadata Machine.


From what I've heard the flash_cache feature of 11.2.0 Oracle that  
was enabled in beta, is not working in the production release, for  
anyone except the Exadata 2.


The question is, why did they need to give this machine an unfair  
software advantage?  Is it because of the poor performance they  
found with the F20?


Oracle bought Sun, they have reason to make such moves.

I have been talking to a Sun rep for weeks now, trying to get the  
latency specs on this F20 card, with no luck in getting that  
revealed so far.


AFAICT, there is no consistent latency measurement in the
industry, yet. With magnetic disks, you can usually get some
sort of average values, which can be useful to the first order.
We do know that for most flash devices read latency is relatively
easy to measure, but write latency can vary by an order of magnitude,
depending on the SSD design and IOP size.  Ok, this is a fancy way
of saying YMMV, but in real life, YMMV.

However, you can look at Sun's other products like the F5100, which  
are very unimpressive and high latency.


I would not assume this Sun tech is in the same league as a Fusion- 
io ioDrive, or a Ramsan-10.  They would not confirm whether its a  
native PCIe solution, or if the reason it comes on a SAS card, is  
because it requires SAS.


So, test, test, test, and don't assume this card is competitive  
because it came out this year, I am not sure its even competitive  
with last years ioDrive.


+1

I told my sun reseller that I merely needed it to be faster than the  
Intel X25-E in terms of latency, and they weren't able to  
demonstrate that, at least so far...lots of feet dragging, and I can  
only assume they want to sell as much as they can, before the cards  
metrics become widely known.


I'd be surprised if anyone could answer such a question while
simultaneously being credible.  How many angels can dance on
the tip of a pin?  Square dance or ballet? :-)  FWIW, Brendan recently
blogged about measuring this at the NFS layer.
http://blogs.sun.com/brendan/entry/hybrid_storage_pool_top_speeds

I think where we stand today, the higher-level systems questions of
redundancy tend to work against builtin cards like the F20. These
sorts of cards have been available in one form or another for more
than 20 years, and yet they still have limited market share -- not
because they are fast, but because the other limitations carry more
weight. If the stars align and redundancy above the block layer gets
more popular, then we might see this sort of functionality implemented
directly on the mobo... at which point we can revisit the notion of file
system. Previous efforts to do this (eg Virident) haven't demonstrated
stellar market movement.
 -- richard



Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-10-20 Thread Mike Bo
Once data resides within a pool, there should be an efficient method of moving 
it from one ZFS file system to another. Think Link/Unlink vs. Copy/Remove.

Here's my scenario... When I originally created a 3TB pool, I didn't know the 
best way to carve up the space, so I used a single, flat ZFS file system. Now that 
I'm more familiar with ZFS, managing the sub-directories as separate file 
systems would have made a lot more sense (separate policies, snapshots, etc.). 
The problem is that some of these directories contain tens of thousands of 
files and many hundreds of gigabytes. Copying this much data between file 
systems within the same disk pool just seems wrong.
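
(For the record, the only way I know to do this today is an explicit copy into a new 
dataset; the names below are just an example from my layout, not a recommendation:

   zfs snapshot tank/data@presplit               # safety net before shuffling things
   zfs create tank/photos                        # the new, separately-managed file system
   rsync -a /tank/data/photos/ /tank/photos/     # or cp -rp; either way every block is rewritten
   rm -rf /tank/data/photos

Even though source and destination live in the same pool, the data gets copied block 
by block rather than simply re-linked, which is exactly what I'd like to avoid.)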

I hope such a feature is possible and not too difficult to implement, because 
I'd like to see this capability in ZFS.

Regards,
mikebo
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Trevor Pretty




Gary

Were you measuring the Linux NFS write performance? It's well known
that Linux can use NFS in a very "unsafe" mode and report the write
complete when it is not all the way to safe storage. This is often
reported as "Solaris has slow NFS write performance". This link does not
mention NFS v4 but you might want to check: http://nfs.sourceforge.net/

What's the write performance like between the two OpenSolaris systems?
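
A quick way to take client-side caching out of the picture is to force synchronous
writes from the Linux client, something like this (the path and size are just examples):

   dd if=/dev/zero of=/mnt/nfs/ddtest bs=1M count=1024 conv=fsync

If the ~95 MB/s write figure collapses once conv=fsync is added, the earlier number
was mostly measuring the client's page cache rather than the server.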


Richard Elling wrote:

  cross-posting to nfs-discuss

On Oct 20, 2009, at 10:35 AM, Gary Gogick wrote:

  
  
Heya all,

I'm working on testing ZFS with NFS, and I could use some guidance -  
read speeds are a bit less than I expected.

Over a gig-e line, we're seeing ~30 MB/s reads on average - doesn't  
seem to matter if we're doing large numbers of small files or small  
numbers of large files, the speed seems to top out there.  We've  
disabled pre-fetching, which may be having some effect on read  
speeds, but it proved necessary due to severe performance issues on  
database reads with it enabled.  (Reading from the DB with pre- 
fetching enabled was taking 4-5 times as long as with it disabled.)

  
  
What is the performance when reading locally (eliminate NFS from the  
equation)?
  -- richard

  
  
Write speed seems to be fine.  Testing is showing ~95 MB/s, which  
seems pretty decent considering there's been no real network tuning  
done.

The NFS server we're testing is a Sun x4500, configured with a  
storage pool consisting of 20x 2-disk mirrors, using separate SSD  
for logging.  It's running the latest version of Nexenta Core.   
(We've also got a second x4500 in with a raidZ2 config, running  
OpenSolaris proper, showing the same issues with reads.)

We're using NFS v4 via TCP, serving various Linux clients (the  
majority are  CentOS 5.3).  Connectivity is presently provided by a  
single gigabit ethernet link; entirely conventional configuration  
(no jumbo frames/etc).

Our workload is pretty read heavy; we're serving both website assets  
and databases via NFS.  The majority of files being served are small  
(< 1MB).  The databases are MySQL/InnoDB, with the data in separate  
zfs filesystems with a record size of 16k.  The website assets/etc.  
are in zfs filesystems with the default record size.  On the  
database server side of things, we've disabled InnoDB's double write  
buffer.

I'm wondering if there's any other tuning that'd be a good idea for  
ZFS in this situation, or if there's some NFS tuning that should be  
done when dealing specifically with ZFS.  Any advice would be  
greatly appreciated.

Thanks,

-- 
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Trevor Pretty




 
Richard Elling wrote:

  

I think where we stand today, the higher-level systems questions of
redundancy tend to work against builtin cards like the F20. These
sorts of cards have been available in one form or another for more
than 20 years, and yet they still have limited market share -- not
because they are fast, but because the other limitations carry more
weight. If the stars align and redundancy above the block layer gets
more popular, then we might see this sort of functionality implemented
directly on the mobo... at which point we can revisit the notion of file
system. Previous efforts to do this (eg Virident) haven't demonstrated
stellar market movement.
  -- richard
  

Richard

You mean Prestoserve :-)  Putting data on local NVRAM in the server
layer was a bad idea 20 years ago for a lot of
applications. The reasons haven't changed in all those years!

For those who may not have been around in the "good old days", when 1 to
16 MB of NVRAM on an SBus card was a good idea - or not:
http://docs.sun.com/app/docs/doc/801-7289/6i1jv4t2s?a=view

Trevor

  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Robert Dupuy
"there is no consistent latency measurement in the industry"

You bring up an important point, as did another poster earlier in the thread, 
and certainly it's an issue that needs to be addressed.

"I'd be surprised if anyone could answer such a question while simultaneously 
being credible."

http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-product-brief.pdf

Intel:  X-25E read latency 75 microseconds

http://www.sun.com/storage/disk_systems/sss/f5100/specs.xml

Sun:  F5100 read latency 410 microseconds

http://www.fusionio.com/PDFs/Data_Sheet_ioDrive_2.pdf

Fusion-IO:  read latency less than 50 microseconds

Fusion-IO lists theirs as .05ms


I find the latency measures to be useful.

I know it isn't perfect, and I agree benchmarks can be deceiving; heck, I 
criticized one vendor's benchmarks in this thread already :)

But I did find that, for me, a very simple, single-threaded, read-as-fast-as-you-can 
approach, counting the number of random accesses per second, is one type of 
measurement that gives you some data on the raw access ability of the drive.
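
(For what it's worth, my "simple approach" is nothing fancier than timing a loop of
small random reads against the raw device. The device name and loop count here are
hypothetical, and this is only a crude single-threaded probe:

   #!/usr/bin/ksh
   dev=/dev/rdsk/c1t1d0s0          # raw device under test (example name)
   i=0
   while [ $i -lt 1000 ]; do
       # read one 4 KB block at a pseudo-random offset in roughly the first 16 GB
       dd if=$dev of=/dev/null bs=4k count=1 iseek=$((RANDOM * 128)) 2>/dev/null
       i=$((i+1))
   done

Divide 1000 by the wall-clock time of the loop and you get a rough accesses-per-second
figure; dd process startup adds overhead, so treat it as a floor, not a precise latency
number.)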

No doubt in some cases, you want to test multithreaded IO too, but my 
application is very latency sensitive, so this initial test was telling.

As I got into the actual performance of my app, the lower latency drives 
performed better than the higher latency drives... all of this was on SSD.

(I did not test the F5100 personally, I'm talking about the SSD drives that I 
did test).

So, yes, SSD and HDD are different, but latency is still important.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Matthias Appel
 So, yes, SSD and HDD are different, but latency is still important.


But on SSD, write performance is much more unpredictable than on HDD.

If you want to write to an SSD you will have to erase the used blocks (assuming
this is not a brand-new SSD) before you are able to write to them.

This takes considerable time, assuming the drive's firmware doesn't do it by
itself... but who can tell.

I replaced my notebook's internal HDD with a cheap SSD.

At first I was impressed, but in the meantime writes have become unpredictable
(copy times for a single file vary from 60 seconds to an hour).

If enterprise-grade SSDs behave differently, please let me know!



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Ross Walker


But this is concerning reads not writes.

-Ross


On Oct 20, 2009, at 4:43 PM, Trevor Pretty trevor_pre...@eagle.co.nz  
wrote:



Gary

Where you measuring the Linux NFS write performance? It's well know  
that Linux can use NFS in a very unsafe mode and report the write  
complete when it is not all the way to safe storage. This is often  
reported as Solaris has slow NFS write performance. This link does  
not mention NFS v4 but you might want to check. http://nfs.sourceforge.net/


What's the write performance like between the two OpenSolaris systems?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Trevor Pretty





No, it concerns the difference between reads and writes.

The write performance may be being overstated!

Ross Walker wrote:
> But this is concerning reads not writes.
>
> -Ross
>
> On Oct 20, 2009, at 4:43 PM, Trevor Pretty trevor_pre...@eagle.co.nz wrote:
> [...]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Brandon High
On Tue, Oct 20, 2009 at 10:35 AM, Gary Gogick g...@workhabit.com wrote:
 We're using NFS v4 via TCP, serving various Linux clients (the majority are
 CentOS 5.3).  Connectivity is presently provided by a single gigabit
 ethernet link; entirely conventional configuration (no jumbo frames/etc).

Linux's NFS v4 (especially the one in Centos 5.3, which is a little
older) is not a complete implementation. It might be worth seeing if
NFS v3 has better performance.
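
Something along these lines on one of the CentOS clients would be a quick test
(server name and paths are placeholders):

   mount -t nfs -o vers=3,tcp,rsize=32768,wsize=32768 server:/tank/www /mnt/www-v3

If v3 reads are markedly faster against the same export, that points at the client's
v4 implementation rather than ZFS on the server.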

-B

-- 
Brandon High : bh...@freaks.com
If violence doesn't solve your problem, you're not using enough of it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Andrew Gabriel

Matthias Appel wrote:

But on SSD, write performance is much more unpredictable than on HDD.

If you want to write to SSD you will have to erase the used blocks (assuming
this is not a brand-new SSD) before you are able to write to them.

This takes much time, assuming the drive's firmeware doesn't do this by
itself...but who can tell.

I replaced my notebooks internal HDD with an cheap SSD.

At first I was impressed but in the meantime writes are unpredictable (copy
times differ from 1h to 60 seconds on a
Single file).

If there is a difference to enterprise-grade SSDs, please let me know!


I haven't seen that on the X25-E disks I hammer as part of the demos on 
the Turbocharge Your Apps discovery days I run.


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Matthias Appel
 

 

From: Bruno Sousa [mailto:bso...@epinfante.com]
Sent: Tuesday, 20 October 2009 22:20
To: Matthias Appel
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Adding another mirror to storage pool

> Hi,
>
> Something like
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855425 ?
>
> Bruno

Yes, thanks for mentioning that. To my embarrassment, I must confess I did not
comb through the bug database.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Richard Elling

On Oct 20, 2009, at 1:58 PM, Robert Dupuy wrote:


>> there is no consistent latency measurement in the industry
>
> You bring up an important point, as did another poster earlier in
> the thread, and certainly it's an issue that needs to be addressed.
>
>> I'd be surprised if anyone could answer such a question while
>> simultaneously being credible.
>
> http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-product-brief.pdf
>
> Intel:  X-25E read latency 75 microseconds

... but they don't say where it was measured or how big it was...

> http://www.sun.com/storage/disk_systems/sss/f5100/specs.xml
>
> Sun:  F5100 read latency 410 microseconds

... for 1M transfers... I have no idea what the units are, though...  
bytes?

> http://www.fusionio.com/PDFs/Data_Sheet_ioDrive_2.pdf
>
> Fusion-IO:  read latency less than 50 microseconds
>
> Fusion-IO lists theirs as .05ms

... at the same time they quote 119,790 IOPS @ 4KB.  By my calculator,
that is 8.3 microseconds per IOP, so clearly the latency itself doesn't
have a direct impact on IOPs.

> I find the latency measures to be useful.

Yes, but since we are seeing benchmarks showing 1.6 MIOPS (mega-IOPS :-)
on a system which claims 410 microseconds of latency, it really isn't
clear to me how to apply the numbers to capacity planning. To wit, there
is some limit to the number of concurrent IOPS that can be processed per
device, so do I need more devices, faster devices, or devices which can
handle more concurrent IOPS?

> I know it isn't perfect, and I agree benchmarks can be deceiving;
> heck, I criticized one vendor's benchmarks in this thread already :)
>
> But I did find that, for me, a very simple, single-threaded,
> read-as-fast-as-you-can approach, counting the number of random
> accesses per second, is one type of measurement that gives you some
> data on the raw access ability of the drive.
>
> No doubt in some cases, you want to test multithreaded IO too, but
> my application is very latency sensitive, so this initial test was
> telling.

cool.

> As I got into the actual performance of my app, the lower latency
> drives performed better than the higher latency drives... all of
> this was on SSD.

Note: the F5100 has SAS expanders which add latency.
 -- richard

> (I did not test the F5100 personally, I'm talking about the SSD
> drives that I did test).
>
> So, yes, SSD and HDD are different, but latency is still important.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool without any redundancy

2009-10-20 Thread Marion Hakanson
I wrote:
 Is anyone else tired of seeing the word redundancy? (:-)

matthias.ap...@lanlabor.com said:
 Only in a perfect world (tm) ;-)
 IMHO there is no such thing as too much redundancy. In the real world the
 possibilities of redundancy are only limited by money, 

Sigh.  I was just joking about how many times the word showed up in
all of our postings.

http://www.imdb.com/title/tt1436296/

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Scott Meilicke
I have an Intel X25-E 32G in the mail (actually the kingston version), and 
wanted to get a sanity check before I start.

System:
Dell 2950
16G RAM
16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no extra drive 
slots, a single zpool.
snv_124, but with my zpool still running at the 2009.06 version (14).

I will likely get another chassis and 16 disks for another pool in the 3-18 
month time frame.

My plan is to put the SSD into an open disk slot on the 2950, but will have to 
configure it as a RAID 0, since the onboard PERC5 controller does not have a 
JBOD mode.

Options I am considering:

A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up an SSD like 
this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used as a ZIL for 
the future zpool.

Since my future zpool would just be used as a backup to disk target, I am 
leaning towards option C. Any gotchas I should be aware of?  
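
For clarity, option C would look something like this once the SSD has been sliced
with format (the pool name "tank", the device name, and the slice numbering are all
placeholders for whatever the PERC actually exposes; the sizes are the 8/16/8 split
described above):

   zpool add tank log c2t1d0s0       # 8 GB slog for the existing pool
   zpool add tank cache c2t1d0s1     # 16 GB L2ARC
   # leave c2t1d0s3 (8 GB) untouched, reserved as a slog for the future pool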

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding another mirror to storage pool

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Matthias Appel wrote:


> OK, that means, over time, data will be distributed across all mirrors?
> (assuming all blocks will be written once)


Yes, but it is quite rare for all files to be re-written.  If you have 
reliable storage somewhere else, you could send your existing pool to 
it, and then re-create your pool from scratch.
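
A rough sketch of that approach (pool, snapshot, and host names are invented for the
example):

   zfs snapshot -r tank@migrate
   zfs send -R tank@migrate | ssh backuphost zfs receive backup/tank-copy
   # verify the copy, destroy and re-create tank with the desired vdev layout, then
   # send it back the same way; the rewritten data lands evenly across all vdevs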


ZFS's existing limitations are a good reason to initially 
over-provision the pool rather than wait until the pool is close to full 
before adding more disks.  Regardless, the only real loss is the boost 
to available IOPS you would get if all disks could be used to store new data.



> I think a useful extension to ZFS would be a background task which
> distributes all used blocks across all vdevs.


Yes.  That would be a useful option.  This could be combined with a 
file optimizer which attempts to re-layout large files for most 
efficient access.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Scott Meilicke wrote:


A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up an SSD like 
this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used as a ZIL for 
the future zpool.

Since my future zpool would just be used as a backup to disk target, 
I am leaning towards option C. Any gotchas I should be aware of?


Option A seems better to me.  The reason why it seems better is that 
any write to the device consumes write IOPS and the X25-E does not 
really have that many to go around.  FLASH SSDs don't really handle 
writes all that well due to the need to erase larger blocks than are 
actually written.  Contention for access will simply make matters 
worse.  With its write cache disabled (which you should do since the 
X25-E's write cache is volatile), the X25-E has been found to offer a 
bit more than 1000 write IOPS.  With 16GB of RAM, you should not need 
a L2ARC for a backup to disk target (a write-mostly application). 
The ZFS ARC will be able to expand to 14GB or so, which is quite a lot 
of read caching already.
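
(Disabling the write cache is done from the expert mode of format, roughly this
sequence, assuming the controller actually passes the cache mode pages through,
which a PERC virtual disk may not:

   # format -e
   ... select the X25-E ...
   format> cache
   cache> write_cache
   write_cache> disable

If the RAID firmware hides the device's cache settings, check the controller's own
cache policy for that virtual disk instead.)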


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Richard Elling

On Oct 20, 2009, at 4:44 PM, Bob Friesenhahn wrote:


On Tue, 20 Oct 2009, Scott Meilicke wrote:


A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up  
an SSD like this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used  
as a ZIL for the future zpool.


Since my future zpool would just be used as a backup to disk  
target, I am leaning towards option C. Any gotchas I should be  
aware of?


Option A seems better to me.  The reason why it seems better is  
that any write to the device consumes write IOPS and the X25-E does  
not really have that many to go around.  FLASH SSDs don't really  
handle writes all that well due to the need to erase larger blocks  
than are actually written.  Contention for access will simply make  
matters worse.  With its write cache disabled (which you should do  
since the X25-E's write cache is volatile), the X25-E has been found  
to offer a bit more than 1000 write IOPS.  With 16GB of RAM, you  
should not need a L2ARC for a backup to disk target (a write-mostly  
application). The ZFS ARC will be able to expand to 14GB or so,  
which is quite a lot of read caching already.


The ZIL device will never require more space than RAM.
In other words, if you only have 16 GB of RAM, you won't need
more than that for the separate log.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Ross Walker


On Oct 20, 2009, at 5:28 PM, Trevor Pretty trevor_pre...@eagle.co.nz  
wrote:




> No, it concerns the difference between reads and writes.
>
> The write performance may be being overstated!



The clients are Linux, the server is Solaris.

True, the mounts on the Linux clients were async, but so, typically, are  
the mounts on Solaris clients.

The OP was measuring the page cache performance of the client more  
than the actual disk I/O.

If the Linux client runs an app that does fsync() on the I/O, then even on an  
async mount that I/O will be synchronous.

You are thinking of the Linux NFS server export option 'async', which  
is unsafe.


-Ross




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Richard Elling wrote:


The ZIL device will never require more space than RAM.
In other words, if you only have 16 GB of RAM, you won't need
more than that for the separate log.


Does the wasted storage space annoy you? :-)

What happens if the machine is upgraded to 32GB of RAM later?

The write performance of the X25-E is likely to be the bottleneck for a 
write-mostly storage server if the storage server has excellent 
network connectivity.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Tim Cook
On Tue, Oct 20, 2009 at 3:58 PM, Robert Dupuy rdu...@umpublishing.orgwrote:

 there is no consistent latency measurement in the industry

 You bring up an important point, as did another poster earlier in the
 thread, and certainly its an issue that needs to be addressed.

 I'd be surprised if anyone could answer such a question while
 simultaneously being credible.


 http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-product-brief.pdf

 Intel:  X-25E read latency 75 microseconds

 http://www.sun.com/storage/disk_systems/sss/f5100/specs.xml

 Sun:  F5100 read latency 410 microseconds

 http://www.fusionio.com/PDFs/Data_Sheet_ioDrive_2.pdf

 Fusion-IO:  read latency less than 50 microseconds

 Fusion-IO lists theirs as .05ms


 I find the latency measures to be useful.

 I know it isn't perfect, and I agree benchmarks can be deceiving, heck I
 criticized one vendors benchmarks in this thread already :)

 But, I did find, that for me, I just take a very simple, single thread,
 read as fast you can approach, and get the # of random access per second, as
 one type of measurement, that gives you some data, on the raw access ability
 of the drive.

 No doubt in some cases, you want to test multithreaded IO too, but my
 application is very latency sensitive, so this initial test was telling.

 As I got into the actual performance of my app, the lower latency drives,
 performed better than the higher latency drives...all of this was on SSD.

 (I did not test the F5100 personally, I'm talking about the SSD drives that
 I did test).

 So, yes, SSD and HDD are different, but latency is still important.



Timeout, rewind, etc.  What workload do you have for which 410-microsecond latency
is detrimental?  More to the point, what workload do you have for which you'd
rather have 5-microsecond latency with 1/10th the IOPS?  Whatever it is,
I've never run across such a workload in the real world.  It sounds like
you're comparing paper numbers for the sake of comparison, rather than to
solve a real-world problem...

BTW, latency does not give you the number of random accesses per second.
5-microsecond latency for one access != number of random accesses per second, sorry.
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Gary Gogick
Trevor/all,

We've been timing the copying of actual data (1GB of assorted files,
generally < 1MB, with numerous larger files thrown in) in an attempt to
simulate real-world use.  We've been copying different sets of data around
to try and avoid anything being cached anywhere.

I don't recall the specific numbers, but local reading/writing on the x4500
was definitely well over what can be theoretically pushed through a gig-e
line; so I'm pretty convinced the problem is either with the ZFS+NFS combo
or NFS, rather than with ZFS alone.
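
For the local numbers I'll re-run something simple like this on the x4500 itself so
we have a concrete baseline (the path is arbitrary, and the file just needs to be
larger than RAM so the ARC can't hide the disks):

   dd if=/dev/zero of=/tank/nfs-test/bigfile bs=1024k count=40960   # ~40 GB test file
   dd if=/tank/nfs-test/bigfile of=/dev/null bs=1024k               # local sequential read

and compare that against the same read done over the NFS mount.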

I'll do some OpenSolaris - OpenSolaris testing tonight and see what
happens.

Thanks for the replies, appreciate the help!



On Tue, Oct 20, 2009 at 1:43 PM, Trevor Pretty trevor_pre...@eagle.co.nzwrote:

  Gary

 Where you measuring the Linux NFS write performance? It's well know that
 Linux can use NFS in a very unsafe mode and report the write complete when
 it is not all the way to safe storage. This is often reported as Solaris has
 slow NFS write performance. This link does not mention NFS v4 but you might
 want to check. http://nfs.sourceforge.net/

 What's the write performance like between the two OpenSolaris systems?





-- 
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-20 Thread Bob Friesenhahn

On Tue, 20 Oct 2009, Richard Elling wrote:


>> Intel:  X-25E read latency 75 microseconds
>
> ... but they don't say where it was measured or how big it was...

Probably measured using a logic analyzer and measuring the time from 
the last bit of the request going in, to the first bit of the response 
coming out.  It is not clear if this latency is a minimum, maximum, 
median, or average.  It is not clear if this latency is while the 
device is under some level of load, or if it is in a quiescent state.

This is one of the skimpiest specification sheets that I have ever 
seen for an enterprise product.

>> Sun:  F5100 read latency 410 microseconds
>
> ... for 1M transfers... I have no idea what the units are, though... bytes?

Sun's testing is likely done while attached to a system and done with 
some standard loading factor rather than while in a quiescent state.

> ... at the same time they quote 119,790 IOPS @ 4KB.  By my calculator,
> that is 8.3 microseconds per IOP, so clearly the latency itself doesn't
> have a direct impact on IOPs.


I would be interested to know how many IOPS an OS like Solaris is able 
to push through a single device interface.  The normal driver stack is 
likely limited as to how many IOPS it can sustain for a given LUN 
since the driver stack is optimized for high latency devices like disk 
drives.  If you are creating a driver stack, the design decisions you 
make when requests will be satisfied in about 12ms would be much 
different than if requests are satisfied in 50us.  Limitations of 
existing software stacks are likely reasons why Sun is designing 
hardware with more device interfaces and more independent devices.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send -R slow

2009-10-20 Thread nathulal babulal
bjquinn - regarding the article at 
http://www.opensolaris.org/jive/thread.jspa?threadID=89567 I would like to 
contact you.

I am new to ZFS and need exactly what you mentioned your requirements were, 
and you figured out a solution for it.

Would you like to share the solution step by step with me? Please contact me at 
nathulal [at] babulal dot com
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Edward Ned Harvey
 System:
 Dell 2950
 16G RAM
 16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no
 extra drive slots, a single zpool.
 svn_124, but with my zpool still running at the 2009.06 version (14).
 
 My plan is to put the SSD into an open disk slot on the 2950, but will
 have to configure it as a RAID 0, since the onboard PERC5 controller
 does not have a JBOD mode.

You can JBOD with the PERC.  It might technically be a RAID 0 or RAID 1 with a
single disk in it, but that would be functionally equivalent to JBOD.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Frédéric VANNIERE
The ZIL is a write-only log that is only read after a power failure. Several GB 
is large enough for most workloads. 

You can't use the Intel X25-E because it has a 32 or 64 MB volatile cache that 
can be neither disabled nor flushed by ZFS. 

Imagine your server has a power failure while writing data to the pool. In a 
normal situation, with the ZIL on a reliable device, ZFS will read the ZIL and come 
back to a stable state at reboot. You may have lost some data (30 seconds) but 
the zpool works.  With the Intel X25-E as the ZIL, some log data is lost with 
the power failure (32/64 MB max), which leads to a corrupted log and so... you 
lose your zpool and all your data!!

For the ZIL you need two reliable mirrored SSD devices with a supercapacitor that 
can flush the write cache to NAND when a power failure occurs. 

A hard disk has a write cache too, but it can be disabled or flushed by the operating 
system.

For more informations : 
http://www.c0t0d0s0.org/archives/5993-Somewhat-stable-Solid-State.html
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss