Re: [zfs-discuss] Slow file system access on zfs

2007-11-08 Thread Adrian Immler
How is the performance on ZFS directly, without NFS?
I have experienced big problems running NFS on large volumes (independent of
the underlying fs).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread Louwtjie Burger
On 11/8/07, Mark Ashley [EMAIL PROTECTED] wrote:
> Economics for one.

Yep, for sure ... it was a rhetorical question ;)

>> Why would I consider a new solution that is safe, fast enough, stable
>> ... easier to manage and lots cheaper?

Rephrased: Why would I NOT consider ...? :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow file system access on zfs

2007-11-08 Thread Łukasz K


On 8-11-2007 at 7:58 Walter Faleiro wrote:
> Hi Lukasz,
> The output of the first script gives:
>
> bash-3.00# ./test.sh
> dtrace: script './test.sh' matched 4 probes
> CPU     ID        FUNCTION:NAME
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>   0  42681            :tick-10s
>
> and it goes on.

It means that you have free blocks :), or you do not have any I/O writes.
Run:
  # zpool iostat 1
and
  # iostat -zxc 1

> The second script gives:
>
> checking pool map size [B]: filer
> mdb: failed to dereference symbol: unknown symbol name
> 423917216903435

Which Solaris version do you use? Maybe you should patch the kernel.

Also you can check if there are problems with the ZFS sync phase. Run
  # dtrace -n fbt::txg_wait_open:entry'{ stack(); ustack(); }'
and wait 10 minutes.

Also give more information about the pool:
  # zfs get all filer
I assume 'filer' is your pool name.

Regards,
Lukas
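
(As an aside: to see which processes are actually generating the read/write
load on the pool, a generic DTrace io-provider script along these lines can
be used - an illustrative sketch, not something from this thread. Note that
asynchronous writes are often charged to sched or the ZFS taskq threads
rather than to the originating process.)

#!/usr/sbin/dtrace -s
/* count physical I/O requests and bytes per executable, every 10 seconds */
io:::start
{
        @ops[execname]   = count();
        @bytes[execname] = sum(args[0]->b_bcount);
}
tick-10s
{
        printf("\n%-16s %10s %14s\n", "EXEC", "OPS", "BYTES");
        printa("%-16s %@10d %@14d\n", @ops, @bytes);
        trunc(@ops);
        trunc(@bytes);
}
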
> On 11/7/07, Łukasz K [EMAIL PROTECTED] wrote:
>> Hi,
>> I think your problem is filesystem fragmentation. When available space
>> is less than 40% ZFS might have problems with finding free blocks.
>> Use this script to check it:
>>
>> #!/usr/sbin/dtrace -s
>>
>> fbt::space_map_alloc:entry
>> {
>>    self->s = arg1;
>> }
>>
>> fbt::space_map_alloc:return
>> /arg1 != -1/
>> {
>>    self->s = 0;
>> }
>>
>> fbt::space_map_alloc:return
>> /self->s && (arg1 == -1)/
>> {
>>    @s = quantize(self->s);
>>    self->s = 0;
>> }
>>
>> tick-10s
>> {
>>    printa(@s);
>> }
>>
>> Run the script for a few minutes.
>>
>> You might also have problems with space map size. This script will show
>> you the size of the space map on disk:
>>
>> #!/bin/sh
>> echo '::spa' | mdb -k | grep ACTIVE \
>> | while read pool_ptr state pool_name
>> do
>>    echo "checking pool map size [B]: $pool_name"
>>    echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
>>    | mdb -k \
>>    | nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
>> done
>>
>> In memory the space map takes 5 times more. Not all of the space map is
>> loaded into memory all the time, but for example during snapshot removal
>> all of it might be loaded, so check if you have enough RAM available on
>> the machine. Check ::kmastat in mdb. The space map uses kmem_alloc_40
>> (on Thumpers this is a real problem).
>>
>> Workaround:
>> 1. First you can change the pool recordsize:
>>      zfs set recordsize=64K POOL
>>    Maybe you will have to use 32K or even 16K.
>> 2. You will have to disable the ZIL, because the ZIL always takes 128kB
>>    blocks.
>> 3. Try to disable cache, tune the vdev cache. Check:
>>    http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>>
>> Lukas Karwacki
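
(Spelled out, the workaround quoted above amounts to something like the
sketch below. The /etc/system tunable names are those documented in the
ZFS Evil Tuning Guide of that era and should be verified against the
running kernel before use; 'filer' is the pool name from this thread.)

  # 1. smaller recordsize - only affects newly written blocks
  zfs set recordsize=64K filer

  # 2. disable the ZIL (system-wide in this Solaris 10 vintage):
  #    add to /etc/system and reboot
  #      set zfs:zil_disable = 1

  # 3. disable the vdev read-ahead cache (also /etc/system):
  #      set zfs:zfs_vdev_cache_size = 0
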
>> On 7-11-2007 at 1:49 Walter Faleiro wrote:
>>> Hi,
>>> We have a zfs file system configured using a Sunfire 280R with a 10T
>>> Raidweb array.
>>>
>>> bash-3.00# zpool list
>>> NAME     SIZE    USED    AVAIL    CAP  HEALTH  ALTROOT
>>> filer   9.44T   6.97T    2.47T    73%  ONLINE  -
>>>
>>> bash-3.00# zpool status
>>>   pool: backup
>>>  state: ONLINE
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         filer       ONLINE       0     0     0
>>>           c1t2d1    ONLINE       0     0     0
>>>           c1t2d2    ONLINE       0     0     0
>>>           c1t2d3    ONLINE       0     0     0
>>>           c1t2d4    ONLINE       0     0     0
>>>           c1t2d5    ONLINE       0     0     0
>>>
>>> The file system is shared via nfs. Of late we have seen that the file
>>> system access slows down considerably. Running commands like find and
>>> du on the zfs system did slow it down, but the intermittent slowdowns
>>> cannot be explained. Is there a way to trace the I/O on the zfs so that
>>> we can list out the heavy read/writes to the file system responsible
>>> for the slowness?
>>>
>>> Thanks,
>>> --Walter
>>> ___
>>> zfs-discuss mailing list
>>> zfs-discuss@opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] 3rd posting: ZFS question (case 65730249)

2007-11-08 Thread Dave Bevans

Does anyone have any thoughts on this?

Hi,

I have a customer with the following questions...



*Describe the problem:*
A ZFS question - I have one ZFS pool which is made from 2 storage
arrays (vdevs). I have to delete the zfs filesystems with the names of
/orbits/araid/* and remove one of the arrays from the system. After I
delete this data the remaining data easily fits on one array. The
questions are:


Can I remove one of the vdevs from the orbits pool without having to
unload/rebuild the remaining data in the orbits/myear filesystem?

Does ZFS know to move any current data from a vdev that is being 
removed from a pool to the remaining devices?



Hardware Platform: Sun Fire V40z
Component Affected: OS File System
OS and Kernel Version: [Please copy and paste output from uname -a]
[EMAIL PROTECTED]:~]uname -a
SunOS hemi 5.10 Generic_118855-36 i86pc i386 i86pc

[EMAIL PROTECTED]:~]zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
orbits  3.17T   2.97T    206G    93%  ONLINE  -

[EMAIL PROTECTED]:~]zpool status
 pool: orbits
state: ONLINE
scrub: none requested
config:

        NAME                               STATE     READ WRITE CKSUM
        orbits                             ONLINE       0     0     0
          c3t600C0FF0092BC64980F53900d0    ONLINE       0     0     0
          c3t600C0FF0092B663929C88800d0    ONLINE       0     0     0

errors: No known data errors

[EMAIL PROTECTED]:~]zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
orbits                    2.97T   155G  27.5K  /orbits
orbits/araid              2.33T   155G  33.5K  /orbits/araid
orbits/araid/cors         22.9G   155G  22.9G  /export/home/cors
orbits/araid/rinex1        550G   155G   550G  /rinex1
orbits/araid/rinex2        385G   155G   385G  /rinex2
orbits/araid/rinex3        503G   155G   503G  /rinex3
orbits/araid/rinex4        506G   155G   506G  /rinex4
orbits/araid/rinex5        419G   155G   419G  /rinex5
orbits/araid/tst_gnssrnx  24.5K   155G  24.5K  none
orbits/araid/ulc           432M   155G   432M  /orbits/araid/ulc
orbits/myear               656G   155G   656G  /orbits/myear


Regards,
Dave
--



Sun Microsystems
Mailstop ubur04-206
1 Network Drive
Burlington, MA  01803

Dave Bevans - Technical Support Engineer
Phone: 1-800-USA-4SUN (800-872-4786)
(opt-2), (case #) (press 0 for the next available engineer)
Email: david.bevans@Sun.com

TSC Systems Group-OS / Hours: 6AM - 2PM EST / M - F

Submit, Check & Update Cases at the Online Support Center
http://www.sun.com/service/online




This email may contain confidential and privileged material for the sole 
use of the intended recipient. Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient please 
contact the sender and delete all copies.


DAYLIGHT SAVINGS TIME The U.S. Energy Policy Act of 2005 mandates that 
Daylight Saving Time (DST) in the United States of America start on the 
second Sunday in March and end on the first Sunday in November starting 
in 2007. To see how your Sun System or Software may be affected, please 
visit http://www.sun.com/dst



Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread Adam Leventhal
On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
> I do consider the RAID-Z design to be somewhat brain-damaged [...]

How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Adam

-- 
Adam Leventhal, FishWorks                      http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread Mark Ashley
Economics for one.

We run a number of testing environments which mimic the production one. 
But we don't want to spend $750,000 on EMC storage each time when 
something costing $200,000 will do the job we need.

At the moment we have over 100TB on four SE6140s and we're very happy 
with the solution. ZFS is saving a lot of money for us because it 
enables solutions that weren't viable before.

> Hang on, you tell me I can pop in Solaris 10, slap in ZFS ... reduce
> most of my storage footprint to JBOD's ... (and all of this on a
> little old AMD system.).. You must be joking!
>
> Why would I consider a new solution that is safe, fast enough, stable
> ... easier to manage and lots cheaper? (That's my fanboy hat, please
> excuse)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-08 Thread Dan Poltawski
That is interesting; again, we're having the same problem with our X4500s.

I am trying to work out what is causing the problem with NFS: restarting the
service causes it to try to stop and then never come back up.

Rebooting the whole box fails as well; it just hangs until a hard reset.
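
For reference, the usual SMF sequence for poking at the NFS server is
something like the sketch below (the FMRI assumes the stock Solaris
svc:/network/nfs/server service):

  # show the service state and why it is offline or in maintenance
  svcs -xv svc:/network/nfs/server

  # try a restart; if it lands in maintenance, clear it and re-enable it
  # together with its dependencies
  svcadm restart svc:/network/nfs/server
  svcadm clear svc:/network/nfs/server
  svcadm enable -r svc:/network/nfs/server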
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread can you guess?
> On 11/7/07, can you guess? [EMAIL PROTECTED] wrote:
>> Monday, November 5, 2007, 4:42:14 AM, you wrote:
>>
>> cyg> Having gotten a bit tired of the level of ZFS hype floating

...

> But I do believe that some of the hype is justified

Just to make it clear, so do I:  it's the *unjustified* hype that I've objected 
to (as my comments on the Yager article should have made clear).

I believe that ZFS will, for at least some installations and workloads and when 
it has achieved the requisite level of reliability (both actual and perceived), 
allow some people to replace the kind of expensive equipment that you describe 
with commodity gear - and make managing the installation easier in the process. 
 That, in my opinion, is its greatest strength; almost everything else is by 
comparison down in the noise level.

However, ZFS is not the *only* open-source approach which may allow that to 
happen, so the real question becomes just how it compares with equally 
inexpensive current and potential alternatives (and that would make for an 
interesting discussion that I'm not sure I have time to initiate tonight).

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread can you guess?
> On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
>> I do consider the RAID-Z design to be somewhat brain-damaged [...]
>
> How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Nope.

A decent RAID-5 hardware implementation has no 'write hole' to worry about, and 
one can make a software implementation similarly robust with some effort (e.g., 
by using a transaction log to protect the data-plus-parity double-update or by 
using COW mechanisms like ZFS's in a more intelligent manner).
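
(As a sketch of what such a log buys you - illustrative only, not any
particular product's implementation - the classic logged small-write
sequence is:

  1. read the old data block D_old and the old parity P_old
  2. compute P_new = P_old XOR D_old XOR D_new
  3. append {address, D_new, P_new} to the log and flush it
  4. write D_new and P_new in place
  5. mark the log record complete

After a crash anywhere in steps 3-5, replaying the log rewrites both the
data and the parity, so a stripe can never be left with data and parity
that disagree - which is exactly the 'write hole'.)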

The part of RAID-Z that's brain-damaged is its 
concurrent-small-to-medium-sized-access performance (at least up to request 
sizes equal to the largest block size that ZFS supports, and arguably somewhat 
beyond that):  while conventional RAID-5 can satisfy N+1 small-to-medium read 
accesses or (N+1)/2 small-to-medium write accesses in parallel (though the 
latter also take an extra rev to complete), RAID-Z can satisfy only one 
small-to-medium access request at a time (well, plus a smidge for read accesses 
if it doesn't verify the parity) - effectively providing RAID-3-style 
performance.
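
(Rough numbers, purely illustrative: take a 4+1 group of disks, each good
for ~100 random IOPS. Conventional RAID-5 serves each small read from a
single disk, so up to 5 reads proceed in parallel for roughly 5 x 100 =
~500 small reads/s, and about (4+1)/2 = 2-3 small writes can be in flight
at once (each ties up one data disk plus the parity disk and costs an
extra revolution). RAID-Z spreads every block across all the data disks
in the group, so each small request occupies the whole group and the
ceiling is roughly 1 x 100 = ~100 requests/s, however wide the group is.)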

The easiest way to fix ZFS's deficiency in this area would probably be to map 
each group of N blocks in a file as a stripe with its own parity - which would 
have the added benefit of removing any need to handle parity groups at the disk 
level (this would, incidentally, not be a bad idea to use for mirroring as 
well, if my impression is correct that there's a remnant of LVM-style internal 
management there).  While this wouldn't allow use of parity RAID for very small 
files, in most installations they really don't occupy much space compared to 
that used by large files so this should not constitute a significant drawback.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread can you guess?
>> Au contraire: I estimate its worth quite accurately from the undetected
>> error rates reported in the CERN Data Integrity paper published last
>> April (first hit if you Google 'cern data integrity').
>>
>>> While I have yet to see any checksum error reported by ZFS on
>>> Symmetrix arrays or FC/SAS arrays, with some other cheap HW I've seen
>>> many of them
>>
>> While one can never properly diagnose anecdotal issues off the cuff in
>> a Web forum, given CERN's experience you should probably check your
>> configuration very thoroughly for things like marginal connections:
>> unless you're dealing with a far larger data set than CERN was, you
>> shouldn't have seen 'many' checksum errors.
>
> Well, single-bit error rates may be rare in normally operating hard
> drives, but from a systems perspective, data can be corrupted anywhere
> between disk and CPU.

The CERN study found that such errors (if they found any at all, which they 
couldn't really be sure of) were far less common than the manufacturer's spec 
for plain old detectable but unrecoverable bit errors or to the one hardware 
problem that they discovered (a disk firmware bug that appeared related to the 
unusual demands and perhaps negligent error reporting of their RAID controller 
and caused errors at a rate about an order of magnitude higher than the nominal 
spec for detectable but unrecoverable errors).

This suggests that in a ZFS-style installation without a hardware RAID 
controller they would have experienced at worst a bit error about every 10^14 
bits or 12 TB (the manufacturer's spec rate for detectable but unrecoverable 
errors) - though some studies suggest that the actual incidence of 'bit rot' is 
considerably lower than such specs.  Furthermore, simply scrubbing the disk in 
the background (as I believe some open-source LVMs are starting to do and for 
that matter some disks are starting to do themselves) would catch virtually all 
such errors in a manner that would allow a conventional RAID to correct them, 
leaving a residue of something more like one error per PB that ZFS could catch 
better than anyone else save WAFL.
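
(The ZFS counterpart of that kind of background scrubbing is simply a
scheduled scrub; a minimal sketch, pool name 'tank' assumed:

  # weekly scrub from root's crontab, Sundays at 02:00
  0 2 * * 0 /usr/sbin/zpool scrub tank

  # afterwards, check progress and any repaired or unrecoverable errors
  zpool status -v tank
)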

> I know you're not interested in anecdotal evidence,

It's less that I'm not interested in it than that I don't find it very 
convincing when actual quantitative evidence is available that doesn't seem to 
support its importance.  I know very well that things like lost and wild writes 
occur, as well as the kind of otherwise undetected bus errors that you 
describe, but the available evidence seems to suggest that they occur in such 
small numbers that catching them is of at most secondary importance compared to 
many other issues.  All other things being equal, I'd certainly pick a file 
system that could do so, but when other things are *not* equal I don't think it 
would be a compelling attraction.

> but I had a box that was randomly corrupting blocks during DMA.  The
> errors showed up when doing a ZFS scrub and I caught the problem in time.

Yup - that's exactly the kind of error that ZFS and WAFL do a perhaps uniquely 
good job of catching.  Of course, buggy hardware can cause errors that trash 
your data in RAM beyond any hope of detection by ZFS, but (again, other things 
being equal) I agree that the more ways you have to detect them, the better.  
That said, it would be interesting to know who made this buggy hardware.

...

> Like others have said for big business; as a consumer I can reasonably
> comfortably buy off-the-shelf cheap controllers and disks, and know that
> should any part of the system be flaky enough to cause data corruption,
> the software layer will catch it, which both saves money and creates
> peace of mind.

CERN was using relatively cheap disks and found that they were more than 
adequate (at least for any normal consumer use) without that additional level 
of protection:  the incidence of errors, even including the firmware errors 
which presumably would not have occurred in a normal consumer installation 
lacking hardware RAID, was on the order of 1 per TB - and given that it's 
really, really difficult for a consumer to come anywhere near that much data 
without most of it being video files (which just laugh and keep playing when 
they discover small errors) that's pretty much tantamount to saying that 
consumers would encounter no *noticeable* errors at all.

Your position is similar to that of an audiophile enthused about a measurable 
but marginal increase in music quality and trying to convince the hoi polloi 
that no other system will do:  while other audiophiles may agree with you, most 
people just won't consider it important - and in fact won't even be able to 
distinguish it at all.

- bill
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-08 Thread Richard Elling
can you guess? wrote:
> CERN was using relatively cheap disks and found that they were more
> than adequate (at least for any normal consumer use) without that
> additional level of protection: the incidence of errors, even
> including the firmware errors which presumably would not have occurred
> in a normal consumer installation lacking hardware RAID, was on the
> order of 1 per TB - and given that it's really, really difficult for a
> consumer to come anywhere near that much data without most of it being
> video files (which just laugh and keep playing when they discover
> small errors) that's pretty much tantamount to saying that consumers
> would encounter no *noticeable* errors at all.

bull*
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Major problem with a new ZFS setup

2007-11-08 Thread Michael Stalnaker
We weren't able to do anything at all, and finally rebooted the system. When
we did, everything came back normally, even with the target that was
reporting errors before. We're using an LSI PCI-E controller that's on the
supported device list, an LSI 3801-E. Right now, I'm trying to figure out
if there's a different controller we should be using with Solaris 10 Release
4 (X86) that will handle a drive issue more gracefully. I know folks are
working on this part of the code, but I need to get as far along as I can
right now. :)
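
A minimal first-pass disk-diagnosis sketch for Solaris 10 - these are
standard commands; the pool name is taken from the quoted output below:

  # per-device soft/hard/transport error counters
  iostat -En

  # FMA error telemetry, including disk and transport ereports
  fmdump -eV | more

  # once the box is responsive again
  zpool status -v LogData
  zpool scrub LogData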



On 11/8/07 8:43 PM, Ian Collins [EMAIL PROTECTED] wrote:

> Michael Stalnaker wrote:
>>
>> Finally trying to do a zpool status yields:
>>
>> [EMAIL PROTECTED]:/# zpool status -v
>>   pool: LogData
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>>
>> At which point the shell hangs, and cannot be control-c'd.
>>
>> Any thoughts on how to proceed? I'm guessing we have a bad disk, but I'm
>> not sure. Anything you can recommend to diagnose this would be welcome.
>>
> Are you able to run a zpool scrub?
>
> Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss