Re: [zfs-discuss] Slow file system access on zfs
How is the performance on the ZFS filesystem directly, without NFS? I have experienced big problems running NFS on large volumes (independent of the underlying fs).
Re: [zfs-discuss] Yager on ZFS
On 11/8/07, Mark Ashley [EMAIL PROTECTED] wrote:
> Economics for one.

Yep, for sure ... it was a rhetorical question ;)

> Why would I consider a new solution that is safe, fast enough, stable ... easier to manage and lots cheaper?

Rephrase: why would I NOT consider ...? :)
Re: [zfs-discuss] Slow file system access on zfs
On 8-11-2007 at 7:58, Walter Faleiro wrote:
> Hi Lukasz,
> The output of the first script gives:
>
> bash-3.00# ./test.sh
> dtrace: script './test.sh' matched 4 probes
> CPU     ID                    FUNCTION:NAME
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>   0  42681                        :tick-10s
>
> and it goes on.

It means that you have free blocks :), or you do not have any I/O writes. Run:
# zpool iostat 1
and
# iostat -zxc 1

> The second script gives:
> checking pool map size [B]: filer
> mdb: failed to dereference symbol: unknown symbol name
> 423917216903435

Which Solaris version do you use? Maybe you should patch the kernel.

Also you can check if there are problems with the ZFS sync phase. Run
# dtrace -n fbt::txg_wait_open:entry'{ stack(); ustack(); }'
and wait 10 minutes.

Also give more information about the pool:
# zfs get all filer
I assume 'filer' is your pool name.

Regards,
Lukas

On 11/7/07, Łukasz K [EMAIL PROTECTED] wrote:

Hi,
I think your problem is filesystem fragmentation. When available space is less than 40%, ZFS might have problems with finding free blocks. Use this script to check it:

#!/usr/sbin/dtrace -s

fbt::space_map_alloc:entry
{
    self->s = arg1;
}

fbt::space_map_alloc:return
/arg1 != -1/
{
    self->s = 0;
}

fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
    @s = quantize(self->s);
    self->s = 0;
}

tick-10s
{
    printa(@s);
}

Run the script for a few minutes.

You might also have problems with space map size. This script will show you the size of the space map on disk:

#!/bin/sh
echo '::spa' | mdb -k | grep ACTIVE \
| while read pool_ptr state pool_name
do
    echo "checking pool map size [B]: $pool_name"
    echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
        | mdb -k \
        | nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

In memory the space map takes 5 times more. Not all of the space map is loaded into memory all the time, but during a snapshot remove, for example, all of the space map might be loaded, so check that you have enough RAM available on the machine. Check ::kmastat in mdb. The space map uses kmem_alloc_40 (on Thumpers this is a real problem).

Workaround:
1. First you can change the pool recordsize:
       zfs set recordsize=64K POOL
   Maybe you will have to use 32K or even 16K.
2. You will have to disable the ZIL, because the ZIL always takes 128kB blocks.
3. Try to disable cache and tune the vdev cache. Check:
   http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Lukas Karwacki

On 7-11-2007 at 1:49, Walter Faleiro wrote:

Hi,
We have a zfs file system configured using a Sunfire 280R with a 10T Raidweb array:

bash-3.00# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
filer  9.44T  6.97T  2.47T   73%  ONLINE  -

bash-3.00# zpool status
  pool: backup
 state: ONLINE
 scrub: none requested
config:
        NAME      STATE   READ WRITE CKSUM
        filer     ONLINE     0     0     0
        c1t2d1    ONLINE     0     0     0
        c1t2d2    ONLINE     0     0     0
        c1t2d3    ONLINE     0     0     0
        c1t2d4    ONLINE     0     0     0
        c1t2d5    ONLINE     0     0     0

The file system is shared via NFS. Of late we have seen that file system access slows down considerably. Running commands like find and du on the zfs system did slow it down, but the intermittent slowdowns cannot be explained. Is there a way to trace the I/O on the zfs so that we can list out heavy reads/writes to the file system that are responsible for the slowness?
Thanks,
--Walter
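As an aside on the "in memory the space map takes 5 times more" rule of thumb above, here is a tiny sketch (mine, not from the thread) that turns the on-disk space-map size reported by the second script into a rough RAM estimate; the example input value is hypothetical.

# Rough back-of-the-envelope helper (my own sketch, not from the thread):
# the second script above sums ms_smo.smo_objsize across metaslabs, i.e. the
# on-disk space-map size in bytes; the thread's rule of thumb is that the
# in-core representation is about 5x larger.

def spacemap_ram_estimate(smo_objsize_bytes: int, blowup: float = 5.0) -> float:
    """Return the estimated RAM (in GiB) needed to hold the full space map."""
    return smo_objsize_bytes * blowup / 2**30

# Example: suppose the mdb walk reports ~1.2 GB of space map on disk.
ondisk = 1_200_000_000          # hypothetical figure, not from the thread
print(f"~{spacemap_ram_estimate(ondisk):.1f} GiB of RAM if the whole map is loaded")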
[zfs-discuss] 3rd posting: ZFS question (case 65730249)
Does anyone have any thoughts on this?

Hi, I have a customer with the following questions...

*Describe the problem:*
A ZFS question - I have one ZFS pool which is made from 2 storage arrays (vdevs). I have to delete the ZFS filesystems with the names /orbits/araid/* and remove one of the arrays from the system. After I delete this data the remaining data easily fits on one array. The questions are:

Can I remove one of the vdevs from the orbits pool without having to unload/rebuild the remaining data in the orbits/myear filesystem? Does ZFS know to move any current data from a vdev that is being removed from a pool to the remaining devices?

Hardware Platform: Sun Fire V40z
Component Affected: OS File System
OS and Kernel Version:

[EMAIL PROTECTED]:~] uname -a
SunOS hemi 5.10 Generic_118855-36 i86pc i386 i86pc

[EMAIL PROTECTED]:~] zpool list
NAME     SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
orbits  3.17T  2.97T   206G   93%  ONLINE  -

[EMAIL PROTECTED]:~] zpool status
  pool: orbits
 state: ONLINE
 scrub: none requested
config:
        NAME                             STATE   READ WRITE CKSUM
        orbits                           ONLINE     0     0     0
          c3t600C0FF0092BC64980F53900d0  ONLINE     0     0     0
          c3t600C0FF0092B663929C88800d0  ONLINE     0     0     0
errors: No known data errors

[EMAIL PROTECTED]:~] zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
orbits                    2.97T   155G  27.5K  /orbits
orbits/araid              2.33T   155G  33.5K  /orbits/araid
orbits/araid/cors         22.9G   155G  22.9G  /export/home/cors
orbits/araid/rinex1        550G   155G   550G  /rinex1
orbits/araid/rinex2        385G   155G   385G  /rinex2
orbits/araid/rinex3        503G   155G   503G  /rinex3
orbits/araid/rinex4        506G   155G   506G  /rinex4
orbits/araid/rinex5        419G   155G   419G  /rinex5
orbits/araid/tst_gnssrnx  24.5K   155G  24.5K  none
orbits/araid/ulc           432M   155G   432M  /orbits/araid/ulc
orbits/myear               656G   155G   656G  /orbits/myear

Regards,
Dave

--
Sun Microsystems
Mailstop ubur04-206
1 Network Drive
Burlington, MA 01803

*Dave Bevans - Technical Support Engineer*
*Phone: 1-800-USA-4SUN (800-872-4786) (opt-2), (case #) (press 0 for the next available engineer)*
*Email: david.bevans@Sun.com*
TSC Systems Group-OS / Hours: 6AM - 2PM EST / M - F
Submit, Check and Update Cases at the Online Support Center http://www.sun.com/service/online
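As a quick sanity check of the claim above that the remaining data easily fits on one array, here is a small sketch using the zfs list figures quoted in the case notes; the assumption that the pool's 3.17T is split evenly between the two vdevs is mine, not something stated in the post.

# Capacity sanity check using the figures quoted above (sizes are approximate;
# the even split of the pool across the two vdevs is my assumption).

TIB = 1024  # GiB per TiB

pool_size_gib  = 3.17 * TIB          # zpool list SIZE
myear_used_gib = 656                 # orbits/myear (data to be kept)
one_vdev_gib   = pool_size_gib / 2   # assume the two arrays are the same size

print(f"data kept after deleting orbits/araid/*: ~{myear_used_gib:.0f} GiB")
print(f"capacity of a single vdev (assumed):     ~{one_vdev_gib:.0f} GiB")
print("fits on one array" if myear_used_gib < one_vdev_gib else "does not fit")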
Re: [zfs-discuss] Yager on ZFS
On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
> I do consider the RAID-Z design to be somewhat brain-damaged [...]

How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Adam

--
Adam Leventhal, FishWorks    http://blogs.sun.com/ahl
Re: [zfs-discuss] Yager on ZFS
Economics for one. We run a number of testing environments which mimic the production one, but we don't want to spend $750,000 on EMC storage each time when something costing $200,000 will do the job we need. At the moment we have over 100TB on four SE6140s and we're very happy with the solution. ZFS is saving a lot of money for us because it enables solutions that weren't viable before.

Hang on, you tell me I can pop in Solaris 10, slap in ZFS ... reduce most of my storage footprint to JBODs ... and all of this on a little old AMD system? You must be joking! Why would I consider a new solution that is safe, fast enough, stable, easier to manage and lots cheaper?

(That's my fanboy hat, please excuse.)
Re: [zfs-discuss] X4500 device disconnect problem persists
That is interesting; we're having the same problem with our X4500s again. I am trying to work out what is causing the problem with NFS: restarting the service causes it to try to stop and never come back up. Rebooting the whole box fails as well; it just hangs until a hard reset.
Re: [zfs-discuss] Yager on ZFS
On 11/7/07, can you guess? [EMAIL PROTECTED] wrote:

> Monday, November 5, 2007, 4:42:14 AM, you wrote:
> cyg> Having gotten a bit tired of the level of ZFS hype floating ...
>
> But I do believe that some of the hype is justified

Just to make it clear, so do I: it's the *unjustified* hype that I've objected to (as my comments on the Yager article should have made clear).

I believe that ZFS will, for at least some installations and workloads, and when it has achieved the requisite level of reliability (both actual and perceived), allow some people to replace the kind of expensive equipment that you describe with commodity gear - and make managing the installation easier in the process. That, in my opinion, is its greatest strength; almost everything else is by comparison down in the noise level.

However, ZFS is not the *only* open-source approach which may allow that to happen, so the real question becomes just how it compares with equally inexpensive current and potential alternatives (and that would make for an interesting discussion that I'm not sure I have time to initiate tonight).

- bill
Re: [zfs-discuss] Yager on ZFS
On Wed, Nov 07, 2007 at 01:47:04PM -0800, can you guess? wrote:
>> I do consider the RAID-Z design to be somewhat brain-damaged [...]
>
> How so? In my opinion, it seems like a cure for the brain damage of RAID-5.

Nope. A decent RAID-5 hardware implementation has no 'write hole' to worry about, and one can make a software implementation similarly robust with some effort (e.g., by using a transaction log to protect the data-plus-parity double-update, or by using COW mechanisms like ZFS's in a more intelligent manner).

The part of RAID-Z that's brain-damaged is its concurrent-small-to-medium-sized-access performance (at least up to request sizes equal to the largest block size that ZFS supports, and arguably somewhat beyond that): while conventional RAID-5 can satisfy N+1 small-to-medium read accesses or (N+1)/2 small-to-medium write accesses in parallel (though the latter also take an extra rev to complete), RAID-Z can satisfy only one small-to-medium access request at a time (well, plus a smidge for read accesses if it doesn't verify the parity) - effectively providing RAID-3-style performance.

The easiest way to fix ZFS's deficiency in this area would probably be to map each group of N blocks in a file as a stripe with its own parity - which would have the added benefit of removing any need to handle parity groups at the disk level (this would, incidentally, not be a bad idea to use for mirroring as well, if my impression is correct that there's a remnant of LVM-style internal management there). While this wouldn't allow use of parity RAID for very small files, in most installations they really don't occupy much space compared to that used by large files, so this should not constitute a significant drawback.

- bill
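To put rough numbers on the concurrency argument above, here is a back-of-the-envelope sketch; the disk count and per-disk IOPS figure are illustrative assumptions of mine, and the model simply encodes the post's claim that RAID-5 can serve N+1 small reads in parallel while RAID-Z serves roughly one at a time.

# Back-of-the-envelope comparison of small random READ throughput for an
# N+1-disk group, following the argument in the post.  The per-disk IOPS
# figure and disk count are illustrative assumptions, not measurements.

def raid5_small_read_iops(n_data_disks: int, per_disk_iops: float) -> float:
    # RAID-5: each small read touches one disk, so all N+1 spindles can
    # serve independent requests concurrently.
    return (n_data_disks + 1) * per_disk_iops

def raidz_small_read_iops(n_data_disks: int, per_disk_iops: float) -> float:
    # RAID-Z: a block is spread across all data disks, so one small read
    # occupies the whole group -- roughly one request at a time.
    return per_disk_iops

N, DISK_IOPS = 8, 150.0   # assumed: 8 data disks + 1 parity, ~150 IOPS each
print(f"RAID-5 group: ~{raid5_small_read_iops(N, DISK_IOPS):.0f} small-read IOPS")
print(f"RAID-Z group: ~{raidz_small_read_iops(N, DISK_IOPS):.0f} small-read IOPS")

The same arithmetic with the (N+1)/2 figure quoted above applies to small writes on RAID-5; only the relative gap matters here, not the absolute numbers.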
Re: [zfs-discuss] Yager on ZFS
Au contraire: I estimate its worth quite accurately from the undetected error rates reported in the CERN Data Integrity paper published last April (first hit if you Google 'cern data integrity').

> While I have yet to see any checksum error reported by ZFS on Symmetrix arrays or FC/SAS arrays, with some other cheap HW I've seen many of them

While one can never properly diagnose anecdotal issues off the cuff in a Web forum, given CERN's experience you should probably check your configuration very thoroughly for things like marginal connections: unless you're dealing with a far larger data set than CERN was, you shouldn't have seen 'many' checksum errors.

> Well, single-bit error rates may be rare in normally operating hard drives, but from a systems perspective, data can be corrupted anywhere between disk and CPU.

The CERN study found that such errors (if they found any at all, which they couldn't really be sure of) were far less common than the manufacturer's spec for plain old detectable-but-unrecoverable bit errors, or than the one hardware problem that they discovered (a disk firmware bug that appeared related to the unusual demands and perhaps negligent error reporting of their RAID controller, and caused errors at a rate about an order of magnitude higher than the nominal spec for detectable but unrecoverable errors).

This suggests that in a ZFS-style installation without a hardware RAID controller they would have experienced at worst a bit error about every 10^14 bits or 12 TB (the manufacturer's spec rate for detectable but unrecoverable errors) - though some studies suggest that the actual incidence of 'bit rot' is considerably lower than such specs. Furthermore, simply scrubbing the disk in the background (as I believe some open-source LVMs are starting to do, and for that matter some disks are starting to do themselves) would catch virtually all such errors in a manner that would allow a conventional RAID to correct them, leaving a residue of something more like one error per PB that ZFS could catch better than anyone else save WAFL.

> I know you're not interested in anecdotal evidence,

It's less that I'm not interested in it than that I don't find it very convincing when actual quantitative evidence is available that doesn't seem to support its importance. I know very well that things like lost and wild writes occur, as well as the kind of otherwise undetected bus errors that you describe, but the available evidence seems to suggest that they occur in such small numbers that catching them is of at most secondary importance compared to many other issues. All other things being equal, I'd certainly pick a file system that could do so, but when other things are *not* equal I don't think it would be a compelling attraction.

> but I had a box that was randomly corrupting blocks during DMA. The errors showed up when doing a ZFS scrub and I caught the problem in time.

Yup - that's exactly the kind of error that ZFS and WAFL do a perhaps uniquely good job of catching. Of course, buggy hardware can cause errors that trash your data in RAM beyond any hope of detection by ZFS, but (again, other things being equal) I agree that the more ways you have to detect them, the better. That said, it would be interesting to know who made this buggy hardware.

...
> Like others have said, for big business; as a consumer I can reasonably comfortably buy off-the-shelf cheap controllers and disks, and know that should any part of the system be flaky enough to cause data corruption, the software layer will catch it, which both saves money and creates peace of mind.

CERN was using relatively cheap disks and found that they were more than adequate (at least for any normal consumer use) without that additional level of protection: the incidence of errors, even including the firmware errors which presumably would not have occurred in a normal consumer installation lacking hardware RAID, was on the order of 1 per TB - and given that it's really, really difficult for a consumer to come anywhere near that much data without most of it being video files (which just laugh and keep playing when they discover small errors), that's pretty much tantamount to saying that consumers would encounter no *noticeable* errors at all.

Your position is similar to that of an audiophile enthused about a measurable but marginal increase in music quality and trying to convince the hoi polloi that no other system will do: while other audiophiles may agree with you, most people just won't consider it important - and in fact won't even be able to distinguish it at all.

- bill
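For reference, the unit arithmetic behind the "10^14 bits or 12 TB" figure above is straightforward; this sketch just re-derives it from the spec value quoted in the post (the 1 TB consumer data-set size is an illustrative assumption of mine).

# Unit arithmetic behind the figures quoted above.  The 1-in-10^14 bit error
# spec is taken from the post; everything else is straight conversion, and the
# consumer data-set size is a hypothetical example.

BITS_PER_TB = 8 * 1e12          # decimal terabytes, as disk vendors count them

spec_bits_per_error = 1e14      # detectable-but-unrecoverable error spec
tb_per_error = spec_bits_per_error / BITS_PER_TB
print(f"one unrecoverable bit error per ~{tb_per_error:.1f} TB read")   # ~12.5 TB

consumer_data_tb = 1.0          # hypothetical consumer data set
expected_errors = consumer_data_tb / tb_per_error
print(f"expected errors in {consumer_data_tb:.0f} TB: ~{expected_errors:.2f}")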
Re: [zfs-discuss] Yager on ZFS
can you guess? wrote:
> CERN was using relatively cheap disks and found that they were more than adequate (at least for any normal consumer use) without that additional level of protection: the incidence of errors, even including the firmware errors which presumably would not have occurred in a normal consumer installation lacking hardware RAID, was on the order of 1 per TB - and given that it's really, really difficult for a consumer to come anywhere near that much data without most of it being video files (which just laugh and keep playing when they discover small errors) that's pretty much tantamount to saying that consumers would encounter no *noticeable* errors at all.

bull*
 -- richard
Re: [zfs-discuss] Major problem with a new ZFS setup
We weren't able to do anything at all, and finally rebooted the system. When we did, everything came back normally, even with the target that was reporting errors before. We're using an LSI PCI-E controller that's on the supported device list, an LSI 3801-E. Right now, I'm trying to figure out if there's a different controller we should be using with Solaris 10 Release 4 (X86) that will handle a drive issue more gracefully. I know folks are working on this part of the code, but I need to get as far along as I can right now. :)

On 11/8/07 8:43 PM, Ian Collins [EMAIL PROTECTED] wrote:
> Michael Stalnaker wrote:
>> Finally, trying to do a zpool status yields:
>>
>> [EMAIL PROTECTED]:/# zpool status -v
>>   pool: LogData
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error. An
>>         attempt was made to correct the error. Applications are unaffected.
>> action: Determine if the device needs to be replaced, and clear the errors
>>         using 'zpool clear' or replace the device with 'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>>
>> At which point the shell hangs, and cannot be control-c'd. Any thoughts on how to proceed? I'm guessing we have a bad disk, but I'm not sure. Anything you can recommend to diagnose this would be welcome.
>
> Are you able to run a zpool scrub?
>
> Ian