Re: [zfs-discuss] test for holes in a file?
On Mon, 26 Mar 2012, Mike Gerdts wrote:
>> If file space usage is less than file directory size then it must
>> contain a hole. Even for compressed files, I am pretty sure that
>> Solaris reports the uncompressed space usage.
>
> That's not the case.

You are right. I should have tested this prior to posting. :-(

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
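For reference, a minimal sketch (mine, not from the thread; the helper name probably_sparse is invented) of the heuristic being discussed: treat a file as sparse when its allocated blocks cover less than its logical size. As the thread concludes, this misreports on compressing filesystems such as ZFS, where st_blocks already reflects the smaller on-disk footprint even for hole-free files.

/* Sketch of the st_size vs. st_blocks heuristic discussed above.
 * Caveat: on compressing filesystems (e.g. ZFS with compression on),
 * st_blocks reflects the compressed size, so this test can report
 * "holes" in files that have none. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

static int probably_sparse(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;                      /* stat failed */

    /* st_blocks is counted in 512-byte units on Solaris. */
    return ((off_t)st.st_blocks * 512 < st.st_size) ? 1 : 0;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        int r = probably_sparse(argv[i]);
        if (r < 0)
            perror(argv[i]);
        else
            printf("%s: %s\n", argv[i],
                   r ? "may contain holes (or is compressed)"
                     : "no holes suggested by block count");
    }
    return 0;
}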
Re: [zfs-discuss] webserver zfs root lock contention under heavy load
hi

You did not answer the questions: how much RAM does the server have? How many sockets and cores? What is the ZFS block size? How much cache RAM is in your SAN array? What block/stripe size is the RAID in the SAN array, and is it RAID 5 or something else? What is your test program, and how is it driven (from what kind of client)?

regards

On 3/26/2012 11:13 PM, Aubrey Li wrote:
> On Tue, Mar 27, 2012 at 1:15 AM, Jim Klimov wrote:
>> Well, as a further attempt down this road, is it possible for you to rule
>> out ZFS from swapping - i.e. if RAM amounts permit, disable swap entirely
>> (swap -d /dev/zvol/dsk/rpool/swap) or relocate it to dedicated slices of
>> the same or, better yet, separate disks?
>
> Thanks Jim for your suggestion!
>
>> If you do have lots of swapping activity going on in a zvol (visible in
>> "vmstat 1" as the si/so columns), you're likely to get much fragmentation
>> in the pool, and searching for contiguous stretches of space can become
>> tricky (and time-consuming), or larger writes can get broken down into
>> many smaller random writes and/or "gang blocks", which is also slower.
>> At least such waiting on disks could explain the overall large kernel
>> times.
>
> I took swapping activity into account; even when CPU% is 100%, "si"
> (swap-ins) and "so" (swap-outs) are always ZERO.
>
>> You can also see the disk wait time ratio in the "iostat -xzn 1" column
>> "%w" and the disk busy time ratio in "%b" (second and third from the
>> right). I don't remember you posting that. If these are counting in the
>> tens, or are close or equal to 100%, then your disks are the actual
>> bottleneck. Speeding up that subsystem, including adding cache (ARC RAM,
>> L2ARC SSD, maybe ZIL SSD/DDRDrive) and combating fragmentation by moving
>> swap and other scratch spaces to dedicated pools or raw slices, might
>> help.
>
> My storage system is not very busy, and there are only read operations.
>
> ==========
> # iostat -xnz 3
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   112.4    0.0 1691.5    0.0  0.0  0.5    0.0    4.8   0  41 c11t0d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   118.7    0.0 1867.0    0.0  0.0  0.5    0.0    4.5   0  42 c11t0d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   127.7    0.0 2121.6    0.0  0.0  0.6    0.0    4.7   0  44 c11t0d0
>                     extended device statistics
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   141.3    0.0 2158.5    0.0  0.0  0.7    0.0    4.6   0  48 c11t0d0
> ==========
>
> Thanks,
> -Aubrey

--
Hung-Sheng Tsao, Ph.D.
Founder & Principal, HopBit GridComputing LLC
cell: 9734950840
http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/
Re: [zfs-discuss] kernel panic during zfs import
On Tue, Mar 27, 2012 at 3:14 AM, Carsten John wrote:
> Hallo everybody,
>
> I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic
> during the import of a zpool (some 30TB) containing ~500 zfs filesystems
> after reboot. This causes a reboot loop until I boot single user and
> remove /etc/zfs/zpool.cache.
>
> From /var/adm/messages:
>
> savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf
> Page fault) rp=ff002f9cec50 addr=20 occurred in module "zfs" due to a
> NULL pointer dereference
> savecore: [ID 882351 auth.error] Saving compressed system crash dump in
> /var/crash/vmdump.2

I ran into a very similar problem with Solaris 10U9 and the replica (zfs send | zfs recv destination) of a zpool holding about 25 TB of data. The problem was an incomplete snapshot (the zfs send | zfs recv had been interrupted). On boot the system was trying to import the zpool, and as part of that it was trying to destroy the offending (incomplete) snapshot. This was zpool version 22, where destruction of a snapshot is handled as a single TXG, and the operation was running the system out of RAM (32 GB worth). There is a fix for this in zpool version 26 (or newer), but any snapshots created while the zpool is at a version prior to 26 will still have the problem on disk. We have support with Oracle and were able to get a loaner system with 128 GB RAM to clean up the zpool (it took about 75 GB of RAM to do so).

If you are at zpool 26 or later this is not your problem. If you are at zpool < 26, then test for an incomplete snapshot by importing the pool read-only and running `zdb -d | grep '%'`, as an incomplete snapshot will have a '%' instead of a '@' as the dataset/snapshot separator. You can also run zdb against the _un_imported_ zpool using the -e option to zdb.

See the following Oracle bugs for more information:

CR# 6876953
CR# 6910767
CR# 7082249 (marked as a duplicate of CR# 6948890)

P.S. I suspect that the incomplete snapshot was also corrupt in some strange way, but I could never make a solid determination of that. We think what caused the zfs send | zfs recv to be interrupted was hitting an e1000g Ethernet device driver bug.

--
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, Troy Civic Theatre Company
-> Technical Advisor, RPI Players
Re: [zfs-discuss] kernel panic during zfs import
2012-03-27 11:14, Carsten John wrote:
> I saw a similar effect some time ago on an OpenSolaris box (build 111b).
> That time my final solution was to copy the read-only mounted data over
> to a newly created pool.
>
> As this is the second time this failure has occurred (on different
> machines), I'm really concerned about overall reliability.
>
> Any suggestions?

A couple of months ago I reported a similar issue (though with a different stack trace and code path). I tracked it down to code in the freeing of deduped blocks, where a valid code path could return a NULL pointer, but further routines used the pointer as if it were always valid - thus a NULL dereference when the pool was imported read-write and tried to release blocks marked for deletion. Adding a check for non-NULLness in my private rebuild of oi_151a fixed the issue.

I wouldn't be surprised to see similar slackness in other parts of the code. Not checking input values in routines seems like an arrogant mistake waiting to fire (and it did for us). I am not sure how to make a webrev and ultimately a signed-off contribution upstream, but I posted my patch and research on the list and in the illumos bug tracker.

I am not sure how you can fix a Solaris 11 system, though. If the pool is at zpool v28 or older, you can try importing it into an OpenIndiana installation, perhaps rebuilt with similarly patched code that checks for NULLs, fix your pool there, and then reuse it in S11 if you must. The source is at http://src.illumos.org and your stack trace should tell you in which functions to start looking...

Good luck,
//Jim
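Purely for illustration (this is not the actual illumos code or patch, and all names here - dedup_entry_t, lookup_dedup_entry, free_deduped_block - are invented), a schematic sketch of the kind of guard described above: a lookup that may legitimately return NULL, and a caller hardened to treat that as "nothing to do" rather than dereferencing the pointer.

/* Hypothetical sketch of the defensive-NULL-check pattern described
 * above -- not illumos source.  A lookup that can validly return NULL
 * must not be dereferenced blindly by its callers. */
#include <stdio.h>
#include <stddef.h>

typedef struct dedup_entry {
    unsigned long refcount;
} dedup_entry_t;

/* Stand-in for the table lookup: returning NULL is a valid outcome
 * (the block simply has no entry), not an error. */
static dedup_entry_t *lookup_dedup_entry(const void *block_ptr)
{
    (void)block_ptr;
    return NULL;                /* simulate the "no entry" case */
}

static void free_deduped_block(const void *block_ptr)
{
    dedup_entry_t *de = lookup_dedup_entry(block_ptr);

    if (de == NULL) {
        /* The fix described above: bail out gracefully instead of
         * dereferencing NULL and panicking. */
        printf("no dedup entry for block; nothing to release\n");
        return;
    }

    if (--de->refcount == 0) {
        /* last reference gone: release the block for real here */
    }
}

int main(void)
{
    char block[512] = { 0 };
    free_deduped_block(block);
    return 0;
}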
Re: [zfs-discuss] webserver zfs root lock contention under heavy load
One of the glories of Solaris is that it is so very observable. Then there are the many excellent blog posts, wiki entries, and books - some of which are authored by contributors to this very thread - explaining how Solaris works. But these virtues are also a snare to some, and it is not uncommon for even experienced practitioners to become fixated by this or that.

Aubrey, a 70:30 user-to-system ratio is pretty respectable. Running at 100% is not so pretty (e.g. I would be surprised if you DIDN'T see many involuntary context switches (icsw) in such a scenario). Esteemed experts have assured you that ZFS lock contention is not your main concern. I would run with that.

You said it was a stress test. That raises many questions for me. I am much more interested in how systems perform in the real world. In my experience, many of the issues we find in production are not the ones we found in the lab. Indeed, I would argue that too many systems are tuned to run simplistic benchmarks instead of real workloads.

However, I don't think it's helpful to continue this discussion here. There are some esteemed, experienced practitioners "out there" who would be only too happy to provide holistic systems performance tuning and health check services to you on a commercial basis (I hear that some may even accept PayPal).

Phil (p...@harmanholistix.com)
[zfs-discuss] kernel panic during zfs import
Hallo everybody,

I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic during the import of a zpool (some 30TB) containing ~500 zfs filesystems after reboot. This causes a reboot loop until I boot single user and remove /etc/zfs/zpool.cache.

From /var/adm/messages:

savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=ff002f9cec50 addr=20 occurred in module "zfs" due to a NULL pointer dereference
savecore: [ID 882351 auth.error] Saving compressed system crash dump in /var/crash/vmdump.2

This is what mdb tells me:

mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs random fcp idm sata fcip cpc crypto ufs logindmux ptm sppp ]
$c
zap_leaf_lookup_closest+0x45(ff0700ca2a98, 0, 0, ff002f9cedb0)
fzap_cursor_retrieve+0xcd(ff0700ca2a98, ff002f9ceed0, ff002f9cef10)
zap_cursor_retrieve+0x195(ff002f9ceed0, ff002f9cef10)
zfs_purgedir+0x4d(ff0721d32c20)
zfs_rmnode+0x57(ff0721d32c20)
zfs_zinactive+0xb4(ff0721d32c20)
zfs_inactive+0x1a3(ff0721d3a700, ff07149dc1a0, 0)
fop_inactive+0xb1(ff0721d3a700, ff07149dc1a0, 0)
vn_rele+0x58(ff0721d3a700)
zfs_unlinked_drain+0xa7(ff07022dab40)
zfsvfs_setup+0xf1(ff07022dab40, 1)
zfs_domount+0x152(ff07223e3c70, ff0717830080)
zfs_mount+0x4e3(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
fsop_mount+0x22(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
domount+0xd2f(0, ff002f9cfe20, ff07223e5900, ff07149dc1a0, ff002f9cfe18)
mount+0xc0(ff0713612c78, ff002f9cfe98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()

I can import the pool read-only. The server is a mirror for our primary file server and is synced via zfs send/receive.

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That time my final solution was to copy the read-only mounted data over to a newly created pool.

As this is the second time this failure has occurred (on different machines), I'm really concerned about overall reliability.

Any suggestions?

thx
Carsten
Re: [zfs-discuss] test for holes in a file?
> On Mon, 26 Mar 2012, Andrew Gabriel wrote:
>
>> I just played and knocked this up (note the stunning lack of comments,
>> missing optarg processing, etc)...
>> Give it a list of files to check...
>
> This is a cool program, but programmers were asking (and answering) this
> same question 20+ years ago, before there was anything like SEEK_HOLE.
>
> If file space usage is less than file directory size then it must contain
> a hole. Even for compressed files, I am pretty sure that Solaris reports
> the uncompressed space usage.

Unfortunately not true with filesystems which compress data.

Casper
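A minimal sketch of the SEEK_HOLE approach referenced above - roughly the same idea as Andrew's program, though this version is mine, not his: if lseek(fd, 0, SEEK_HOLE) lands before the end of the file, the file contains at least one hole, regardless of compression.

/* Hole detection via SEEK_HOLE (available on Solaris 10+; on Linux it
 * needs _GNU_SOURCE).  Sketch only -- not Andrew's original program. */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd < 0) { perror(argv[i]); continue; }

        struct stat st;
        if (fstat(fd, &st) == 0) {
            /* SEEK_HOLE returns the offset of the first hole at or after
             * the given offset; with no holes it returns end-of-file. */
            off_t hole = lseek(fd, 0, SEEK_HOLE);
            if (hole < 0)
                perror("lseek(SEEK_HOLE)");
            else if (hole < st.st_size)
                printf("%s: contains at least one hole (first at offset %lld)\n",
                       argv[i], (long long)hole);
            else
                printf("%s: no holes found\n", argv[i]);
        }
        close(fd);
    }
    return 0;
}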