Re: make release fails on find
On Wed, Oct 31, 2012 at 4:07 AM, Glen Barber g...@freebsd.org wrote: On Tue, Oct 30, 2012 at 08:11:15PM -0400, Glen Barber wrote: Oops, my bad. Yes exact same behavior; make -C release cdrom fails with ... find //tank/cvs/9.1/src/release/dist/doc -empty -delete find //tank/cvs/9.1/src/release/dist/games -empty -delete find: -delete: //tank/cvs/9.1/src/release/dist/games: relative path potentially not safe *** [distributeworld] Error code 1 on 9.1-RC3. I can try with 9-stable as well (tomorrow). Ok, thanks. I do not want to assume anything more at this point. I am still waiting for my build machine to finish a few queued things. Once it frees up, I will roll a release using sudo (just for my own sanity), and without sudo, with your src.conf and make.conf. Anyway, thanks for all of the details you have provided. It is all helpful, and hopefully this will finally be tracked down. Ugh... Ok, so this is my fault. I do not remember why, specifically, but the change in question was not merged to the releng/9.1 branch. Please try the following, in the top-level directory of your releng/9.1 source checkout: svn merge -c240077 ^/head/Makefile.inc1 Makefile.inc1 It worked for me fine. Unfortunately, it is far too late in the release cycle for that change to make it into 9.1-RELEASE. Great that you found the bug! Unfortunately, this does not have anything to do with the recursing in the usr/src tarball. Please let me know if you continue to see that happen, as this is the _single_ most reported issue that I have had zero luck reproducing... With just the merge above now 9.1-RC3 ends up recursing. ( Just tried them one at a time ). Thanks. Glen PS: Sorry about being the cause of your release build failure... No problem really :) Thanks for hunting this down now. /A ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: make release fails on find
On Wed, Oct 31, 2012 at 4:35 AM, Glen Barber g...@freebsd.org wrote:
> On Tue, Oct 30, 2012 at 11:19:12PM -0400, Glen Barber wrote:
> > So, please also do:
> >
> >   svn merge -c241451 ^/head/release release
>
> You'll want to merge one more revision:
>
>   svn merge -c241596 ^/head/release release
>
> Same as before - I _think_ this should work. :-)
>
> Glen

Excellent :) That did the trick, i.e. no recursion :) Thank you very much for finding the bugs. Will this be merged to 9-stable?

On a more wishlist topic: I'd really appreciate it if .zfs dirs were excluded from the tarballs.

Best regards
Andreas
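For anyone following along, the three merges named in this thread can be collected into one small script. This is only a sketch: the revision numbers and merge targets are the ones quoted by Glen, while the SRCDIR and DRY_RUN variables and the run helper are illustrative assumptions, not part of any FreeBSD tooling. By default it only prints what it would do.

```shell
#!/bin/sh
# Revisions and targets are those quoted in the thread; everything else
# (variable names, the dry-run switch) is a hypothetical convenience.
SRCDIR=${SRCDIR:-/usr/src}   # assumption: top of a releng/9.1 checkout
DRY_RUN=${DRY_RUN:-1}        # 1 = print the commands, anything else = run them

# Print the command in dry-run mode, otherwise run it inside SRCDIR.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        (cd "$SRCDIR" && "$@")
    fi
}

plan=$(
    # Fixes the "find: -delete: ... relative path potentially not safe" failure.
    run svn merge -c240077 '^/head/Makefile.inc1' Makefile.inc1
    # Reported in the thread to fix the recursion in the usr/src tarball.
    run svn merge -c241451 '^/head/release' release
    run svn merge -c241596 '^/head/release' release
)
echo "$plan"
```

Run it once as-is to review the commands, then with DRY_RUN=0 from a shell that has svn in its PATH. Glen later notes that the change has already been merged to stable/9, so this only matters for releng/9.1 checkouts.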
9-Stable panic: resource_list_unreserve: can't find resource
Hi I'm running FreeBSD stingray 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 16:11:35 CET 2012 tl@stingray:/usr/obj/usr/src/sys/stingray amd64 on a new Dell laptop and keep getting these panics (typically once or twice per day) (kgdb) set pagination off (kgdb) bt #0 doadump (textdump=Variable textdump is not available. ) at pcpu.h:229 #1 0x80425e64 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448 #2 0x8042634c in panic (fmt=0x1 Address 0x1 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:636 #3 0x8045773e in resource_list_unreserve (rl=Variable rl is not available. ) at /usr/src/sys/kern/subr_bus.c:3338 #4 0x802c3ee4 in acpi_delete_resource (bus=0xfe00052c1100, child=0xfe00052c1500, type=4, rid=3323) at /usr/src/sys/dev/acpica/acpi.c:1405 #5 0x802c62bc in acpi_bus_alloc_gas (dev=0xfe00052c1500, type=0xfe00052b786c, rid=0xfe00052b7978, gas=Variable gas is not available. ) at /usr/src/sys/dev/acpica/acpi.c:1450 #6 0x802d1663 in acpi_PkgGas (dev=0xfe00052c1500, res=Variable res is not available. ) at /usr/src/sys/dev/acpica/acpi_package.c:120 #7 0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at /usr/src/sys/dev/acpica/acpi_cpu.c:782 #8 0x802cc3a4 in acpi_cpu_notify (h=Variable h is not available. ) at /usr/src/sys/dev/acpica/acpi_cpu.c:1050 #9 0x802a3fca in AcpiEvNotifyDispatch (Context=0x0) at /usr/src/sys/contrib/dev/acpica/events/evmisc.c:283 #10 0x802c26c3 in acpi_task_execute (context=0xfe00051d6800, pending=Variable pending is not available. ) at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:134 #11 0x804683c4 in taskqueue_run_locked (queue=0xfe00052bc100) at /usr/src/sys/kern/subr_taskqueue.c:308 #12 0x80469366 in taskqueue_thread_loop (arg=Variable arg is not available. 
) at /usr/src/sys/kern/subr_taskqueue.c:497 #13 0x803f762f in fork_exit (callout=0x80469320 taskqueue_thread_loop, arg=0x80a20cc8, frame=0xff80002cdb00) at /usr/src/sys/kern/kern_fork.c:992 #14 0x806be6be in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602 #15 0x in ?? () #16 0x in ?? () #17 0x in ?? () #18 0x in ?? () #19 0x in ?? () #20 0x in ?? () #21 0x in ?? () #22 0x in ?? () #23 0x in ?? () #24 0x in ?? () #25 0x in ?? () #26 0x in ?? () #27 0x in ?? () #28 0x in ?? () #29 0x in ?? () #30 0x in ?? () #31 0x in ?? () #32 0x in ?? () #33 0x in ?? () #34 0x in ?? () #35 0x in ?? () #36 0x in ?? () #37 0x in ?? () #38 0x in ?? () #39 0x00ff in ?? () #40 0x in ?? () #41 0xfe00051e5920 in ?? () #42 0xfe00051e5920 in ?? () #43 0xff80002cd740 in ?? () #44 0xff80002cd6e8 in ?? () #45 0xfe00051c1490 in ?? () #46 0x8044e9b9 in sched_switch (td=0x80469320, newtd=0x80a20cc8, flags=Variable flags is not available. ) at /usr/src/sys/kern/sched_ule.c:1913 Previous frame inner to this frame (corrupt stack?) Hardware details are as follows Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. 
FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 16:11:35 CET 2012 tl@stingray:/usr/obj/usr/src/sys/stingray amd64 CPU: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz (2591.64-MHz K8-class CPU) Origin = GenuineIntel Id = 0x306a9 Family = 0x6 Model = 0x3a Stepping = 9 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0x7fbae3ffSSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND AMD Features=0x28100800SYSCALL,NX,RDTSCP,LM AMD Features2=0x1LAHF TSC: P-state invariant, performance statistics real memory = 8589934592 (8192 MB) avail memory = 8166604800 (7788 MB) Event timer LAPIC quality 600 ACPI APIC Table: DELL CBX3 FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 ioapic0 Version 2.0 irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: DELL CBX3on motherboard acpi0: Power Button (fixed) cpu0: ACPI CPU on acpi0 cpu1: ACPI CPU on acpi0 cpu2: ACPI CPU on acpi0 cpu3: ACPI
Re: lock violation in unionfs (9.0-STABLE r230270)
schrieb Attilio Rao am 29.10.2012 23:02 (localtime): On Mon, Oct 29, 2012 at 7:37 PM, Harald Schmalzbauer h.schmalzba...@omnilan.de wrote: schrieb Attilio Rao am 27.10.2012 23:07 (localtime): On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao atti...@freebsd.org wrote: On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao atti...@freebsd.org wrote: On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer h.schmalzba...@omnilan.de wrote: schrieb Attilio Rao am 09.08.2012 20:26 (localtime): On 8/8/12, Harald Schmalzbauer h.schmalzba...@omnilan.de wrote: schrieb Pavel Polyakov am 06.03.2012 11:20 (localtime): mount -t unionfs -o noatime /usr /mnt insmntque: mp-safe fs and non-locked vp: 0xfe01d96704f0 is not exclusive locked but should be KDB: enter: lock violation Pavel, can you give a spin to this patch?: http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch I think that the unlocking is due at that point as the vnode lock can be switch later on. Let me know what you think about it and what the test does. Thanks! This patch fixes the problem with lock violation. Sorry I've tested it so late. Hello, this patch still applies cleanly to RELENG_9_1. Was there another fix for the issue or has it just not been PR-sent and thus forgotten? Can you and Pavel try the attached patch? Unfortunately I had no time to test it, I just made in 5 free mins from a non-FreeBSD workstation, Sorry, couldn't test earlier, but now I did: With this patch applied the machine hangs without debug kernel and the latter gives the following panic: System call nmount returning with the following locks held: exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @ src/sys/fs/unionfs/union_vnops.c:1938 panic: witness_warn cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at db_trace_self_wrapper+0x26 kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a witness_warn(2,0,c0a4ac34,c0a0990a,286,...) 
at witness_warn+0x1e4 syscall(d1de3d08) ar syscall+0x415 Xint0x80_syscall() at Xint0x80_syscall+0x21 --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f,esp = 0xbfbfe46c, ebp = 0xbfbfede8 --- KDB: enter: panic [ thread pid 86 tid 100054 ] Stopped adkdb_enter+0x3a: movl $0,kdb_why db bt Tracing pid 86 tid 100054 td 0xc541b000 kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panix+0x190 witness_warn(2,0,x0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4 syscall(d1de3d08) at syscall+0x415 Xint0x80_syscall() at Xint0x80_syscall+0x21 Hmm, I guess I forgot to install kernel debug symbols... Coming back if I have more Unfortunately unionfs does very wrong things with the insmntque() locking. It basically expects the vnode to return locked in the same way requested by the precedent namei() (when that happens) but when you do insmntque() you can only have an LK_EXCLUSIVE lock on the vnode. Hello, the following patch should workout the issues around unionfs_nodeget() a bit: http://www.freebsd.org/~attilio/unionfs_nodeget2.patch Unfortunately unionfs code is rather messy in the lookup path about locking requirements so follow what it needs to be done there is a bit difficult. I have no way to test this patch, so it is just test-compiled at the moment, but I would need that you also test lookup path (so directory ls, find(1) on the whole unionfs volume, etc.) to validate it someway. On a second thought, I think that locking in lookup (and also other operations) is so fragile and difficult to follow that it makes all vnops real locking landmines. I think that the following patch fixes the insmntque insertion and follows the old approach well enough to be committed separately: http://www.freebsd.org/~attilio/unionfs_nodeget3.patch Unfortunately I have no idea about all those locking strategies and implementations. 
Applying unionfs_nodeget3.patch results in:

sys/fs/unionfs/union_subr.c: In function 'unionfs_nodeget':
sys/fs/unionfs/union_subr.c:332: error: expected statement before ')' token
*** [union_subr.o] Error code 1

I guess there is a typo in this chunk:

@@ -317,11 +328,11 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
                 vref(vp);
         } else
                 *vpp = vp;
-
-unionfs_nodeget_out:
-        if (lkflags & LK_TYPE_MASK)
-                vn_lock(vp, lkflags | LK_RETRY);
-
+        if (lkflags & LK_TYPE_MASK) {
+                if (lkflags == LK_SHARED))
                                          ^
+                        vn_lock(vp, LK_DOWNGRADE | LK_RETRY);
+        } else
+                VOP_UNLOCK(vp, LK_RELEASE);
         return (0);
 }

After removing the second right parenthesis the kernel compiles. But it still crashes:

panic: Lock (lockmgr) ufs not locked @ sys/kern/vfs_default.c:512
cpuid = 1
KDB: stack backtrace:
...

If you can use the bt info I'll transcribe it - no serial console available :-(

Am I right that I should only
Re: make release fails on find
First, let me state status more clearly: solved :) Big thanks for fixing it.

On a side note, how has re-team not run into this?

Best regards
Andreas
Re: make release fails on find
On Wed, Oct 31, 2012 at 02:07:56PM +0100, Andreas Nilsson wrote:
> First, let me state status more clearly: solved :) Big thanks for fixing it.

Glad to help. To answer one of your previous questions, I've already merged this to stable/9.

> On a side note, how has re-team not run into this?

No, the releases are built within a chroot, and this issue is specific to a few edge-cases outside of that environment.

Glen
Re: make release fails on find
On Wed, Oct 31, 2012 at 08:30:29AM +0100, Andreas Nilsson wrote:
> On a more wishlist topic: I'd really appreciate if .zfs dirs would be excluded from the tarballs.

Hmm, I didn't realize this was happening. So that I can verify my change works for all environments, are you using any local zfs dataset properties, specifically unhiding the snapshot directory?

Glen
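As a stopgap for the wishlist item above, a tarball built by hand can skip snapshot directories with tar's --exclude option. The sketch below is not what the release Makefiles do; it only shows the idea on a throwaway directory tree, and every path in it is made up for the demonstration.

```shell
#!/bin/sh
# Build a scratch tree containing a fake, visible .zfs snapshot directory.
work=$(mktemp -d)
mkdir -p "$work/src/sys" "$work/src/.zfs/snapshot/snap1"
echo current  > "$work/src/sys/file.c"
echo snapshot > "$work/src/.zfs/snapshot/snap1/file.c"

# Archive the tree, leaving out anything named .zfs and everything below it.
tar -cf "$work/src.tar" --exclude '.zfs' -C "$work" src

# List what actually went into the archive, then clean up.
listing=$(tar -tf "$work/src.tar")
echo "$listing"
rm -rf "$work"
```

The listing should contain src/sys/file.c and nothing under .zfs. Whether the .zfs directory is visible to a directory walk in the first place depends on the dataset's snapdir property, which is what Glen is asking about.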
ACPI Error: No handler for Region [POWS] (0xffffff000994f380) [IPMI] on Cisco UCS C200 M2
Hi,

I am getting the following errors on a Cisco UCS C200 M2 server running FreeBSD 8.3 amd64:

Oct 31 02:15:22 ucs200 kernel: ACPI Error: No handler for Region [POWS] (0xff000994f380) [IPMI] (20101013/evregion-487)
Oct 31 02:15:22 ucs200 kernel: ACPI Error: Region IPMI(0x7) has no handler (20101013/exfldio-382)
Oct 31 02:15:22 ucs200 kernel: ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC0.P111._PSR] (Node 0xff0009934080), AE_NOT_EXIST (20101013/psparse-633)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: No handler for Region [POWS] (0xff000994f380) [IPMI] (20101013/evregion-487)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: Region IPMI(0x7) has no handler (20101013/exfldio-382)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC0.P111._PSR] (Node 0xff0009934080), AE_NOT_EXIST (20101013/psparse-633)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: No handler for Region [POWS] (0xff000994f380) [IPMI] (20101013/evregion-487)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: Region IPMI(0x7) has no handler (20101013/exfldio-382)
Oct 31 02:15:23 ucs200 kernel: ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC0.P111._PSR] (Node 0xff0009934080), AE_NOT_EXIST (20101013/psparse-633)

# uname -srmi
FreeBSD 8.3-RELEASE amd64 GENERIC

I don't know what it means. Should I be worried about it, or can I ignore it? Is there something I can tune to turn this message off, or is there something that needs to be fixed on the FreeBSD side?

We are planning to put this machine into production in one or two weeks, but until then I can test patches etc.

Miroslav Lachman
Re: ZFS corruption due to lack of space?
Other info:

zpool list tank2
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank2    19T  18.7T   304G    98%  1.00x  ONLINE  -

zfs list tank2
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank2  13.3T      0  13.3T  /tank2

Running: 8.3-RELEASE-p4, zpool: v28, zfs: v5

----- Original Message -----
From: Steven Hartland ste...@multiplay.co.uk
To: freebsd-stable@freebsd.org; freebsd...@freebsd.org
Sent: Wednesday, October 31, 2012 5:25 PM
Subject: ZFS corruption due to lack of space?

Been running some tests on new hardware here to verify all is good. One of the tests was to fill the zfs array, which seems like it's totally corrupted the tank.

The HW is 7 x 3TB disks in RAIDZ2 with dual 13GB ZIL partitions and dual 100GB L2ARC on Enterprise SSDs. All disks are connected to an LSI 2208 RAID controller run by the mfi driver: HDs via a SAS2X28 backplane and SSDs via a passive backplane.

The file system has 31 test files, mostly random data from /dev/random and one blank from /dev/zero.

The test consisted of ~20 concurrent dd's running under screen, all but one reading from /dev/random and the final one from /dev/zero, e.g.

dd if=/dev/random bs=1m of=/tank2/random10

No hardware errors have been raised, so no disk timeouts etc.

On completion each dd reported no space, as you would expect, e.g.

dd if=/dev/random bs=1m of=/tank2/random13
dd: /tank2/random13: No space left on device
503478+0 records in
503477+0 records out
527933898752 bytes transferred in 126718.731762 secs (4166187 bytes/sec)
You have new mail.
At that point, with the test seemingly successful, I went to delete the test files, which resulted in:-

rm random*
rm: random1: Unknown error: 122
rm: random10: Unknown error: 122
rm: random11: Unknown error: 122
rm: random12: Unknown error: 122
rm: random13: Unknown error: 122
rm: random14: Unknown error: 122
rm: random18: Unknown error: 122
rm: random2: Unknown error: 122
rm: random3: Unknown error: 122
rm: random4: Unknown error: 122
rm: random5: Unknown error: 122
rm: random6: Unknown error: 122
rm: random7: Unknown error: 122
rm: random9: Unknown error: 122

Error 122 I assume is ECKSUM.

At this point the pool was showing checksum errors:

zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/41fb7e5c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0
            gptid/42a1b53c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0

errors: No known data errors

  pool: tank2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank2          ONLINE       0     0 4.22K
          raidz2-0     ONLINE       0     0 16.9K
            mfisyspd0  ONLINE       0     0     0
            mfisyspd1  ONLINE       0     0     0
            mfisyspd2  ONLINE       0     0     0
            mfisyspd3  ONLINE       0     0     0
            mfisyspd4  ONLINE       0     0     0
            mfisyspd5  ONLINE       0     0     0
            mfisyspd6  ONLINE       0     0     0
        logs
          mfisyspd7p3  ONLINE       0     0     0
          mfisyspd8p3  ONLINE       0     0     0
        cache
          mfisyspd9    ONLINE       0     0     0
          mfisyspd10   ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        tank2:0x3
        tank2:0x8
        tank2:0x9
        tank2:0xa
        tank2:0xb
        tank2:0xf
        tank2:0x10
        tank2:0x11
        tank2:0x12
        tank2:0x13
        tank2:0x14
        tank2:0x15

So I tried a scrub, which looks like it's going to take 5 days to complete and is reporting many, many more errors:-

  pool: tank2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Oct 31 16:13:53 2012
        118G scanned out of 18.7T at 42.2M/s, 128h19m to go
        49.0M repaired, 0.62% done
config:

        NAME           STATE     READ WRITE CKSUM
        tank2          ONLINE       0     0  596K
          raidz2-0     ONLINE       0     0 1.20M
            mfisyspd0  ONLINE       0     0     0  (repairing)
            mfisyspd1  ONLINE       0     0     0  (repairing)
            mfisyspd2  ONLINE       0     0     0  (repairing)
            mfisyspd3  ONLINE       0     0     2  (repairing)
            mfisyspd4  ONLINE
Panic during kernel boot, igb-init related? (8.3-RELEASE)
Hello,

We're seeing boot-time panics in about 4% of cases when upgrading from FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that it escaped detection during our regular testing cycle... now, with over 100 systems upgraded, we're convinced there's a real issue.

Our kernel config is essentially PAE (i.e. static modules ... with a few drivers added/removed). The hardware is Intel Server System SR1625UR.

This appears to match a finding discussed in these threads, having to do with the timing of initialization of the igb(4)-based NICs (if I'm understanding it properly):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html

These threads include some potential patches and the possibility of a commit/MFC... but it isn't clear that there was ever a final resolution (and MFC to 8-stable). I've cc'd a few folks from back then.

A real challenge here is the frequency of occurrence. As mentioned, it only hits a fraction of our systems. When it _does_ hit, the system may enter a reboot loop for days and then mysteriously break out of it... and thereafter seem to work fine.

I'd be very grateful for any help. Some questions:

* Was there ever a final blessed patch?
  o If so, will it apply to RELENG_8_3?
* Is there anything that could be said that might help us with reproducing-the-problem / testing / validating-a-fix?

Panic message is --

panic: m_getzone: m_getjcl: invalid cluster type
cpuid = 0
KDB: stack backtrace:
#0 0xc059c717 at kdb_backtrace+0x47
#1 0xc056caf7 at panic+0x117
#2 0xc03c979e at igb_refresh_mbufs+0x25e
#3 0xc03c9f98 at igb_rxeof+0x638
#4 0xc03ca135 at igb_msix_que+0x105
#5 0xc0541e2b at intr_event_execute_handlers+0x13b
#6 0xc05434eb at ithread_loop+0x6b
#7 0xc053efb7 at fork_exit+0x97
#8 0xc0806744 at fork_trampoline+0x8

Thanks very much,
Charles

--
Charles Owens
Great Bay Software, Inc.
Re: 9-Stable panic: resource_list_unreserve: can't find resource
on 31/10/2012 12:14 Tom Lislegaard said the following:
> Hi
>
> I'm running
>
> FreeBSD stingray 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Mon Oct 29 16:11:35 CET 2012 tl@stingray:/usr/obj/usr/src/sys/stingray amd64
>
> on a new Dell laptop and keep getting these panics (typically once or twice per day)
>
> (kgdb) set pagination off
> (kgdb) bt
> #0 doadump (textdump=Variable textdump is not available.) at pcpu.h:229
> #1 0x80425e64 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
> #2 0x8042634c in panic (fmt=0x1 Address 0x1 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:636
> #3 0x8045773e in resource_list_unreserve (rl=Variable rl is not available.) at /usr/src/sys/kern/subr_bus.c:3338
> #4 0x802c3ee4 in acpi_delete_resource (bus=0xfe00052c1100, child=0xfe00052c1500, type=4, rid=3323) at /usr/src/sys/dev/acpica/acpi.c:1405
> #5 0x802c62bc in acpi_bus_alloc_gas (dev=0xfe00052c1500, type=0xfe00052b786c, rid=0xfe00052b7978, gas=Variable gas is not available.) at /usr/src/sys/dev/acpica/acpi.c:1450
> #6 0x802d1663 in acpi_PkgGas (dev=0xfe00052c1500, res=Variable res is not available.) at /usr/src/sys/dev/acpica/acpi_package.c:120
> #7 0x802cbf6b in acpi_cpu_cx_cst (sc=0xfe00052b7800) at /usr/src/sys/dev/acpica/acpi_cpu.c:782
> #8 0x802cc3a4 in acpi_cpu_notify (h=Variable h is not available.) at /usr/src/sys/dev/acpica/acpi_cpu.c:1050
> #9 0x802a3fca in AcpiEvNotifyDispatch (Context=0x0) at /usr/src/sys/contrib/dev/acpica/events/evmisc.c:283
> #10 0x802c26c3 in acpi_task_execute (context=0xfe00051d6800, pending=Variable pending is not available.) at /usr/src/sys/dev/acpica/Osd/OsdSchedule.c:134
> #11 0x804683c4 in taskqueue_run_locked (queue=0xfe00052bc100) at /usr/src/sys/kern/subr_taskqueue.c:308
> #12 0x80469366 in taskqueue_thread_loop (arg=Variable arg is not available.) at /usr/src/sys/kern/subr_taskqueue.c:497
> #13 0x803f762f in fork_exit (callout=0x80469320 taskqueue_thread_loop, arg=0x80a20cc8, frame=0xff80002cdb00) at /usr/src/sys/kern/kern_fork.c:992
> #14 0x806be6be in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602

Could you please provide *sc from frame 7?

--
Andriy Gapon
Re: ZFS corruption due to lack of space?
On Wed, Oct 31, 2012 at 10:55 AM, Steven Hartland kill...@multiplay.co.uk wrote:
> At that point with the test seemingly successful I went to delete test files which resulted in:-
>
> rm random*
> rm: random1: Unknown error: 122

ZFS is a logging filesystem. Even removing a file apparently requires some space to write a new record saying that the file is not referenced any more.

One way out of this jam is to try truncating some large file in place. Make sure that file is not part of any snapshot. Something like this may do the trick:

#dd if=/dev/null of=existing_large_file

Or, perhaps even something as simple as 'echo -n > large_file' may work.

Good luck,
--Artem
Re: ZFS corruption due to lack of space?
On 2012-Oct-31 17:25:09 -0000, Steven Hartland ste...@multiplay.co.uk wrote:
> Been running some tests on new hardware here to verify all is good. One of the tests was to fill the zfs array which seems like it's totally corrupted the tank.

I've accidentally filled a pool, and had multiple processes try to write to the full pool, without either emptying the free space reserve (so I could still delete the offending files) or corrupting the pool.

Had you tried to read/write the raw disks before you tried the ZFS testing? Do you have compression and/or dedupe enabled on the pool?

> 1. Given the information it seems like the multiple writes filling the disk may have caused metadata corruption?

I don't recall seeing this reported before.

> 2. Is there any way to stop the scrub?

Other than freeing up some space, I don't think so. If this is a test pool that you don't need, you could try destroying it and re-creating it - that may be quicker and easier than recovering the existing pool.

> 3. Surely low space should never prevent stopping a scrub?

As Artem noted, ZFS is a copy-on-write filesystem. It is supposed to reserve some free space to allow metadata updates (stopping scrubs, deleting files, etc.) even when it is full, but I have seen reports of this not working correctly in the past. A truncate-in-place may work.

You could also try asking on zfs-disc...@opensolaris.org

--
Peter Jeremy
Re: ZFS corruption due to lack of space?
On Wed, Oct 31, 2012 at 4:48 PM, Artem Belevich a...@freebsd.org wrote:
> One way out of this jam is to try truncating some large file in place. Make sure that file is not part of any snapshot. Something like this may do the trick:
>
> #dd if=/dev/null of=existing_large_file
>
> Or, perhaps even something as simple as 'echo -n > large_file' may work.

truncate -s 0?
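The truncate-in-place variants suggested in this thread can be compared safely on a scratch file before touching a full pool. The following is a minimal sketch assuming a POSIX shell; the file comes from mktemp and merely stands in for "existing_large_file". Note that truncate(1) ships with FreeBSD and GNU coreutils but is not part of POSIX, so the sketch falls back to a redirection when it is missing.

```shell
#!/bin/sh
# Scratch file standing in for "existing_large_file" (hypothetical path).
f=$(mktemp)

# Size of a file in bytes, with BSD wc's leading blanks stripped.
size_of() { wc -c < "$1" | tr -d ' '; }

# Refill the scratch file with 64 KiB of zeroes.
fill() { dd if=/dev/zero of="$f" bs=1024 count=64 2>/dev/null; }

fill
size_before=$(size_of "$f")

# 1. dd opens its output with truncation by default and copies nothing
#    from /dev/null, so the file ends up empty.
dd if=/dev/null of="$f" 2>/dev/null
size_dd=$(size_of "$f")

# 2. A bare redirection truncates without running any external command.
fill
: > "$f"
size_redirect=$(size_of "$f")

# 3. truncate(1), where available.
fill
if command -v truncate >/dev/null 2>&1; then
    truncate -s 0 "$f"
else
    : > "$f"
fi
size_truncate=$(size_of "$f")

echo "before=$size_before dd=$size_dd redirect=$size_redirect truncate=$size_truncate"
rm -f "$f"
```

All three leave the file in place with a length of zero, which is the point: unlike rm, no directory entry has to be removed. As Artem warns, none of them frees any space if the file's blocks are still referenced by a snapshot.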
Re: ZFS corruption due to lack of space?
On 2012-Oct-31 17:25:09 -0000, Steven Hartland ste...@multiplay.co.uk wrote:
>> Been running some tests on new hardware here to verify all is good. One of the tests was to fill the zfs array which seems like it's totally corrupted the tank.
>
> I've accidentally filled a pool, and had multiple processes try to write to the full pool, without either emptying the free space reserve (so I could still delete the offending files) or corrupting the pool.

Same here, but it's the first time I've had a ZIL in place at the time, so I'm wondering if that may be playing a factor.

> Had you tried to read/write the raw disks before you tried the ZFS testing?

Yes, didn't see any issues, but then it wasn't checksumming, so tbh I wouldn't have noticed if it was silently corrupting data.

> Do you have compression and/or dedupe enabled on the pool?

Nope, bog-standard raidz2, no additional settings.

>> 1. Given the information it seems like the multiple writes filling the disk may have caused metadata corruption?
>
> I don't recall seeing this reported before.

Nor me, and we've been using ZFS for years, but we've never filled a pool with such known simultaneous access + ZIL before.

>> 2. Is there any way to stop the scrub?
>
> Other than freeing up some space, I don't think so. If this is a test pool that you don't need, you could try destroying it and re-creating it - that may be quicker and easier than recovering the existing pool.

Artem's trick of cat /dev/null > /tank2/bigfile worked and I've now managed to stop the scrub :)

>> 3. Surely low space should never prevent stopping a scrub?
>
> As Artem noted, ZFS is a copy-on-write filesystem. It is supposed to reserve some free space to allow metadata updates (stop scrubs, delete files, etc) even when it is full but I have seen reports of this not working correctly in the past. A truncate-in-place may work.

Yes it did, thanks. But as you said, if this metadata update was failing due to lack of space, that lends credibility to the idea that the same lack of space, and hence failure to update metadata, could also have caused the corruption in the first place.

It's interesting to note that the zpool is reporting plenty of free space even when the root zfs volume was showing 0, so you would expect there to be plenty of space for it to be able to stop the scrub, but it appears not, which is definitely interesting and could point to the underlying cause?

zpool list tank2
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank2    19T  18.7T   304G    98%  1.00x  ONLINE  -

zfs list tank2
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank2  13.3T      0  13.3T  /tank2

Current state is:-
  scan: scrub in progress since Wed Oct 31 16:13:53 2012
        1.64T scanned out of 18.7T at 62.8M/s, 79h12m to go
        280M repaired, 8.76% done

Something else that was interesting: while the scrub was running, devd was using a good amount of CPU, 40% of a 3.3GHz core, which I've never seen before. Any ideas why its usage would be so high?

Regards
Steve
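One possible reading of the zpool list / zfs list mismatch above, offered as a back-of-the-envelope sketch rather than a diagnosis: for raidz vdevs, zpool list counts raw space including parity, while zfs list counts space usable for data. The figures below are copied from the output in this thread; the 1/64 pool reserve is an assumption about ZFS of this vintage that nobody in the thread states, so treat that line with particular suspicion.

```shell
#!/bin/sh
# Figures copied from the zpool list output quoted in this thread.
raw_size_t=19      # SIZE, in T
raw_alloc_t=18.7   # ALLOC, in T
raw_free_g=304     # FREE, in G
disks=7            # "7 x 3TB disks in RAIDZ2"
parity=2           # raidz2 keeps two parity blocks per stripe

# Scale a raw figure by the data fraction (disks - parity) / disks.
usable() {
    LC_ALL=C awk -v x="$1" -v d="$disks" -v p="$parity" \
        'BEGIN { printf "%.1f", x * (d - p) / d }'
}

usable_size_t=$(usable "$raw_size_t")     # roughly zfs list USED + AVAIL
usable_alloc_t=$(usable "$raw_alloc_t")   # compare with zfs list USED (13.3T)
usable_free_g=$(usable "$raw_free_g")     # what the raw FREE is worth as data

# ASSUMPTION (not from the thread): the pool withholds 1/64 of its size
# from datasets as an internal reserve.
reserve_raw_g=$(LC_ALL=C awk -v s="$raw_size_t" \
    'BEGIN { printf "%.1f", s * 1024 / 64 }')

echo "usable size  ~${usable_size_t}T"
echo "usable alloc ~${usable_alloc_t}T"
echo "usable free  ~${usable_free_g}G"
echo "1/64 reserve ~${reserve_raw_g}G raw"
```

Under those assumptions the 18.7T raw allocation is worth about 13.4T of data, close to the 13.3T that zfs list shows as USED, and the 304G still shown as FREE is the same size as a 1/64 reserve on a 19T pool. If that reading is right, datasets seeing 0 available while the pool reports free space is expected behaviour; it would not by itself explain the checksum errors.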