Re: [zfs-discuss] x4500 panic report.
Today we had another panic; at least it was during work time :) Just a shame the 999GB UFS takes 80+ minutes to fsck. (Yes, it is mounted 'logging'.)

panic[cpu3]/thread=ff001e70dc80: free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1

ff001e70d500 genunix:vcmn_err+28 ()
ff001e70d550 ufs:real_panic_v+f7 ()
ff001e70d5b0 ufs:ufs_fault_v+1d0 ()
ff001e70d6a0 ufs:ufs_fault+a0 ()
ff001e70d770 ufs:free+38f ()
ff001e70d830 ufs:indirtrunc+260 ()
ff001e70dab0 ufs:ufs_itrunc+738 ()
ff001e70db60 ufs:ufs_trans_itrunc+128 ()
ff001e70dbf0 ufs:ufs_delete+3b0 ()
ff001e70dc60 ufs:ufs_thread_delete+da ()
ff001e70dc70 unix:thread_start+8 ()

syncing file systems...

panic[cpu3]/thread=ff001e70dc80: panic sync timeout
dumping to /dev/dsk/c6t0d0s1, offset 65536, content: kernel

$c
vpanic()
vcmn_err+0x28(3, f783a128, ff001e70d678)
real_panic_v+0xf7(0, f783a128, ff001e70d678)
ufs_fault_v+0x1d0(ff04facf65c0, f783a128, ff001e70d678)
ufs_fault+0xa0()
free+0x38f(ff001e70d8d0, a6a7358, 2000, 89)
indirtrunc+0x260(ff001e70d8d0, a6a42b8, , 0, 89)
ufs_itrunc+0x738(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_trans_itrunc+0x128(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_delete+0x3b0(fffed20e2a00, ff0550b9fde0, 1)
ufs_thread_delete+0xda(64704840)
thread_start+8()

::panicinfo
             cpu     3
          thread     ff001e70dc80
         message     free: freeing free block, dev:0xb60024, block:13144, ino:1737885, fs:/export/saba1
             rdi     f783a128
             rsi     ff001e70d678
             rdx     f783a128
             rcx     ff001e70d678
              r8     f783a128
              r9     0
             rax     3
             rbx     0
             rbp     ff001e70d4d0
             r10     fffec3d40580
             r11     ff001e70dc80
             r12     f783a128
             r13     ff001e70d678
             r14     3
             r15     f783a128
          fsbase     0
          gsbase     fffec3d40580
              ds     4b
              es     4b
              fs     0
              gs     1c3
          trapno     0
             err     0
             rip     fb83c860
              cs     30
          rflags     246
             rsp     ff001e70d488
              ss     38
          gdt_hi     0
          gdt_lo     81ef
          idt_hi     0
          idt_lo     7fff
             ldt     0
            task     70
             cr0     8005003b
             cr2     fed0e010
             cr3     2c0
             cr4     6f8

Jorgen Lundman wrote:
> On Saturday the X4500 system panicked, and rebooted. For some reason the
> /export/saba1 UFS partition was corrupt, and needed fsck. This is why it
> did not come back online. /export/saba1 is mounted logging,noatime, so
> fsck should never (-ish) be needed.
>
> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
>
> /export/saba1 on /dev/zvol/dsk/zpool1/saba1 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024 on Sat Jul 5 08:48:54 2008
>
> One possible related bug:
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138
>
> What would be the best solution? Go back to the latest Solaris 10 and
> pass it on to Sun support, or find a patch for this problem?
>
> Panic dump follows:
>
> -rw-r--r-- 1 root root 2529300 Jul 5 08:48 unix.2
> -rw-r--r-- 1 root root 10133225472 Jul 5 09:10 vmcore.2
>
> # mdb unix.2 vmcore.2
> Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii sppp rdc nfs ]
>
> $c
> vpanic()
> vcmn_err+0x28(3, f783ade0, ff001e737aa8)
> real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
> ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
> ufs_fault+0xa0()
> dqput+0xce(1db26ef0)
> dqrele+0x48(1db26ef0)
> ufs_trans_dqrele+0x6f(1db26ef0)
> ufs_idle_free+0x16d(ff04f17b1e00)
> ufs_idle_some+0x152(3f60)
> ufs_thread_idle+0x1a1()
> thread_start+8()
>
> ::cpuinfo
> ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
> 0  fbc2fc10     1b  0    0    60  no   no    t-0    ff001e737c80 sched
> 1  fffec3a0a000 1f  1    0    -1  no   no    t-0    ff001e971c80 (idle)
> 2  fffec3a02ac0 1f  0    0    -1  no   no    t-1    ff001e9dbc80 (idle)
> 3  fffec3d60580 1f  0    0    -1  no
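For context, the manual recovery path on a zvol-backed UFS like the one quoted above looks roughly like the following. This is a sketch only; the device paths come from the mount output in this thread, and the exact flags may vary by release. With 'logging' the intent log is normally replayed at mount time, so a full fsck should only be forced when the log itself is suspect.

```shell
# Sketch: recovering a UFS filesystem that lives on a ZFS zvol,
# using the device names from the mount output in this thread.
umount /export/saba1                          # if it is still mounted
fsck -F ufs /dev/zvol/rdsk/zpool1/saba1       # raw device; ~80 min on a 999GB fs here
mount -F ufs -o logging,noatime /dev/zvol/dsk/zpool1/saba1 /export/saba1
```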
[zfs-discuss] x4500 panic report.
On Saturday the X4500 system panicked, and rebooted. For some reason the /export/saba1 UFS partition was corrupt, and needed fsck. This is why it did not come back online. /export/saba1 is mounted logging,noatime, so fsck should never (-ish) be needed.

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

/export/saba1 on /dev/zvol/dsk/zpool1/saba1 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024 on Sat Jul 5 08:48:54 2008

One possible related bug:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138

What would be the best solution? Go back to the latest Solaris 10 and pass it on to Sun support, or find a patch for this problem?

Panic dump follows:

-rw-r--r-- 1 root root 2529300 Jul 5 08:48 unix.2
-rw-r--r-- 1 root root 10133225472 Jul 5 09:10 vmcore.2

# mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii sppp rdc nfs ]

$c
vpanic()
vcmn_err+0x28(3, f783ade0, ff001e737aa8)
real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
ufs_fault+0xa0()
dqput+0xce(1db26ef0)
dqrele+0x48(1db26ef0)
ufs_trans_dqrele+0x6f(1db26ef0)
ufs_idle_free+0x16d(ff04f17b1e00)
ufs_idle_some+0x152(3f60)
ufs_thread_idle+0x1a1()
thread_start+8()

::cpuinfo
ID ADDR         FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD       PROC
0  fbc2fc10     1b  0    0    60  no   no    t-0    ff001e737c80 sched
1  fffec3a0a000 1f  1    0    -1  no   no    t-0    ff001e971c80 (idle)
2  fffec3a02ac0 1f  0    0    -1  no   no    t-1    ff001e9dbc80 (idle)
3  fffec3d60580 1f  0    0    -1  no   no    t-1    ff001ea50c80 (idle)

::panicinfo
             cpu     0
          thread     ff001e737c80
         message     dqput: dqp->dq_cnt == 0
             rdi     f783ade0
             rsi     ff001e737aa8
             rdx     f783ade0
             rcx     ff001e737aa8
              r8     f783ade0
              r9     0
             rax     3
             rbx     0
             rbp     ff001e737900
             r10     fbc26fb0
             r11     ff001e737c80
             r12     f783ade0
             r13     ff001e737aa8
             r14     3
             r15     f783ade0
          fsbase     0
          gsbase     fbc26fb0
              ds     4b
              es     4b
              fs     0
              gs     1c3
          trapno     0
             err     0
             rip     fb83c860
              cs     30
          rflags     246
             rsp     ff001e7378b8
              ss     38
          gdt_hi     0
          gdt_lo     e1ef
          idt_hi     0
          idt_lo     77c00fff
             ldt     0
            task     70
             cr0     8005003b
             cr2     fee7d650
             cr3     2c0
             cr4     6f8

::msgbuf
quota_ufs: over hard disk limit (pid 600, uid 178199, inum 941499, fs /export/zero1)
quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)

panic[cpu0]/thread=ff001e737c80: dqput: dqp->dq_cnt == 0

ff001e737930 genunix:vcmn_err+28 ()
ff001e737980 ufs:real_panic_v+f7 ()
ff001e7379e0 ufs:ufs_fault_v+1d0 ()
ff001e737ad0 ufs:ufs_fault+a0 ()
ff001e737b00 ufs:dqput+ce ()
ff001e737b30 ufs:dqrele+48 ()
ff001e737b70 ufs:ufs_trans_dqrele+6f ()
ff001e737bc0 ufs:ufs_idle_free+16d ()
ff001e737c10 ufs:ufs_idle_some+152 ()
ff001e737c60 ufs:ufs_thread_idle+1a1 ()
ff001e737c70 unix:thread_start+8 ()

syncing file systems...

--
Jorgen Lundman     | [EMAIL PROTECTED]
Unix Administrator | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo  | +81 (0)90-5578-8500 (cell)
Japan              | +81 (0)3-3375-1767 (home)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
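The dump analysis above follows the standard post-mortem workflow, and the same dcmds can be replayed non-interactively against any saved crash dump. A sketch, with an assumed dump directory (savecore writes numbered unix.N/vmcore.N pairs, typically under /var/crash/<hostname>):

```shell
# Sketch: replaying the dcmds used in this thread against a saved dump pair.
# The path is an assumption; adjust for your host's crash directory.
cd /var/crash/x4500-01
mdb unix.2 vmcore.2 <<'EOF'
$c
::panicinfo
::msgbuf
::cpuinfo
EOF
```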
Re: [zfs-discuss] x4500 panic report.
Jorgen Lundman wrote:
> On Saturday the X4500 system panicked, and rebooted. For some reason the
> /export/saba1 UFS partition was corrupt, and needed fsck. This is why it
> did not come back online. /export/saba1 is mounted logging,noatime, so
> fsck should never (-ish) be needed.
>
> SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
>
> /export/saba1 on /dev/zvol/dsk/zpool1/saba1 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024 on Sat Jul 5 08:48:54 2008
>
> One possible related bug:
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138

Yes, that bug is possibly related. However, the panic stacks listed in it do not match yours.

> What would be the best solution? Go back to the latest Solaris 10 and
> pass it on to Sun support, or find a patch for this problem?

Since the panic stack only ever goes through ufs, you should log a call with Sun support.

...

> ::msgbuf
> quota_ufs: over hard disk limit (pid 600, uid 178199, inum 941499, fs /export/zero1)
> quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)
>
> panic[cpu0]/thread=ff001e737c80: dqput: dqp->dq_cnt == 0
>
> ff001e737930 genunix:vcmn_err+28 ()
> ff001e737980 ufs:real_panic_v+f7 ()
> ff001e7379e0 ufs:ufs_fault_v+1d0 ()
> ff001e737ad0 ufs:ufs_fault+a0 ()
> ff001e737b00 ufs:dqput+ce ()
> ff001e737b30 ufs:dqrele+48 ()
> ff001e737b70 ufs:ufs_trans_dqrele+6f ()
> ff001e737bc0 ufs:ufs_idle_free+16d ()
> ff001e737c10 ufs:ufs_idle_some+152 ()
> ff001e737c60 ufs:ufs_thread_idle+1a1 ()
> ff001e737c70 unix:thread_start+8 ()

Although given the entries in the msgbuf, perhaps you might want to fix up your quota settings on that particular filesystem.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] x4500 panic report.
> Since the panic stack only ever goes through ufs, you should log a call
> with Sun support.

We do have support, but they only speak Japanese, and I'm still quite poor at it. But I have started the process of having it translated and passed along to the next person. It is always fun to see what it becomes at the other end. Meanwhile, I like to research and see whether it is an already-known problem, rather than just sit around and wait.

> quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)
>
> Although given the entry in the msgbuf, perhaps you might want to fix up
> your quota settings on that particular filesystem.

Customers pay for a certain amount of disk quota and, being users, always stay close to the edge. Those messages are as constant as precipitation in the rainy season. Are you suggesting that they indicate a problem, beyond the user being out of space?

Lund
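Whether those uids are merely sitting at their limits or something worse can be checked with the stock UFS quota tools. A sketch, using a uid from the msgbuf above (exact option handling varies by release):

```shell
# Sketch: inspecting UFS quotas on the filesystem named in the msgbuf.
repquota -v /export/zero1      # usage and limits for every uid on the fs
quota -v 33647                 # one uid from the quota_ufs messages
edquota 33647                  # adjust that uid's limits in $EDITOR
```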
Re: [zfs-discuss] x4500 panic report.
Jorgen Lundman wrote:
>> Since the panic stack only ever goes through ufs, you should log a call
>> with Sun support.
>
> We do have support, but they only speak Japanese, and I'm still quite
> poor at it. But I have started the process of having it translated and
> passed along to the next person. It is always fun to see what it becomes
> at the other end. Meanwhile, I like to research and see whether it is an
> already-known problem, rather than just sit around and wait.

That sounds like a learning opportunity :-)

>> quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs /export/zero1)
>>
>> Although given the entry in the msgbuf, perhaps you might want to fix up
>> your quota settings on that particular filesystem.
>
> Customers pay for a certain amount of disk quota and, being users, always
> stay close to the edge. Those messages are as constant as precipitation
> in the rainy season. Are you suggesting that they indicate a problem,
> beyond the user being out of space?

I don't know, I'm not a UFS expert (heck, I'm not an expert on _anything_). Have you investigated putting your paying customers onto zfs and managing quotas with zfs properties instead of ufs?

James C. McPherson
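The zfs-properties approach suggested above would mean one dataset per customer, each carrying its own quota. A sketch with made-up pool and dataset names; note that ZFS at this point has no per-user quotas, so the dataset itself is the quota unit:

```shell
# Sketch: per-customer quotas via ZFS properties instead of UFS quotas.
# Pool and dataset names are hypothetical.
zfs create zpool1/customers
zfs create zpool1/customers/cust0001
zfs set quota=10G zpool1/customers/cust0001        # hard cap on the dataset
zfs set reservation=1G zpool1/customers/cust0001   # guaranteed minimum space
zfs get quota,reservation,used zpool1/customers/cust0001
```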
Re: [zfs-discuss] x4500 panic report.
> I don't know, I'm not a UFS expert (heck, I'm not an expert on
> _anything_). Have you investigated putting your paying customers onto
> zfs and managing quotas with zfs properties instead of ufs?

Yep, we spent about six weeks during the trial period of the X4500 trying to find a way for ZFS to replace the current NetApps. The history of this mailing list should have it, and thanks to everyone who helped. But it was just not possible. Perhaps now it can be done, using mirror mounts, but the 50-odd servers hanging off the X4500 don't all support them, so it would still not be feasible. Unless there has been some advancement in ZFS in the last six months I am not aware of... like user quotas?

Thanks for your assistance.

Lund