ok, I added some debug statements, and the depth counter just keeps increasing forever when I hit this chain. Any ideas where to fix this? Under what conditions should the depth counter increase?
On Wed, Jul 4, 2012 at 9:01 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
> this is using latest source tarball from oss.oracle.com
>
> On Wed, Jul 4, 2012 at 9:00 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>> I found the infinite loop. chain gets down to 69 (lol) and does this forever:
>>
>> repair_group_desc:363 | checking desc at 2225664; blkno 2225664 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 10063872; blkno 10063872 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 88445952; blkno 88445952 size 4032 bits 32256 free_bits 394 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 80607744; blkno 80607744 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 72769536; blkno 72769536 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 64931328; blkno 64931328 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 57093120; blkno 57093120 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 49254912; blkno 49254912 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 41416704; blkno 41416704 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 33578496; blkno 33578496 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 25740288; blkno 25740288 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 17902080; blkno 17902080 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>>
>> [the same 12-descriptor sequence, starting again at blkno 2225664, repeats indefinitely]
>>
>> On Wed, Jul 4, 2012 at 4:49 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>>> looks like I got hit by this:
>>>
>>> https://oss.oracle.com/pipermail/ocfs2-users/2011-April/005106.html
>>>
>>> guess I'll cancel that fsck and upgrade after all :P
>>>
>>> On Wed, Jul 4, 2012 at 4:38 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>>>> I'll try that kernel upgrade while I've got the cluster down. Has
>>>> anyone given any thought to multi-threading fsck.ocfs2?
>>>> From my top stats, it's clearly CPU-bound (also going on 5 hours,
>>>> still haven't seen the end of the first pass).
>>>>
>>>> On Tue, Jul 3, 2012 at 11:50 PM, Guozhonghua <guozhong...@h3c.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I have used ocfs2 with Linux kernel 2.6.39, and some of the problems
>>>>> may be the same as yours.
>>>>>
>>>>> I downloaded Linux kernel 3.2.x, compared its source code with 2.6.39,
>>>>> and found that a great deal of code had changed. After updating the
>>>>> kernel, the problems disappeared.
>>>>>
>>>>> I recommend you update to a recent kernel. With a recent kernel the
>>>>> ocfs2 module has been very stable here: it has run for several weeks
>>>>> without a reboot or panic.
>>>>>
>>>>> One more note: set the I/O scheduler to deadline, which suits ocfs2
>>>>> well:
>>>>>
>>>>> elevator=deadline
>>>>>
>>>>> Please refer to ocfs2_faq.txt for details:
>>>>>
>>>>> Q07 I encounter "Kernel panic - not syncing: ocfs2 is very sorry to
>>>>>     be fencing this system by panicing" whenever I run a heavy io
>>>>>     load?
>>>>> A07 We have encountered a bug with the default "cfq" io scheduler
>>>>>     which causes a process doing heavy io to temporarily starve out
>>>>>     other processes. While this is not fatal for most environments,
>>>>>     it is for OCFS2 as we expect the hb thread to be r/w to the hb
>>>>>     area at least once every 12 secs (default).
>>>>>     Bug with the fix has been filed with Red Hat and Novell. For
>>>>>     more, refer to the tracker bug filed on bugzilla:
>>>>>     http://oss.oracle.com/bugzilla/show_bug.cgi?id=671
>>>>>     Till this issue is resolved, one is advised to use the
>>>>>     "deadline" io scheduler. To use deadline, add "elevator=deadline"
>>>>>     to the kernel command line as follows:
>>>>>     1. For SLES9, edit the command line in /boot/grub/menu.lst:
>>>>>        title Linux 2.6.5-7.244-bigsmp
>>>>>            kernel (hd0,4)/boot/vmlinuz-2.6.5-7.244-bigsmp root=/dev/sda5
>>>>>                vga=0x314 selinux=0 splash=silent resume=/dev/sda3
>>>>>                elevator=deadline showopts console=tty0
>>>>>                console=ttyS0,115200 noexec=off
>>>>>            initrd (hd0,4)/boot/initrd-2.6.5-7.244-bigsmp
>>>>>     2. For RHEL4, edit the command line in /boot/grub/grub.conf:
>>>>>        title Red Hat Enterprise Linux AS (2.6.9-22.EL)
>>>>>            root (hd0,0)
>>>>>            kernel /vmlinuz-2.6.9-22.EL ro root=LABEL=/
>>>>>                console=ttyS0,115200 console=tty0 elevator=deadline
>>>>>                noexec=off
>>>>>            initrd /initrd-2.6.9-22.EL.img
>>>>>     To see the current kernel command line, do:
>>>>>     # cat /proc/cmdline

-- 
Aleks Clark

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users