ok, I added some debug statements, and the depth counter just keeps increasing forever when I hit this chain. Any ideas where to fix this? Under what conditions should the depth counter increase?
On Wed, Jul 4, 2012 at 9:01 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
> this is using latest source tarball from oss.oracle.com
>
> On Wed, Jul 4, 2012 at 9:00 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>> I found the infinite loop. chain gets down to 69 (lol) and does this forever:
>>
>> repair_group_desc:363 | checking desc at 2225664; blkno 2225664 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 10063872; blkno 10063872 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 88445952; blkno 88445952 size 4032 bits 32256 free_bits 394 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 80607744; blkno 80607744 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 72769536; blkno 72769536 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 64931328; blkno 64931328 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 57093120; blkno 57093120 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 49254912; blkno 49254912 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 41416704; blkno 41416704 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 33578496; blkno 33578496 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 25740288; blkno 25740288 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 17902080; blkno 17902080 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>>
>> [the same 12-descriptor sequence, starting again at blkno 2225664, repeats indefinitely]
>>
>> On Wed, Jul 4, 2012 at 4:49 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>>> looks like I got hit by this:
>>>
>>> https://oss.oracle.com/pipermail/ocfs2-users/2011-April/005106.html
>>>
>>> guess I'll cancel that fsck and upgrade after all :P
>>>
>>> On Wed, Jul 4, 2012 at 4:38 AM, Aleks Clark <aleks.cl...@gmail.com> wrote:
>>>> I'll try that kernel upgrade while I've got the cluster down. Has
>>>> anyone given any thought to multi-threading fsck.ocfs2?
>>>> From my top stats, it's clearly CPU-bound (also going on 5 hours,
>>>> still haven't seen the end of the first pass).
>>>>
>>>> On Tue, Jul 3, 2012 at 11:50 PM, Guozhonghua <guozhong...@h3c.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I have used ocfs2 with Linux kernel 2.6.39, and some of the problems
>>>>> may be the same as yours.
>>>>>
>>>>> I downloaded Linux kernel 3.2.x, compared its source code with 2.6.39,
>>>>> and found that a great deal of code had changed. After updating the
>>>>> kernel, the problems disappeared.
>>>>>
>>>>> I recommend you update to a recent kernel. With a recent kernel the
>>>>> ocfs2 module has been very stable here: it has run for several weeks
>>>>> without a reboot or panic.
>>>>>
>>>>> One more note: set the I/O scheduler to deadline, which suits ocfs2
>>>>> well:
>>>>>
>>>>> elevator=deadline
>>>>>
>>>>> Please refer to ocfs2_faq.txt for details:
>>>>>
>>>>> Q07 I encounter "Kernel panic - not syncing: ocfs2 is very sorry to
>>>>>     be fencing this system by panicing" whenever I run a heavy io
>>>>>     load?
>>>>> A07 We have encountered a bug with the default "cfq" io scheduler
>>>>>     which causes a process doing heavy io to temporarily starve out
>>>>>     other processes. While this is not fatal for most environments,
>>>>>     it is for OCFS2 as we expect the hb thread to be r/w to the hb
>>>>>     area at least once every 12 secs (default).
>>>>>     Bug with the fix has been filed with Red Hat and Novell. For
>>>>>     more, refer to the tracker bug filed on bugzilla:
>>>>>     http://oss.oracle.com/bugzilla/show_bug.cgi?id=671
>>>>>     Till this issue is resolved, one is advised to use the
>>>>>     "deadline" io scheduler. To use deadline, add "elevator=deadline"
>>>>>     to the kernel command line as follows:
>>>>>     1. For SLES9, edit the command line in /boot/grub/menu.lst:
>>>>>        title Linux 2.6.5-7.244-bigsmp
>>>>>            kernel (hd0,4)/boot/vmlinuz-2.6.5-7.244-bigsmp root=/dev/sda5
>>>>>                vga=0x314 selinux=0 splash=silent resume=/dev/sda3
>>>>>                elevator=deadline showopts console=tty0
>>>>>                console=ttyS0,115200 noexec=off
>>>>>            initrd (hd0,4)/boot/initrd-2.6.5-7.244-bigsmp
>>>>>     2. For RHEL4, edit the command line in /boot/grub/grub.conf:
>>>>>        title Red Hat Enterprise Linux AS (2.6.9-22.EL)
>>>>>            root (hd0,0)
>>>>>            kernel /vmlinuz-2.6.9-22.EL ro root=LABEL=/
>>>>>                console=ttyS0,115200 console=tty0 elevator=deadline
>>>>>                noexec=off
>>>>>            initrd /initrd-2.6.9-22.EL.img
>>>>>     To see the current kernel command line, do:
>>>>>     # cat /proc/cmdline

-- 
Aleks Clark

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users