Re: [Ocfs2-devel] kernel bug

2018-01-11 Thread Changwei Ge
Hi Cédric,

Sorry, I can't answer your question, but we are trying to.
Please be patient.

On 2018/1/11 19:52, BASSAGET Cédric wrote:
> Hi Changwei,
>
> Short question: will the stable release of kernel 4.15 fix this bug?
> Regards
>
> 2018-01-11 8:06 GMT+01:00 Changwei Ge:
>
> Hi Cédric,
>
> In the mainline kernel :)
>
> On 2018/1/11 14:50, BASSAGET Cédric wrote:
> > Hi Changwei.
> > My question may be stupid, but... can you tell me where I can find the source of the latest ocfs2?
> > Every Google search points to the Oracle website, which refers to the 1.6 versions.
> >
> > 2018-01-11 2:03 GMT+01:00 Changwei Ge:
> >
> > Hi Cédric,
> >
> > These two patches have already been picked up by Andrew and are being merged into the -mm tree for now.
> > You can find them at the links below:
> >
> > http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-make-metadata-estimation-accurate-and-clear.patch
> >
> > http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-try-to-reuse-extent-block-in-dealloc-without-meta_alloc.patch
> >
> > Thanks,
> > Changwei
> >
> > On 2018/1/10 22:34, BASSAGET Cédric wrote:
> > > Hi Changwei,
> > > Can you give me a ref or a link pointing to your patch ?
> > > Thanks
> > >
> > > 2018-01-10 12:57 GMT+01:00 Changwei Ge:
> > > Hi BASSAGET,
> > >
> > > We ocfs2 developers are working on a DIO (direct I/O) crash issue that may share the same root cause as yours.
> > >
> > > You can backport my two-patch set into your kernel to see whether it makes the issue go away:
> > >
> > >   ocfs2: make metadata estimation accurate and clear
> > >   ocfs2: try to reuse extent block in dealloc without meta_alloc
> > >
> > > On 2018/1/10 19:48, BASSAGET 

Re: [Ocfs2-devel] kernel bug

2018-01-10 Thread Changwei Ge
Hi Cédric,

In the mainline kernel :)

On 2018/1/11 14:50, BASSAGET Cédric wrote:
> Hi Changwei.
> My question may be stupid, but... can you tell me where I can find the source of the latest ocfs2?
> Every Google search points to the Oracle website, which refers to the 1.6 versions.
> 
> 2018-01-11 2:03 GMT+01:00 Changwei Ge:
> 
> Hi Cédric,
> 
> These two patches have already been picked up by Andrew and are being merged into the -mm tree for now.
> You can find them at the links below:
> 
> http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-make-metadata-estimation-accurate-and-clear.patch
>
> http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-try-to-reuse-extent-block-in-dealloc-without-meta_alloc.patch
> 
> Thanks,
> Changwei
> 
> On 2018/1/10 22:34, BASSAGET Cédric wrote:
> > Hi Changwei,
> > Can you give me a ref or a link pointing to your patch ?
> > Thanks
> >
> > 2018-01-10 12:57 GMT+01:00 Changwei Ge:
> >
> >     Hi BASSAGET,
> >
> >     We ocfs2 developers are working on a DIO (direct I/O) crash issue that may share the same root cause as yours.
> >
> >     You can backport my two-patch set into your kernel to see whether it makes the issue go away:
> >
> >        ocfs2: make metadata estimation accurate and clear
> >        ocfs2: try to reuse extent block in dealloc without meta_alloc
> >
> >     On 2018/1/10 19:48, BASSAGET Cédric wrote:
> >     > Hello
> >     > Today I reported a bug related to ocfs2 1.8 on the Proxmox forum; maybe somebody here will be able to help me...
> >     > The bug on the Proxmox forum:
> >     > https://forum.proxmox.com/threads/ocfs2-kernel-bug.39163/

Re: [Ocfs2-devel] kernel bug

2018-01-10 Thread BASSAGET Cédric
Hi Changwei.
My question may be stupid, but... can you tell me where I can find the source of the latest ocfs2?
Every Google search points to the Oracle website, which refers to the 1.6 versions.

2018-01-11 2:03 GMT+01:00 Changwei Ge:

> Hi Cédric,
>
> These two patches have already been picked up by Andrew and are being merged into the -mm tree for now.
> You can find them at the links below:
>
> http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-make-metadata-estimation-accurate-and-clear.patch
> http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-try-to-reuse-extent-block-in-dealloc-without-meta_alloc.patch
>
> Thanks,
> Changwei
>
> On 2018/1/10 22:34, BASSAGET Cédric wrote:
> > Hi Changwei,
> > Can you give me a ref or a link pointing to your patch ?
> > Thanks
> >
> > 2018-01-10 12:57 GMT+01:00 Changwei Ge:
> >
> > Hi BASSAGET,
> >
> > We ocfs2 developers are working on a DIO (direct I/O) crash issue that may share the same root cause as yours.
> >
> > You can backport my two-patch set into your kernel to see whether it makes the issue go away:
> >
> >   ocfs2: make metadata estimation accurate and clear
> >   ocfs2: try to reuse extent block in dealloc without meta_alloc
> >
> > On 2018/1/10 19:48, BASSAGET Cédric wrote:
> > > Hello
> > > Today I reported a bug related to ocfs2 1.8 on the Proxmox forum; maybe somebody here will be able to help me...
> > > The bug on the Proxmox forum:
> > > https://forum.proxmox.com/threads/ocfs2-kernel-bug.39163/
> > >
> > > To summarize: kernel BUG at fs/ocfs2/suballoc.c:2017!
> > >
> > > Is this related to the Proxmox kernel, to ocfs2, or to both?
> > > Does it have something to do with https://oss.oracle.com/pipermail/ocfs2-devel/2017-January/012701.html ?
> > >
> > > Thanks for your help.
> > > Regards,
> > > Cédric
> >
> >
>
>
___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] kernel bug

2018-01-10 Thread BASSAGET Cédric
Hi Changwei,
Can you give me a ref or a link pointing to your patch ?
Thanks

2018-01-10 12:57 GMT+01:00 Changwei Ge:

> Hi BASSAGET,
>
> We ocfs2 developers are working on a DIO (direct I/O) crash issue that may share the same root cause as yours.
>
> You can backport my two-patch set into your kernel to see whether it makes the issue go away:
>
>   ocfs2: make metadata estimation accurate and clear
>   ocfs2: try to reuse extent block in dealloc without meta_alloc
>
> On 2018/1/10 19:48, BASSAGET Cédric wrote:
> > Hello
> > Today I reported a bug related to ocfs2 1.8 on the Proxmox forum; maybe somebody here will be able to help me...
> > The bug on the Proxmox forum:
> > https://forum.proxmox.com/threads/ocfs2-kernel-bug.39163/
> >
> > To summarize: kernel BUG at fs/ocfs2/suballoc.c:2017!
> >
> > Is this related to the Proxmox kernel, to ocfs2, or to both?
> > Does it have something to do with https://oss.oracle.com/pipermail/ocfs2-devel/2017-January/012701.html ?
> >
> > Thanks for your help.
> > Regards,
> > Cédric
>
>

Re: [Ocfs2-devel] kernel bug

2018-01-10 Thread piaojun
Hi Cédric,

It would help if you posted the stack trace from the crash and the steps to reproduce this BUG.

thanks,
Jun

On 2018/1/10 19:48, BASSAGET Cédric wrote:
> Hello
> Today I reported a bug related to ocfs2 1.8 on the Proxmox forum; maybe somebody here will be able to help me...
> The bug on the Proxmox forum:
> https://forum.proxmox.com/threads/ocfs2-kernel-bug.39163/
>
> To summarize: kernel BUG at fs/ocfs2/suballoc.c:2017!
>
> Is this related to the Proxmox kernel, to ocfs2, or to both?
> Does it have something to do with
> https://oss.oracle.com/pipermail/ocfs2-devel/2017-January/012701.html ?
> 
> Thanks for your help.
> Regards,
> Cédric


Re: [Ocfs2-devel] kernel bug

2018-01-10 Thread Changwei Ge
Hi BASSAGET,

We ocfs2 developers are working on a DIO (direct I/O) crash issue that may share the same root cause as yours.

You can backport my two-patch set into your kernel to see whether it makes the issue go away:

  ocfs2: make metadata estimation accurate and clear
  ocfs2: try to reuse extent block in dealloc without meta_alloc

On 2018/1/10 19:48, BASSAGET Cédric wrote:
> Hello
> Today I reported a bug related to ocfs2 1.8 on the Proxmox forum; maybe somebody here will be able to help me...
> The bug on the Proxmox forum:
> https://forum.proxmox.com/threads/ocfs2-kernel-bug.39163/
>
> To summarize: kernel BUG at fs/ocfs2/suballoc.c:2017!
>
> Is this related to the Proxmox kernel, to ocfs2, or to both?
> Does it have something to do with
> https://oss.oracle.com/pipermail/ocfs2-devel/2017-January/012701.html ?
> 
> Thanks for your help.
> Regards,
> Cédric




Re: [Ocfs2-devel] kernel BUG at ocfs2/alloc.c:1514

2017-06-21 Thread Russell Mosemann

Additional stack traces added.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841144
 

--
Russell Mosemann

Re: [Ocfs2-devel] kernel BUG at ocfs2/alloc.c:1514

2017-05-18 Thread Russell Mosemann
There is no need for testing. The bug is documented. Someone made the decision 
to crash the code rather than handle the condition. See the git commit below.

BUG_ON(meta_ac == NULL);

Some information is in a January posting to this list from Ben Hutchings.
https://oss.oracle.com/pipermail/ocfs2-devel/2017-January/012701.html

++ 
meta_ac is passed down from ocfs2_dio_end_io_write(), which allocates
it using ocfs2_lock_allocators()... but the latter only allocates it 
conditionally.  It seems like the condition is wrong somehow.
++

Additional insight was provided by Ian Campbell in March. It was sent to this
list, but I do not see it in the archive.

++
This still seems to be happening for this user with 4.9.13, looking at
"git log -p v4.9.13..origin/master -- fs/ocfs2" I wonder if
https://git.kernel.org/torvalds/c/3e10b793fc40dfdbe51762e0d084bd6f2c8acaaa
might be relevant?

The commit message mentions meta_ac not getting allocated and an extent
split vs refcount split differentiation and we have ocfs2_split_extent
in the trace. Slim reasoning I know, maybe someone who knows the code
could make a better determination.
++

--
Russell Mosemann
IT Manager | Future Foam, Inc.
1610 Avenue N | Council Bluffs, IA 51501-1071
O: (712) 323-9122 Ext 228
F: (712) 323-0158


-----Original Message-----
From: "Gang He" <g...@suse.com>
Sent: Thursday, May 18, 2017 2:11am
To: rmosem...@futurefoam.com, ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] kernel BUG at ocfs2/alloc.c:1514



Hello Russell,

I just went through the bug description at the link below.
Do you hit this issue every time in your test environment? It looks like a simple direct write.
Second, could you describe the complete reproduction steps in detail, or provide a test script that triggers this bug?
In our normal environment, I don't think we could hit this bug easily, since kernel v4.9 is very new.


Thanks
Gang



> Copious stack traces available at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841144 
> 
> 
> --
> Russell Mosemann
> IT Manager | Future Foam, Inc.
> 1610 Avenue N | Council Bluffs, IA 51501-1071
> O: (712) 323-9122 Ext 228
> F: (712) 323-0158





Re: [Ocfs2-devel] kernel BUG at ocfs2/alloc.c:1514

2017-05-18 Thread Gang He
Hello Russell,

I just went through the bug description at the link below.
Do you hit this issue every time in your test environment? It looks like a simple direct write.
Second, could you describe the complete reproduction steps in detail, or provide a test script that triggers this bug?
In our normal environment, I don't think we could hit this bug easily, since kernel v4.9 is very new.


Thanks
Gang



> Copious stack traces available at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=841144 
>  
> 
> --
> Russell Mosemann
> IT Manager | Future Foam, Inc.
> 1610 Avenue N | Council Bluffs, IA 51501-1071
> O: (712) 323-9122 Ext 228
> F: (712) 323-0158




Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Joseph Qi
On 2016/4/22 11:17, Gang He wrote:
> 
> 
> 

>> Hi Gang,
>>
>> On 2016/4/21 17:50, Gang He wrote:
>>>
>>>
>>>
>>
 Hi Gang,
 Maybe. If so, it seems to be related to locking.
 Can we tell which data is newer?
>>> Hi Joseph, I don't understand your question.
>>> For this bug, the customer only has a panic message; no messages log or kernel dump was provided.
>>>
>> The filesystem not being corrupted doesn't mean the on-disk inode is right (i.e. the latest).
>> The in-memory inode may be right, but the flush may simply not have completed successfully.
>> As of now, we can't tell which is the case.
> True. I just brought this bug to everyone's attention; I need more time to analyze the code and see whether we can reproduce this issue manually.
Fine. We will also set up a test environment to trigger the race and try to reproduce it,
and we will report back to the mailing list as soon as we have more info.

Thanks,
Joseph
> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Joseph
>>
>>>
>>> Thanks
>>> Gang
>>>

 Thanks,
 Joseph

 On 2016/4/21 15:59, Gang He wrote:
> Hello Guys, 
>
> Sorry for the delayed reply. The customer has provided the fsck log.
> OCFS2 volumes to check are:
> /dev/dm-11   83886080 44540260 39345820 54% /sapmnt
> /dev/dm-14   62914560 15374960 47539600 25% /usr/sap
> /dev/dm-13   89128960 19976272 69152688 23% /usr/sap/trans
> uii316:~ # fsck.ocfs2 /dev/dm-13
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-13:
>  Label:
>  UUID: E35AF95C8114494391E0FFDEFD07309B
>  Number of blocks: 22282240
>  Block size: 4096
>  Number of clusters: 22282240
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-14
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-14: 
>  Label:
>  UUID: 43F2D9C8D70B47388527075F3C6C38BB
>  Number of blocks: 15728640
>  Block size: 4096
>  Number of clusters: 15728640
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-11
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-11:
>  Label:
>  UUID: 0F1CBE6017934B5E8AD80149ED332659
>  Number of blocks: 20971520
>  Block size: 4096
>  Number of clusters: 20971520
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>
> From the log, it looks like the file systems were not corrupted, so this should be a race-condition issue?!
>
> Thanks
> Gang 
>
>

>> On 03/31/2016 10:56 AM, Gang He wrote:
>>> Hello Joseph and Junxiao,
>>>
>>> Have you encountered this issue before? I suspect this may be a race condition bug (rather than a data inconsistency).
>> Never saw this. Did fsck report any corruption?
>>
>> Thanks,
>> Junxiao.
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>
 Hello Guys,

 I got a bug report of a kernel BUG in function ocfs2_truncate_file.
 Based on my initial analysis, this bug looks like a race condition problem.
 Unfortunately, no kernel crash dump was captured; we only got the kernel log below:

 kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
 Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
 Oct 21 13:02:19 uii316 [ 1766.831234]
 Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1
 Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
 Oct 21 13:02:19 uii316
 Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
 Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
 Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 00010296
 Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 880f39c5e240 RCX: 39fd
 Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 0007 RDI: 0246
 Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 81da0ac0 R09:
 Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11:  R12: 880f3949bc78
 Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 880f3d481000 R15: 000e43bc
 Oct 21 13:02:19 uii316 [ 1766.831347] FS:  

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Gang He



> Hi Gang,
> 
> On 2016/4/21 17:50, Gang He wrote:
>> 
>> 
>> 
>
>>> Hi Gang,
>>> Maybe. If so, it seems to be related to locking.
>>> Can we tell which data is newer?
>> Hi Joseph, I don't understand your question.
>> For this bug, the customer only has a panic message; no messages log or kernel dump was provided.
>> 
> The filesystem not being corrupted doesn't mean the on-disk inode is right (i.e. the latest).
> The in-memory inode may be right, but the flush may simply not have completed successfully.
> As of now, we can't tell which is the case.
True. I just brought this bug to everyone's attention; I need more time to analyze the code and see whether we can reproduce this issue manually.

Thanks
Gang

> 
> Thanks,
> Joseph
> 
>> 
>> Thanks
>> Gang
>> 
>>>
>>> Thanks,
>>> Joseph
>>>
>>> On 2016/4/21 15:59, Gang He wrote:
 Hello Guys, 

 Sorry for the delayed reply. The customer has provided the fsck log.
 OCFS2 volumes to check are:
 /dev/dm-11   83886080 44540260 39345820 54% /sapmnt
 /dev/dm-14   62914560 15374960 47539600 25% /usr/sap
 /dev/dm-13   89128960 19976272 69152688 23% /usr/sap/trans
 uii316:~ # fsck.ocfs2 /dev/dm-13
 fsck.ocfs2 1.8.2
 Checking OCFS2 filesystem in /dev/dm-13:
  Label:
  UUID: E35AF95C8114494391E0FFDEFD07309B
  Number of blocks: 22282240
  Block size: 4096
  Number of clusters: 22282240
  Cluster size: 4096
  Number of slots: 2
 /dev/dm-13 is clean. It will be checked after 20 additional mounts.
 uii316:~ # fsck.ocfs2 /dev/dm-14
 fsck.ocfs2 1.8.2
 Checking OCFS2 filesystem in /dev/dm-14: 
  Label:
  UUID: 43F2D9C8D70B47388527075F3C6C38BB
  Number of blocks: 15728640
  Block size: 4096
  Number of clusters: 15728640
  Cluster size: 4096
  Number of slots: 2
 /dev/dm-14 is clean. It will be checked after 20 additional mounts.
 uii316:~ # fsck.ocfs2 /dev/dm-11
 fsck.ocfs2 1.8.2
 Checking OCFS2 filesystem in /dev/dm-11:
  Label:
  UUID: 0F1CBE6017934B5E8AD80149ED332659
  Number of blocks: 20971520
  Block size: 4096
  Number of clusters: 20971520
  Cluster size: 4096
  Number of slots: 2
 /dev/dm-11 is clean. It will be checked after 20 additional mounts.

 From the log, it looks like the file systems were not corrupted, so this should be a race-condition issue?!

 Thanks
 Gang 


>>>
> On 03/31/2016 10:56 AM, Gang He wrote:
>> Hello Joseph and Junxiao,
>>
>> Have you encountered this issue before? I suspect this may be a race condition bug (rather than a data inconsistency).
> Never saw this. Did fsck report any corruption?
>
> Thanks,
> Junxiao.
>>
>> Thanks
>> Gang
>>
>>
>
>>> Hello Guys,
>>>
>>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>>> Based on my initial analysis, this bug looks like a race condition problem.
>>> Unfortunately, no kernel crash dump was captured; we only got the kernel log below:
>>>
>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1
>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 00010296
>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 880f39c5e240 RCX: 39fd
>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 0007 RDI: 0246
>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 81da0ac0 R09:
>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11:  R12: 880f3949bc78
>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 880f3d481000 R15: 000e43bc
>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() GS:880fefd4() knlGS:
>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 80050033
>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 000f39d35000 CR4: 07e0
>>> Oct 21 13:02:19 uii316 [ 1766.831363] 

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Joseph Qi
Hi Gang,

On 2016/4/21 17:50, Gang He wrote:
> 
> 
> 

>> Hi Gang,
>> Maybe. If so, it seems to be related to locking.
>> Can we tell which data is newer?
> Hi Joseph, I don't understand your question.
> For this bug, the customer only has a panic message; no messages log or kernel dump was provided.
> 
The filesystem not being corrupted doesn't mean the on-disk inode is right (i.e. the latest).
The in-memory inode may be right, but the flush may simply not have completed successfully.
As of now, we can't tell which is the case.

Thanks,
Joseph

> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Joseph
>>
>> On 2016/4/21 15:59, Gang He wrote:
>>> Hello Guys, 
>>>
>>> Sorry for the delayed reply. The customer has provided the fsck log.
>>> OCFS2 volumes to check are:
>>> /dev/dm-11   83886080 44540260 39345820 54% /sapmnt
>>> /dev/dm-14   62914560 15374960 47539600 25% /usr/sap
>>> /dev/dm-13   89128960 19976272 69152688 23% /usr/sap/trans
>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>  Label:
>>>  UUID: E35AF95C8114494391E0FFDEFD07309B
>>>  Number of blocks: 22282240
>>>  Block size: 4096
>>>  Number of clusters: 22282240
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-14: 
>>>  Label:
>>>  UUID: 43F2D9C8D70B47388527075F3C6C38BB
>>>  Number of blocks: 15728640
>>>  Block size: 4096
>>>  Number of clusters: 15728640
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>> fsck.ocfs2 1.8.2
>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>  Label:
>>>  UUID: 0F1CBE6017934B5E8AD80149ED332659
>>>  Number of blocks: 20971520
>>>  Block size: 4096
>>>  Number of clusters: 20971520
>>>  Cluster size: 4096
>>>  Number of slots: 2
>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>
>>> From the log, it looks like the file systems were not corrupted, so this should be a race-condition issue?!
>>>
>>> Thanks
>>> Gang 
>>>
>>>
>>
 On 03/31/2016 10:56 AM, Gang He wrote:
> Hello Joseph and Junxiao,
>
> Have you encountered this issue before? I suspect this may be a race condition bug (rather than a data inconsistency).
 Never saw this. Did fsck report any corruption?

 Thanks,
 Junxiao.
>
> Thanks
> Gang
>
>

>> Hello Guys,
>>
>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>> Based on my initial analysis, this bug looks like a race condition problem.
>> Unfortunately, no kernel crash dump was captured; we only got the kernel log below:
>>
>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>> Oct 21 13:02:19 uii316 [ 1766.831234]
>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1
>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 00010296
>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 880f39c5e240 RCX: 39fd
>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 0007 RDI: 0246
>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 81da0ac0 R09:
>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11:  R12: 880f3949bc78
>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 880f3d481000 R15: 000e43bc
>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() GS:880fefd4() knlGS:
>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 80050033
>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 000f39d35000 CR4: 07e0
>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1:  DR2:
>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 0ff0 DR7: 0400
>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, threadinfo 880f39d78000, task 880f39c5e240)
>> Oct 21 13:02:19 

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Gang He



> Hi Gang,
> Maybe. If so, it seems to be related to locking.
> Can we tell which data is newer?
Hi Joseph, I don't understand your question.
For this bug, the customer only has a panic message; no messages log or kernel dump was provided.


Thanks
Gang

> 
> Thanks,
> Joseph
> 
> On 2016/4/21 15:59, Gang He wrote:
>> Hello Guys, 
>> 
>> Sorry for the delayed reply. The customer has provided the fsck log.
>> OCFS2 volumes to check are:
>> /dev/dm-11   83886080 44540260 39345820 54% /sapmnt
>> /dev/dm-14   62914560 15374960 47539600 25% /usr/sap
>> /dev/dm-13   89128960 19976272 69152688 23% /usr/sap/trans
>> uii316:~ # fsck.ocfs2 /dev/dm-13
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-13:
>>  Label:
>>  UUID: E35AF95C8114494391E0FFDEFD07309B
>>  Number of blocks: 22282240
>>  Block size: 4096
>>  Number of clusters: 22282240
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>> uii316:~ # fsck.ocfs2 /dev/dm-14
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-14: 
>>  Label:
>>  UUID: 43F2D9C8D70B47388527075F3C6C38BB
>>  Number of blocks: 15728640
>>  Block size: 4096
>>  Number of clusters: 15728640
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>> uii316:~ # fsck.ocfs2 /dev/dm-11
>> fsck.ocfs2 1.8.2
>> Checking OCFS2 filesystem in /dev/dm-11:
>>  Label:
>>  UUID: 0F1CBE6017934B5E8AD80149ED332659
>>  Number of blocks: 20971520
>>  Block size: 4096
>>  Number of clusters: 20971520
>>  Cluster size: 4096
>>  Number of slots: 2
>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>> 
>> From the log, it looks like the file systems were not corrupted, so this should be a race-condition issue?!
>> 
>> Thanks
>> Gang 
>> 
>> 
>
>>> On 03/31/2016 10:56 AM, Gang He wrote:
 Hello Joseph and Junxiao,

 Have you encountered this issue before? I suspect this may be a race condition bug (rather than a data inconsistency).
>>> Never saw this. Did fsck report any corruption?
>>>
>>> Thanks,
>>> Junxiao.

 Thanks
 Gang


>>>
> Hello Guys,
>
> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
> Based on my initial analysis, this bug looks like a race condition problem.
> Unfortunately, no kernel crash dump was captured; we only got the kernel log below:
>
> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
> Oct 21 13:02:19 uii316 [ 1766.831234]
> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1
> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 00010296
> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 880f39c5e240 RCX: 39fd
> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 0007 RDI: 0246
> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 81da0ac0 R09:
> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11:  R12: 880f3949bc78
> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 880f3d481000 R15: 000e43bc
> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() GS:880fefd4() knlGS:
> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 80050033
> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 000f39d35000 CR4: 07e0
> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1:  DR2:
> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 0ff0 DR7: 0400
> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, threadinfo 880f39d78000, task 880f39c5e240)
> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
> Oct 21 13:02:19 uii316 000e43bc
> Oct 21 13:02:19 uii316 000eab40
> Oct 21 13:02:19 uii316 880f0001
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
> Oct 21 13:02:19 uii316 880f8e0d1000
> Oct 21 13:02:19 uii316 880f3949b800
> Oct 21 13:02:19 uii316 

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Joseph Qi
Hi Gang,
Maybe. If so, it seems to be related to locking.
Can we tell which data is newer?

Thanks,
Joseph

On 2016/4/21 15:59, Gang He wrote:
> Hello Guys, 
> 
> Sorry for the delayed reply; the fsck log was reported by the customer.
> OCFS2 volumes to check are:
> /dev/dm-11   83886080 44540260 39345820 54% /sapmnt
> /dev/dm-14   62914560 15374960 47539600 25% /usr/sap
> /dev/dm-13   89128960 19976272 69152688 23% 
> /usr/sap/trans
> uii316:~ # fsck.ocfs2 /dev/dm-13
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-13:
>  Label:
>  UUID: E35AF95C8114494391E0FFDEFD07309B
>  Number of blocks: 22282240
>  Block size: 4096
>  Number of clusters: 22282240
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-14
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-14: 
>  Label:
>  UUID: 43F2D9C8D70B47388527075F3C6C38BB
>  Number of blocks: 15728640
>  Block size: 4096
>  Number of clusters: 15728640
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
> uii316:~ # fsck.ocfs2 /dev/dm-11
> fsck.ocfs2 1.8.2
> Checking OCFS2 filesystem in /dev/dm-11:
>  Label:
>  UUID: 0F1CBE6017934B5E8AD80149ED332659
>  Number of blocks: 20971520
>  Block size: 4096
>  Number of clusters: 20971520
>  Cluster size: 4096
>  Number of slots: 2
> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
> 
> From the log, it looks like the file systems were not corrupted, so this should be a
> race-condition issue?!
> 
> Thanks
> Gang 
> 
> 

>> On 03/31/2016 10:56 AM, Gang He wrote:
>>> Hello Joseph and Junxiao,
>>>
>>> Did you encounter this issue in the past? I suspect this is possibly a race 
>> condition bug (rather than data inconsistency).
>> Never saw this. Did fsck report any corruption?
>>
>> Thanks,
>> Junxiao.
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>
 Hello Guys,

 I got a bug report of a kernel BUG in function ocfs2_truncate_file.
 Based on my initial analysis, this bug looks like a race condition problem.
 Unfortunately, no kernel crash dump was captured; I only got some kernel
 log, as below:

 kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
 Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
 Oct 21 13:02:19 uii316 [ 1766.831234]
 Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not 
 tainted 
 3.0.101-0.47.67-default #1
 Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
 Oct 21 13:02:19 uii316
 Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
 Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
 [ocfs2]
 Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
 00010296
 Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
 880f39c5e240 RCX: 39fd
 Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
 0007 RDI: 0246
 Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
 81da0ac0 R09: 
 Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
  R12: 880f3949bc78
 Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
 880f3d481000 R15: 000e43bc
 Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
 GS:880fefd4() knlGS:
 Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
 80050033
 Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
 000f39d35000 CR4: 07e0
 Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
  DR2: 
 Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
 0ff0 DR7: 0400
 Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
 threadinfo 880f39d78000, task 880f39c5e240)
 Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
 Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
 Oct 21 13:02:19 uii316 000e43bc
 Oct 21 13:02:19 uii316 000eab40
 Oct 21 13:02:19 uii316 880f0001
 Oct 21 13:02:19 uii316
 Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
 Oct 21 13:02:19 uii316 880f8e0d1000
 Oct 21 13:02:19 uii316 880f3949b800
 Oct 21 13:02:19 uii316 
 Oct 21 13:02:19 uii316
 Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
 Oct 21 13:02:19 uii316 0001
 Oct 21 13:02:19 uii316 0002433c
 Oct 21 13:02:19 uii316 0008
 Oct 21 13:02:19 uii316
 Oct 21 13:02:19 uii316 [ 

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-04-21 Thread Gang He
Hello Guys, 

Sorry for the delayed reply; the fsck log was reported by the customer.
OCFS2 volumes to check are:
/dev/dm-11   83886080 44540260 39345820 54% /sapmnt
/dev/dm-14   62914560 15374960 47539600 25% /usr/sap
/dev/dm-13   89128960 19976272 69152688 23% 
/usr/sap/trans
uii316:~ # fsck.ocfs2 /dev/dm-13
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-13:
 Label:
 UUID: E35AF95C8114494391E0FFDEFD07309B
 Number of blocks: 22282240
 Block size: 4096
 Number of clusters: 22282240
 Cluster size: 4096
 Number of slots: 2
/dev/dm-13 is clean. It will be checked after 20 additional mounts.
uii316:~ # fsck.ocfs2 /dev/dm-14
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-14: 
 Label:
 UUID: 43F2D9C8D70B47388527075F3C6C38BB
 Number of blocks: 15728640
 Block size: 4096
 Number of clusters: 15728640
 Cluster size: 4096
 Number of slots: 2
/dev/dm-14 is clean. It will be checked after 20 additional mounts.
uii316:~ # fsck.ocfs2 /dev/dm-11
fsck.ocfs2 1.8.2
Checking OCFS2 filesystem in /dev/dm-11:
 Label:
 UUID: 0F1CBE6017934B5E8AD80149ED332659
 Number of blocks: 20971520
 Block size: 4096
 Number of clusters: 20971520
 Cluster size: 4096
 Number of slots: 2
/dev/dm-11 is clean. It will be checked after 20 additional mounts.

From the log, it looks like the file systems were not corrupted, so this should be a
race-condition issue?!

Thanks
Gang 


>>> 
> On 03/31/2016 10:56 AM, Gang He wrote:
>> Hello Joseph and Junxiao,
>> 
>> Did you encounter this issue in the past? I suspect this is possibly a race 
> condition bug (rather than data inconsistency).
> Never saw this. Did fsck report any corruption?
> 
> Thanks,
> Junxiao.
>> 
>> Thanks
>> Gang
>> 
>> 
>
>>> Hello Guys,
>>>
>>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>>> Based on my initial analysis, this bug looks like a race condition problem.
>>> Unfortunately, no kernel crash dump was captured; I only got some kernel
>>> log, as below:
>>>
>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>> 3.0.101-0.47.67-default #1
>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
>>> [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
>>> 00010296
>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
>>> 880f39c5e240 RCX: 39fd
>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
>>> 0007 RDI: 0246
>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
>>> 81da0ac0 R09: 
>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
>>>  R12: 880f3949bc78
>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
>>> 880f3d481000 R15: 000e43bc
>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
>>> GS:880fefd4() knlGS:
>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
>>> 80050033
>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
>>> 000f39d35000 CR4: 07e0
>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
>>>  DR2: 
>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
>>> 0ff0 DR7: 0400
>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>> threadinfo 880f39d78000, task 880f39c5e240)
>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
>>> Oct 21 13:02:19 uii316 000e43bc
>>> Oct 21 13:02:19 uii316 000eab40
>>> Oct 21 13:02:19 uii316 880f0001
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
>>> Oct 21 13:02:19 uii316 880f8e0d1000
>>> Oct 21 13:02:19 uii316 880f3949b800
>>> Oct 21 13:02:19 uii316 
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
>>> Oct 21 13:02:19 uii316 0001
>>> Oct 21 13:02:19 uii316 0002433c
>>> Oct 21 13:02:19 uii316 0008
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [] 
>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [] 
>>> notify_change+0x19f/0x2f0
>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [] 
>>> do_truncate+0x57/0x80
>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [] 
>>> do_last+0x603/0x800

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-03-31 Thread Gang He
Hello Joseph and Junxiao,

Unfortunately, there is no kernel dump, message, or fsck report; it has only
happened twice in the customer environment.
The customer kept the last log, shown below; the customer is running SLES11SP3
for x86_64 (3.0.x-xx).

Thanks
Gang 


>>> 
> Hi Gang,
> I haven't found any related information about this BUG.
> Which kernel version are you using? It seems the inode size mismatches between
> disk and memory; is there any further log about this?
> 
> Thanks,
> Joseph
> 
> On 2016/3/31 10:56, Gang He wrote:
>> Hello Joseph and Junxiao,
>> 
>> Did you encounter this issue in the past? I suspect this is possibly a race 
> condition bug (rather than data inconsistency).
>> 
>> Thanks
>> Gang
>> 
>> 
>
>>> Hello Guys,
>>>
>>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>>> Based on my initial analysis, this bug looks like a race condition problem.
>>> Unfortunately, no kernel crash dump was captured; I only got some kernel
>>> log, as below:
>>>
>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>> Oct 21 13:02:19 uii316 [ 1766.831234]
>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>>> 3.0.101-0.47.67-default #1
>>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
>>> [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
>>> 00010296
>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
>>> 880f39c5e240 RCX: 39fd
>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
>>> 0007 RDI: 0246
>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
>>> 81da0ac0 R09: 
>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
>>>  R12: 880f3949bc78
>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
>>> 880f3d481000 R15: 000e43bc
>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
>>> GS:880fefd4() knlGS:
>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
>>> 80050033
>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
>>> 000f39d35000 CR4: 07e0
>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
>>>  DR2: 
>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
>>> 0ff0 DR7: 0400
>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>>> threadinfo 880f39d78000, task 880f39c5e240)
>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
>>> Oct 21 13:02:19 uii316 000e43bc
>>> Oct 21 13:02:19 uii316 000eab40
>>> Oct 21 13:02:19 uii316 880f0001
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
>>> Oct 21 13:02:19 uii316 880f8e0d1000
>>> Oct 21 13:02:19 uii316 880f3949b800
>>> Oct 21 13:02:19 uii316 
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
>>> Oct 21 13:02:19 uii316 0001
>>> Oct 21 13:02:19 uii316 0002433c
>>> Oct 21 13:02:19 uii316 0008
>>> Oct 21 13:02:19 uii316
>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [] 
>>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [] 
>>> notify_change+0x19f/0x2f0
>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [] 
>>> do_truncate+0x57/0x80
>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [] 
>>> do_last+0x603/0x800
>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [] 
>>> path_openat+0xd9/0x420
>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [] 
>>> do_filp_open+0x4c/0xc0
>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [] 
>>> do_sys_open+0x17f/0x250
>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [] 
>>> system_call_fastpath+0x16/0x1b
>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<7f11ccb07080>] 0x7f11ccb0707f
>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>
>>>  The source code in question is as below,
>>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>>  445                                struct buffer_head *di_bh,
>>>  446                                u64 new_i_size)
>>>  447 {
>>>  448         int status = 0;
>>>  449         struct ocfs2_dinode *fe = NULL;
>>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>  451
>>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>>>  453          * already validated it */
>>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>  455
>>>  456         trace_ocfs2_truncate_file((unsigned long
>>> 

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-03-30 Thread Joseph Qi
Hi Gang,
I haven't found any related information about this BUG.
Which kernel version are you using? It seems the inode size mismatches between
disk and memory; is there any further log about this?

Thanks,
Joseph

On 2016/3/31 10:56, Gang He wrote:
> Hello Joseph and Junxiao,
> 
> Did you encounter this issue in the past? I suspect this is possibly a race 
> condition bug (rather than data inconsistency).
> 
> Thanks
> Gang
> 
> 

>> Hello Guys,
>>
>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>> Based on my initial analysis, this bug looks like a race condition problem.
>> Unfortunately, no kernel crash dump was captured; I only got some kernel
>> log, as below:
>>
>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>> Oct 21 13:02:19 uii316 [ 1766.831234]
>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>> 3.0.101-0.47.67-default #1
>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
>> [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
>> 00010296
>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
>> 880f39c5e240 RCX: 39fd
>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
>> 0007 RDI: 0246
>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
>> 81da0ac0 R09: 
>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
>>  R12: 880f3949bc78
>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
>> 880f3d481000 R15: 000e43bc
>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
>> GS:880fefd4() knlGS:
>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
>> 80050033
>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
>> 000f39d35000 CR4: 07e0
>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
>>  DR2: 
>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
>> 0ff0 DR7: 0400
>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>> threadinfo 880f39d78000, task 880f39c5e240)
>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
>> Oct 21 13:02:19 uii316 000e43bc
>> Oct 21 13:02:19 uii316 000eab40
>> Oct 21 13:02:19 uii316 880f0001
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
>> Oct 21 13:02:19 uii316 880f8e0d1000
>> Oct 21 13:02:19 uii316 880f3949b800
>> Oct 21 13:02:19 uii316 
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
>> Oct 21 13:02:19 uii316 0001
>> Oct 21 13:02:19 uii316 0002433c
>> Oct 21 13:02:19 uii316 0008
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>> Oct 21 13:02:19 uii316 [ 1766.831492]  [] 
>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831522]  [] 
>> notify_change+0x19f/0x2f0
>> Oct 21 13:02:19 uii316 [ 1766.831534]  [] 
>> do_truncate+0x57/0x80
>> Oct 21 13:02:19 uii316 [ 1766.831544]  [] 
>> do_last+0x603/0x800
>> Oct 21 13:02:19 uii316 [ 1766.831551]  [] 
>> path_openat+0xd9/0x420
>> Oct 21 13:02:19 uii316 [ 1766.831558]  [] 
>> do_filp_open+0x4c/0xc0
>> Oct 21 13:02:19 uii316 [ 1766.831566]  [] 
>> do_sys_open+0x17f/0x250
>> Oct 21 13:02:19 uii316 [ 1766.831575]  [] 
>> system_call_fastpath+0x16/0x1b
>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<7f11ccb07080>] 0x7f11ccb0707f
>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>
>>  The source code in question is as below,
>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>  445                                struct buffer_head *di_bh,
>>  446                                u64 new_i_size)
>>  447 {
>>  448         int status = 0;
>>  449         struct ocfs2_dinode *fe = NULL;
>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  451
>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>>  453          * already validated it */
>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>  455
>>  456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
>>  457                                   (unsigned long long)le64_to_cpu(fe->i_size),
>>  458                                   (unsigned long long)new_i_size);
>>  459
>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),   <<= here
>>  461                         "Inode %llu, inode i_size = %lld != di "
>>  

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-03-30 Thread Junxiao Bi
On 03/31/2016 10:56 AM, Gang He wrote:
> Hello Joseph and Junxiao,
> 
> Did you encounter this issue in the past? I suspect this is possibly a race 
> condition bug (rather than data inconsistency).
Never saw this. Did fsck report any corruption?

Thanks,
Junxiao.
> 
> Thanks
> Gang
> 
> 

>> Hello Guys,
>>
>> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
>> Based on my initial analysis, this bug looks like a race condition problem.
>> Unfortunately, no kernel crash dump was captured; I only got some kernel
>> log, as below:
>>
>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>> Oct 21 13:02:19 uii316 [ 1766.831234]
>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
>> 3.0.101-0.47.67-default #1
>> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
>> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
>> [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
>> 00010296
>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
>> 880f39c5e240 RCX: 39fd
>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
>> 0007 RDI: 0246
>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
>> 81da0ac0 R09: 
>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
>>  R12: 880f3949bc78
>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
>> 880f3d481000 R15: 000e43bc
>> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
>> GS:880fefd4() knlGS:
>> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
>> 80050033
>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
>> 000f39d35000 CR4: 07e0
>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
>>  DR2: 
>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
>> 0ff0 DR7: 0400
>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
>> threadinfo 880f39d78000, task 880f39c5e240)
>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
>> Oct 21 13:02:19 uii316 000e43bc
>> Oct 21 13:02:19 uii316 000eab40
>> Oct 21 13:02:19 uii316 880f0001
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
>> Oct 21 13:02:19 uii316 880f8e0d1000
>> Oct 21 13:02:19 uii316 880f3949b800
>> Oct 21 13:02:19 uii316 
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
>> Oct 21 13:02:19 uii316 0001
>> Oct 21 13:02:19 uii316 0002433c
>> Oct 21 13:02:19 uii316 0008
>> Oct 21 13:02:19 uii316
>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>> Oct 21 13:02:19 uii316 [ 1766.831492]  [] 
>> ocfs2_setattr+0x26e/0xa90 [ocfs2]
>> Oct 21 13:02:19 uii316 [ 1766.831522]  [] 
>> notify_change+0x19f/0x2f0
>> Oct 21 13:02:19 uii316 [ 1766.831534]  [] 
>> do_truncate+0x57/0x80
>> Oct 21 13:02:19 uii316 [ 1766.831544]  [] 
>> do_last+0x603/0x800
>> Oct 21 13:02:19 uii316 [ 1766.831551]  [] 
>> path_openat+0xd9/0x420
>> Oct 21 13:02:19 uii316 [ 1766.831558]  [] 
>> do_filp_open+0x4c/0xc0
>> Oct 21 13:02:19 uii316 [ 1766.831566]  [] 
>> do_sys_open+0x17f/0x250
>> Oct 21 13:02:19 uii316 [ 1766.831575]  [] 
>> system_call_fastpath+0x16/0x1b
>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<7f11ccb07080>] 0x7f11ccb0707f
>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>
>>  The source code in question is as below,
>>  444 static int ocfs2_truncate_file(struct inode *inode,
>>  445                                struct buffer_head *di_bh,
>>  446                                u64 new_i_size)
>>  447 {
>>  448         int status = 0;
>>  449         struct ocfs2_dinode *fe = NULL;
>>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  451
>>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>>  453          * already validated it */
>>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>  455
>>  456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
>>  457                                   (unsigned long long)le64_to_cpu(fe->i_size),
>>  458                                   (unsigned long long)new_i_size);
>>  459
>>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),   <<= here
>>  461                         "Inode %llu, inode i_size = %lld != di "
>>  462                         "i_size = %llu, i_flags = 0x%x\n",
>>  463                         (unsigned long

Re: [Ocfs2-devel] kernel BUG in function ocfs2_truncate_file

2016-03-30 Thread Gang He
Hello Joseph and Junxiao,

Did you encounter this issue in the past? I suspect this is possibly a race
condition bug (rather than data inconsistency).

Thanks
Gang


>>> 
> Hello Guys,
> 
> I got a bug report of a kernel BUG in function ocfs2_truncate_file.
> Based on my initial analysis, this bug looks like a race condition problem.
> Unfortunately, no kernel crash dump was captured; I only got some kernel
> log, as below:
> 
> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
> Oct 21 13:02:19 uii316 [ 1766.831234]
> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 
> 3.0.101-0.47.67-default #1
> Oct 21 13:02:19 uii316 HP ProLiant BL460c G1
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[]
> Oct 21 13:02:19 uii316 [] ocfs2_truncate_file+0xa5/0x490 
> [ocfs2]
> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:880f39d79b68  EFLAGS: 
> 00010296
> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 008f RBX: 
> 880f39c5e240 RCX: 39fd
> Oct 21 13:02:19 uii316 [ 1766.831326] RDX:  RSI: 
> 0007 RDI: 0246
> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000 R08: 
> 81da0ac0 R09: 
> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0003 R11: 
>  R12: 880f3949bc78
> Oct 21 13:02:19 uii316 [ 1766.831342] R13: 880f39c5e888 R14: 
> 880f3d481000 R15: 000e43bc
> Oct 21 13:02:19 uii316 [ 1766.831347] FS:  7f11cda9d720() 
> GS:880fefd4() knlGS:
> Oct 21 13:02:19 uii316 [ 1766.831353] CS:  0010 DS:  ES:  CR0: 
> 80050033
> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 7f11cdad4000 CR3: 
> 000f39d35000 CR4: 07e0
> Oct 21 13:02:19 uii316 [ 1766.831363] DR0:  DR1: 
>  DR2: 
> Oct 21 13:02:19 uii316 [ 1766.831368] DR3:  DR6: 
> 0ff0 DR7: 0400
> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, 
> threadinfo 880f39d78000, task 880f39c5e240)
> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
> Oct 21 13:02:19 uii316 [ 1766.831383]  0002433c
> Oct 21 13:02:19 uii316 000e43bc
> Oct 21 13:02:19 uii316 000eab40
> Oct 21 13:02:19 uii316 880f0001
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831397]  880f394956e0
> Oct 21 13:02:19 uii316 880f8e0d1000
> Oct 21 13:02:19 uii316 880f3949b800
> Oct 21 13:02:19 uii316 
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831410]  880f39454980
> Oct 21 13:02:19 uii316 0001
> Oct 21 13:02:19 uii316 0002433c
> Oct 21 13:02:19 uii316 0008
> Oct 21 13:02:19 uii316
> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
> Oct 21 13:02:19 uii316 [ 1766.831492]  [] 
> ocfs2_setattr+0x26e/0xa90 [ocfs2]
> Oct 21 13:02:19 uii316 [ 1766.831522]  [] 
> notify_change+0x19f/0x2f0
> Oct 21 13:02:19 uii316 [ 1766.831534]  [] 
> do_truncate+0x57/0x80
> Oct 21 13:02:19 uii316 [ 1766.831544]  [] 
> do_last+0x603/0x800
> Oct 21 13:02:19 uii316 [ 1766.831551]  [] 
> path_openat+0xd9/0x420
> Oct 21 13:02:19 uii316 [ 1766.831558]  [] 
> do_filp_open+0x4c/0xc0
> Oct 21 13:02:19 uii316 [ 1766.831566]  [] 
> do_sys_open+0x17f/0x250
> Oct 21 13:02:19 uii316 [ 1766.831575]  [] 
> system_call_fastpath+0x16/0x1b
> Oct 21 13:02:19 uii316 [ 1766.831588]  [<7f11ccb07080>] 0x7f11ccb0707f
> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
> 
>  The source code in question is as below,
>  444 static int ocfs2_truncate_file(struct inode *inode,
>  445                                struct buffer_head *di_bh,
>  446                                u64 new_i_size)
>  447 {
>  448         int status = 0;
>  449         struct ocfs2_dinode *fe = NULL;
>  450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  451
>  452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>  453          * already validated it */
>  454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>  455
>  456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
>  457                                   (unsigned long long)le64_to_cpu(fe->i_size),
>  458                                   (unsigned long long)new_i_size);
>  459
>  460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),   <<= here
>  461                         "Inode %llu, inode i_size = %lld != di "
>  462                         "i_size = %llu, i_flags = 0x%x\n",
>  463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>  464                         i_size_read(inode),
>  465                         (unsigned long long)le64_to_cpu(fe->i_size),
>  466                         le32_to_cpu(fe->i_flags));
>  467
> 
> If your 

Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE

Hi,

On 30/07/2012 08:30, Joel Becker wrote:
 On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
 Hello

 I get this on the first write (made by deliver sending mail to announce the
 restart of services).
 The home partition (the one receiving the mail) is ocfs2, created on a
 drbd block device in primary/primary mode.
 These drbd devices are based on lvm.

 The system is running linux-3.5.0; identical symptoms with linux 3.3 and 3.2,
 but it works with the linux 3.0 kernel.

 Reproduced on two machines (so different hardware is involved: software md
 RAID on SATA on this one, an Areca hardware RAID card on the second one),
 but the 2 machines are the ones sharing this partition (so they share the
 same data).
   Hmm.  Any chance you can bisect this further?

I will try to. It will take a few days, as the server is in production (but
used as a backup, so...).

 Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here ]
 Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at fs/buffer.c:2886!
   This is:

   BUG_ON(!buffer_mapped(bh));

 in submit_bh().


 Jul 27 23:41:41 jupiter2 kernel: [  351.170003] Call Trace:
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81327546] ?
 ocfs2_read_blocks+0x176/0x6c0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8114e541] ?
 T.1552+0x91/0x2b0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346ad0] ?
 ocfs2_find_actor+0x120/0x120
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813464f7] ?
 ocfs2_read_inode_block_full+0x37/0x60
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964ff] ?
 ocfs2_fast_symlink_readpage+0x2f/0x160
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8585] ?
 do_read_cache_page+0x85/0x180
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964d0] ?
 ocfs2_fill_super+0x2500/0x2500
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [86d9] ?
 read_cache_page+0x9/0x20
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c705] ?
 page_getlink+0x25/0x80
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c77b] ?
 page_follow_link_light+0x1b/0x30
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8116099b] ?
 path_lookupat+0x38b/0x720
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81160d5c] ?
 do_path_lookup+0x2c/0xd0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346f31] ?
 ocfs2_inode_revalidate+0x71/0x160
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81161c0c] ?
 user_path_at_empty+0x5c/0xb0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8106714a] ?
 do_page_fault+0x1aa/0x3c0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81156f2d] ?
 cp_new_stat+0x10d/0x120
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81157021] ?
 vfs_fstatat+0x41/0x80
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115715f] ?
 sys_newstat+0x1f/0x50
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [817ecee2] ?
 system_call_fastpath+0x16/0x1b
   This stack trace is from 3.5, because of the location of the
 BUG.  The call path in the trace suggests the code added by Al's ea022d,
 but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
 from 3.2?

For a 3.2 kernel I get this stack trace. It is a different trace from 3.5, but
it happens at exactly the same moment and for the same reasons.
It seems to be less immediate than with 3.5, but that is more a subjective
impression than something based on fact (it takes a few seconds after
deliver is started to hit the bug).


[  716.402833] o2dlm: Joining domain B43153ED20B942E291251F2C138ADA9E (
0 1 ) 2 nodes
[  716.501511] ocfs2: Mounting device (147,2) on (node 1, slot 0) with
ordered data mode.
[  716.505744] mount.ocfs2 used greatest stack depth: 2936 bytes left
[  727.133743] deliver used greatest stack depth: 2632 bytes left
[  764.167029] deliver used greatest stack depth: 1896 bytes left
[  764.778872] BUG: unable to handle kernel NULL pointer dereference at
0038
[  764.778897] IP: [8133c51a]
__ocfs2_change_file_space+0x75a/0x1690
[  764.778922] PGD 62697067 PUD 67a81067 PMD 0
[  764.778939] Oops:  [#1] SMP
[  764.778953] CPU 0
[  764.778959] Modules linked in: drbd lru_cache ipv6 [last unloaded: drbd]
[  764.778986]
[  764.778993] Pid: 5909, comm: deliver Not tainted 3.2.12-gentoo #2 HP
ProLiant ML150 G3/ML150 G3
[  764.779017] RIP: 0010:[8133c51a]  [8133c51a]
__ocfs2_change_file_space+0x75a/0x1690
[  764.779041] RSP: 0018:880067b2dd98  EFLAGS: 00010246
[  764.779053] RAX:  RBX: 880067f82000 RCX:
880063d11000
[  764.779069] RDX:  RSI: 0001 RDI:
88007ae83288
[  764.779085] RBP: 880055d1f138 R08: 0010 R09:
880063d11000
[  764.779100] R10:  R11:  R12:
88007ae83288
[  764.779115] R13:  R14:  R15:
00df
[  

Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE
On 01/08/2012 22:43, Vincent ETIENNE wrote:
 Hi
 Some further progress on bisection

 I'm now here

 git bisect start
 # bad: [2d534926205db9ffce4bbbde67cb9b2cee4b835c] Merge tag
 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
 git bisect bad 2d534926205db9ffce4bbbde67cb9b2cee4b835c
 # good: [c3b92c8787367a8bb53d57d9789b558f1295cc96] Linux 3.1
 git bisect good c3b92c8787367a8bb53d57d9789b558f1295cc96
 # good: [95211279c5ad00a317c98221d7e4365e02f20836] Merge branch 'akpm'
 (Andrew's patch-bomb)
 git bisect good 95211279c5ad00a317c98221d7e4365e02f20836
 # good: [654443e20dfc0617231f28a07c96a979ee1a0239] Merge branch
 'perf-uprobes-for-linus' of
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
 git bisect good 654443e20dfc0617231f28a07c96a979ee1a0239
 # bad: [f0a08fcb5972167e55faa330c4a24fbaa3328b1f] Merge
 git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
 git bisect bad f0a08fcb5972167e55faa330c4a24fbaa3328b1f
 # bad: [f5e7e844a571124ffc117d4696787d6afc4fc5ae] Merge tag
 'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
 git bisect bad f5e7e844a571124ffc117d4696787d6afc4fc5ae
 # good: [f465d145d76803fe6332092775d891c8c509aa44] Merge tag
 'cleanup-initcall' of
 git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
 git bisect good f465d145d76803fe6332092775d891c8c509aa44
 # good: [a70f35af4e49f87ba4b6c4b30220fbb66cd74af6] Merge branch
 'for-3.5/drivers' of git://git.kernel.dk/linux-block
 git bisect good a70f35af4e49f87ba4b6c4b30220fbb66cd74af6
 # good: [a00b6151a2ae4c52576c35d3998e144a993d50b8] Merge branch
 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
 git bisect good a00b6151a2ae4c52576c35d3998e144a993d50b8
 # bad: [1193755ac6328ad240ba987e6ec41d5e8baf0680] Merge branch
 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
 git bisect bad 1193755ac6328ad240ba987e6ec41d5e8baf0680
 # good: [51eab603f5c86dd1eae4c525df3e7f7eeab401d6] Merge branch
 'for-linus' of
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
 git bisect good 51eab603f5c86dd1eae4c525df3e7f7eeab401d6
 # bad: [eb36c5873b96e8c7376768d3906da74aae6e3839] new helper:
 vm_mmap_pgoff()
 git bisect bad eb36c5873b96e8c7376768d3906da74aae6e3839

 but I hit a problem (the kernel does not compile) at the next iteration;
 I need to dig into git bisect to find out how to select another commit.

 Vincent


git bisect skip looks like the way to do it?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE



On 30/07/2012 09:53, Joel Becker wrote:
 On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote:
 On 30/07/2012 08:30, Joel Becker wrote:
 On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
 Hello

 Got this on the first write (deliver sending a mail to announce the
 restart of services).
 The home partition (the one receiving the mail) is ocfs2, created
 on a drbd block device in primary/primary mode.
 These drbd devices are based on lvm.

 The system is running linux-3.5.0; identical symptoms with linux 3.3 and 3.2,
 but it works with the linux 3.0 kernel.

 Reproduced on two machines (so different hardware is involved: on this one,
 software md raid on SATA; on the second one, an Areca hardware raid card),
 but the two machines are the ones sharing this partition (so they share the
 same data).
 Hmm.  Any chance you can bisect this further?
 Will try to. It will take a few days as the server is in production (but
 only used as a backup, so...)

 Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here
 ]
 Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at
 fs/buffer.c:2886!
 This is:

 BUG_ON(!buffer_mapped(bh));

 in submit_bh().

 system_call_fastpath+0x16/0x1b
 This stack trace is from 3.5, because of the location of the
 BUG.  The call path in the trace suggests the code added by Al's ea022d,
 but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
 from 3.2?
 With a 3.2 kernel I get this stack trace. It differs from the 3.5 trace, but
 it occurs at exactly the same moment and for the same reasons.
 It seems less immediate than with 3.5, but that is more a subjective
 impression than something based on fact (it takes a few seconds after
 deliver is started to hit the bug).
 Totally different stack trace.  Not in symlink code, but instead in
 fallocate.  Weird.  I wonder if you are hitting two things.  Bisection
 will definitely help.

Yes, it could be; that would explain the two stack traces (and the different
timing observed).
Bisection is in progress. The fallocate bug has probably already been
fixed (info sent by
sunil.mush...@gmail.com, but not yet visible on the list?)

--

The fallocate() oops is probably the same one that is fixed by this patch.
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a


It is in the list of patches that are ready to be pushed.
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=shortlog;h=mw-3.4-mar15



But I'm not sure it will fix everything I observed, so I will continue to
bisect to confirm or refute.
(But I seem to have lost network access to my server after a reboot, so no
more access before tomorrow. I probably forgot to run make
modules_install before installing the new kernel... Being stupid is not
very helpful...) I hope to finish the bisection tomorrow or Wednesday.
 
Thanks a lot for the support.
 Joel






Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE

Hi,

So there are 12 commits left, corresponding to this bisection log:

git bisect start
# bad: [2d534926205db9ffce4bbbde67cb9b2cee4b835c] Merge tag
'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
git bisect bad 2d534926205db9ffce4bbbde67cb9b2cee4b835c
# good: [c3b92c8787367a8bb53d57d9789b558f1295cc96] Linux 3.1
git bisect good c3b92c8787367a8bb53d57d9789b558f1295cc96
# good: [95211279c5ad00a317c98221d7e4365e02f20836] Merge branch 'akpm'
(Andrew's patch-bomb)
git bisect good 95211279c5ad00a317c98221d7e4365e02f20836
# good: [654443e20dfc0617231f28a07c96a979ee1a0239] Merge branch
'perf-uprobes-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 654443e20dfc0617231f28a07c96a979ee1a0239
# bad: [f0a08fcb5972167e55faa330c4a24fbaa3328b1f] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad f0a08fcb5972167e55faa330c4a24fbaa3328b1f
# bad: [f5e7e844a571124ffc117d4696787d6afc4fc5ae] Merge tag
'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
git bisect bad f5e7e844a571124ffc117d4696787d6afc4fc5ae
# good: [f465d145d76803fe6332092775d891c8c509aa44] Merge tag
'cleanup-initcall' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good f465d145d76803fe6332092775d891c8c509aa44
# good: [a70f35af4e49f87ba4b6c4b30220fbb66cd74af6] Merge branch
'for-3.5/drivers' of git://git.kernel.dk/linux-block
git bisect good a70f35af4e49f87ba4b6c4b30220fbb66cd74af6
# good: [a00b6151a2ae4c52576c35d3998e144a993d50b8] Merge branch
'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
git bisect good a00b6151a2ae4c52576c35d3998e144a993d50b8
# bad: [1193755ac6328ad240ba987e6ec41d5e8baf0680] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 1193755ac6328ad240ba987e6ec41d5e8baf0680
# good: [51eab603f5c86dd1eae4c525df3e7f7eeab401d6] Merge branch
'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good 51eab603f5c86dd1eae4c525df3e7f7eeab401d6
# bad: [eb36c5873b96e8c7376768d3906da74aae6e3839] new helper:
vm_mmap_pgoff()
git bisect bad eb36c5873b96e8c7376768d3906da74aae6e3839
# skip: [eea62f831b8030b0eeea8314eed73b6132d1de26] brlocks/lglocks: turn
into functions
git bisect skip eea62f831b8030b0eeea8314eed73b6132d1de26
# good: [52576da3545e78c534d901a39f6f2391665c641b] hpfs: bitmaps are
little-endian
git bisect good 52576da3545e78c534d901a39f6f2391665c641b
# bad: [3ed37648e1cbf1bbebc200c6ea8fd8daf8325843] fs: move
file_remove_suid() to fs/inode.c
git bisect bad 3ed37648e1cbf1bbebc200c6ea8fd8daf8325843
# bad: [962830df366b66e71849040770ae6ba55a8b4aec] brlocks/lglocks: API
cleanups
git bisect bad 962830df366b66e71849040770ae6ba55a8b4aec

The commits left are:

commit 962830df366b66e71849040770ae6ba55a8b4aec
Author: Andi Kleen a...@linux.intel.com
Date:   Tue May 8 13:32:02 2012 +0930

brlocks/lglocks: API cleanups


commit eea62f831b8030b0eeea8314eed73b6132d1de26
Author: Andi Kleen a...@linux.intel.com
Date:   Tue May 8 13:32:24 2012 +0930

brlocks/lglocks: turn into functions

commit 9dd6fa03ab31bb57cee4623a689d058d222fbe68
Author: Rusty Russell ru...@rustcorp.com.au
Date:   Tue May 8 13:29:45 2012 +0930

lglock: remove online variants of lock

commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Thu May 3 10:14:29 2012 -0400

ocfs: simplify symlink handling
   

commit 408bd629badbd4353b238ab6f58001529b274d73
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Thu May 3 09:34:20 2012 -0400

get rid of pointless allocations and copying in ecryptfs_follow_link()
   
switch to generic_readlink(), while we are at it


commit 28fe3c1963b0bafa56ec92df1987828090151d87
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 16:41:13 2012 -0400

hpfs: assorted endianness annotations

commit 77ee26e44c28823a29bc09091950544566ae7cea
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 16:26:46 2012 -0400

hpfs: annotate ea

commit 46287aa652fa8ea1edac41817ddc63332495ffc3
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 16:20:49 2012 -0400

hpfs: annotate struct hpfs_dirent

commit 6ce2bbba5266c1dd5c27dd8af1887ed8ca564919
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 16:11:25 2012 -0400

hpfs: annotate struct anode

commit 2b9f1cc29ba0e56089fe04501ec6d3b49eee3c3e
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 16:09:25 2012 -0400

hpfs: annotate struct fnode

commit ddc19e6e04c1131a48f5b9a25aa433bbd8430cdd
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 15:59:35 2012 -0400

hpfs: annotate btree nodes, get rid of bitfields mess
   
commit 39413c6046de282a92739110cfafb8f1e862680d
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Tue Apr 17 15:32:22 2012 -0400

hpfs: annotate struct dnode


After that, bisection starts to be quite hard: I get compile errors,
unbootable kernels, or unrelated oopses.


For the record, the BUG that I'm chasing is

Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE
Hi

Based on current git (commit 1a9b4993b70fb1884716902774dc9025b457760d)
and reverting commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1

commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1
Author: Al Viro v...@zeniv.linux.org.uk
Date:   Thu May 3 10:14:29 2012 -0400

ocfs: simplify symlink handling

suppresses this oops:

Jul 31 09:42:12 jupiter2 kernel: [  594.244726] kernel BUG at fs/buffer.c:2886!
Jul 31 09:42:12 jupiter2 kernel: [  594.244768] invalid opcode:  [#1] SMP
Jul 31 09:42:12 jupiter2 kernel: [  594.244874] CPU 0
Jul 31 09:42:12 jupiter2 kernel: [  594.244911] Modules linked in: drbd 
lru_cache [last unloaded: drbd]
Jul 31 09:42:12 jupiter2 kernel: [  594.245121]
Jul 31 09:42:12 jupiter2 kernel: [  594.245156] Pid: 5725, comm: deliver Not 
tainted 3.5.0-gentoo #3 HP ProLiant ML150 G3/ML150 G3
Jul 31 09:42:12 jupiter2 kernel: [  594.245302] RIP: 0010:[81180862]  
[81180862] submit_bh+0x112/0x120
Jul 31 09:42:12 jupiter2 kernel: [  594.245389] RSP: 0018:88006032fb38  
EFLAGS: 00010246
Jul 31 09:42:12 jupiter2 kernel: [  594.245432] RAX: 4104 RBX: 
ea00014a1a80 RCX: 0003
Jul 31 09:42:12 jupiter2 kernel: [  594.245478] RDX: 0001 RSI: 
ea00014a1a80 RDI: 
Jul 31 09:42:12 jupiter2 kernel: [  594.245523] RBP:  R08: 
 R09: 81346ad0
Jul 31 09:42:12 jupiter2 kernel: [  594.245569] R10: dead00200200 R11: 
 R12: 04cc4789
Jul 31 09:42:12 jupiter2 kernel: [  594.245614] R13: 0003 R14: 
 R15: 
Jul 31 09:42:12 jupiter2 kernel: [  594.245661] FS:  7f23be7e6700() 
GS:88007fc0() knlGS:
Jul 31 09:42:12 jupiter2 kernel: [  594.245708] CS:  0010 DS:  ES:  
CR0: 8005003b
Jul 31 09:42:12 jupiter2 kernel: [  594.245752] CR2: 7f23bd098b6c CR3: 
61cfd000 CR4: 07f0
Jul 31 09:42:12 jupiter2 kernel: [  594.245853] DR0:  DR1: 
 DR2: 
Jul 31 09:42:12 jupiter2 kernel: [  594.245954] DR3:  DR6: 
0ff0 DR7: 0400
Jul 31 09:42:12 jupiter2 kernel: [  594.246058] Process deliver (pid: 5725, 
threadinfo 88006032e000, task 88007c7f1e00)
Jul 31 09:42:12 jupiter2 kernel: [  594.246218] Stack:
Jul 31 09:42:12 jupiter2 kernel: [  594.246311]  ea00014a1a80 
0001 04cc4789 81327546
Jul 31 09:42:12 jupiter2 kernel: [  594.246598]  53a6db78 
0001800e000e 88007c7f2468 88006032fc10
Jul 31 09:42:12 jupiter2 kernel: [  594.246885]   
0001 880053a7e9b0 880056f32000
Jul 31 09:42:12 jupiter2 kernel: [  594.247173] Call Trace:
Jul 31 09:42:12 jupiter2 kernel: [  594.247271]  [81327546] ? 
ocfs2_read_blocks+0x176/0x6c0
Jul 31 09:42:12 jupiter2 kernel: [  594.247373]  [81346ad0] ? 
ocfs2_find_actor+0x120/0x120
Jul 31 09:42:12 jupiter2 kernel: [  594.247474]  [813464f7] ? 
ocfs2_read_inode_block_full+0x37/0x60
Jul 31 09:42:12 jupiter2 kernel: [  594.247578]  [813964ff] ? 
ocfs2_fast_symlink_readpage+0x2f/0x160
Jul 31 09:42:12 jupiter2 kernel: [  594.247683]  [8585] ? 
do_read_cache_page+0x85/0x180
Jul 31 09:42:12 jupiter2 kernel: [  594.247784]  [813964d0] ? 
ocfs2_fill_super+0x2500/0x2500
Jul 31 09:42:12 jupiter2 kernel: [  594.247883]  [86d9] ? 
read_cache_page+0x9/0x20
Jul 31 09:42:12 jupiter2 kernel: [  594.247984]  [8115c705] ? 
page_getlink+0x25/0x80
Jul 31 09:42:12 jupiter2 kernel: [  594.248083]  [8115c77b] ? 
page_follow_link_light+0x1b/0x30
Jul 31 09:42:12 jupiter2 kernel: [  594.248186]  [8116099b] ? 
path_lookupat+0x38b/0x720
Jul 31 09:42:12 jupiter2 kernel: [  594.248286]  [81160d5c] ? 
do_path_lookup+0x2c/0xd0
Jul 31 09:42:12 jupiter2 kernel: [  594.248385]  [81346f31] ? 
ocfs2_inode_revalidate+0x71/0x160
Jul 31 09:42:12 jupiter2 kernel: [  594.248492]  [8106b9d7] ? 
flush_tlb_others_ipi+0x107/0x130
Jul 31 09:42:12 jupiter2 kernel: [  594.248594]  [81161c0c] ? 
user_path_at_empty+0x5c/0xb0
Jul 31 09:42:12 jupiter2 kernel: [  594.248694]  [8106714a] ? 
do_page_fault+0x1aa/0x3c0
Jul 31 09:42:12 jupiter2 kernel: [  594.248789]  [81156f2d] ? 
cp_new_stat+0x10d/0x120
Jul 31 09:42:12 jupiter2 kernel: [  594.248884]  [81157021] ? 
vfs_fstatat+0x41/0x80
Jul 31 09:42:12 jupiter2 kernel: [  594.248978]  [8115715f] ? 
sys_newstat+0x1f/0x50
Jul 31 09:42:12 jupiter2 kernel: [  594.249075]  [817ecee2] ? 
system_call_fastpath+0x16/0x1b
Jul 31 09:42:12 jupiter2 kernel: [  594.249169] Code: b6 44 24 18 4c 89 e7 83 
e0 80 3c 01 19 db e8 76 3f 00 00 f7 d3 83 e3 a1 89 d8 5b 5d 41 5c c3 0f 0b eb 
fe 0f 0b eb fe 0f 0$
Jul 31 09:42:12 jupiter2 kernel: [  594.250003] RIP  [81180862] 
submit_bh+0x112/0x120

and adding the correction from


Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE


Some progress

The fallocate bug is not the only bug:
the latest head with the fallocate fix still crashes
(in read_blocks).

So I have restarted the bisection, but at each stage I reapply the fallocate
patch (is that a correct way to do this?).
Bisection is not very fast (sometimes I need to reboot
harshly, and that kicks off a rebuild of the raid array), but so far:

git bisect start
# bad: [2d534926205db9ffce4bbbde67cb9b2cee4b835c] Merge tag
'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
git bisect bad 2d534926205db9ffce4bbbde67cb9b2cee4b835c
# good: [c3b92c8787367a8bb53d57d9789b558f1295cc96] Linux 3.1
git bisect good c3b92c8787367a8bb53d57d9789b558f1295cc96
# good: [95211279c5ad00a317c98221d7e4365e02f20836] Merge branch 'akpm'
(Andrew's patch-bomb)
git bisect good 95211279c5ad00a317c98221d7e4365e02f20836
# good: [654443e20dfc0617231f28a07c96a979ee1a0239] Merge branch
'perf-uprobes-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 654443e20dfc0617231f28a07c96a979ee1a0239
# bad: [f0a08fcb5972167e55faa330c4a24fbaa3328b1f] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad f0a08fcb5972167e55faa330c4a24fbaa3328b1f
# bad: [f5e7e844a571124ffc117d4696787d6afc4fc5ae] Merge tag
'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
git bisect bad f5e7e844a571124ffc117d4696787d6afc4fc5ae

Each bad commit failed with the read_blocks oops (so it is somewhat
consistent for now).




On 30/07/2012 20:30, Vincent ETIENNE wrote:


 On 30/07/2012 09:53, Joel Becker wrote:
 On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote:
 On 30/07/2012 08:30, Joel Becker wrote:
 On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
 Hello

 Got this on the first write (deliver sending a mail to announce the
 restart of services).
 The home partition (the one receiving the mail) is ocfs2, created
 on a drbd block device in primary/primary mode.
 These drbd devices are based on lvm.

 The system is running linux-3.5.0; identical symptoms with linux 3.3 and 3.2,
 but it works with the linux 3.0 kernel.

 Reproduced on two machines (so different hardware is involved: on this one,
 software md raid on SATA; on the second one, an Areca hardware raid card),
 but the two machines are the ones sharing this partition (so they share the
 same data).
Hmm.  Any chance you can bisect this further?
 Will try to. It will take a few days as the server is in production (but
 only used as a backup, so...)

 Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here
 ]
 Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at
 fs/buffer.c:2886!
This is:

BUG_ON(!buffer_mapped(bh));

 in submit_bh().

 system_call_fastpath+0x16/0x1b
This stack trace is from 3.5, because of the location of the
 BUG.  The call path in the trace suggests the code added by Al's ea022d,
 but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
 from 3.2?
 With a 3.2 kernel I get this stack trace. It differs from the 3.5 trace, but
 it occurs at exactly the same moment and for the same reasons.
 It seems less immediate than with 3.5, but that is more a subjective
 impression than something based on fact (it takes a few seconds after
 deliver is started to hit the bug).
 Totally different stack trace.  Not in symlink code, but instead in
 fallocate.  Weird.  I wonder if you are hitting two things.  Bisection
 will definitely help.
 Yes, it could be; that would explain the two stack traces (and the different
 timing observed).
 Bisection is in progress. The fallocate bug has probably already been
 fixed (info sent by
 sunil.mush...@gmail.com, but not yet visible on the list?)

 --

 The fallocate() oops is probably the same one that is fixed by this patch.
 https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a


 It is in the list of patches that are ready to be pushed.
 https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=shortlog;h=mw-3.4-mar15

 

 But I'm not sure it will fix everything I observed, so I will continue to
 bisect to confirm or refute.
 (But I seem to have lost network access to my server after a reboot, so no
 more access before tomorrow. I probably forgot to run make
 modules_install before installing the new kernel... Being stupid is not
 very helpful...) I hope to finish the bisection tomorrow or Wednesday.
  
 Thanks a lot for the support.
 Joel






Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENNE
Hi
Some further progress on bisection

I'm now here

git bisect start
# bad: [2d534926205db9ffce4bbbde67cb9b2cee4b835c] Merge tag
'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
git bisect bad 2d534926205db9ffce4bbbde67cb9b2cee4b835c
# good: [c3b92c8787367a8bb53d57d9789b558f1295cc96] Linux 3.1
git bisect good c3b92c8787367a8bb53d57d9789b558f1295cc96
# good: [95211279c5ad00a317c98221d7e4365e02f20836] Merge branch 'akpm'
(Andrew's patch-bomb)
git bisect good 95211279c5ad00a317c98221d7e4365e02f20836
# good: [654443e20dfc0617231f28a07c96a979ee1a0239] Merge branch
'perf-uprobes-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 654443e20dfc0617231f28a07c96a979ee1a0239
# bad: [f0a08fcb5972167e55faa330c4a24fbaa3328b1f] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad f0a08fcb5972167e55faa330c4a24fbaa3328b1f
# bad: [f5e7e844a571124ffc117d4696787d6afc4fc5ae] Merge tag
'for-linus-3.5-20120601' of git://git.infradead.org/linux-mtd
git bisect bad f5e7e844a571124ffc117d4696787d6afc4fc5ae
# good: [f465d145d76803fe6332092775d891c8c509aa44] Merge tag
'cleanup-initcall' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good f465d145d76803fe6332092775d891c8c509aa44
# good: [a70f35af4e49f87ba4b6c4b30220fbb66cd74af6] Merge branch
'for-3.5/drivers' of git://git.kernel.dk/linux-block
git bisect good a70f35af4e49f87ba4b6c4b30220fbb66cd74af6
# good: [a00b6151a2ae4c52576c35d3998e144a993d50b8] Merge branch
'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
git bisect good a00b6151a2ae4c52576c35d3998e144a993d50b8
# bad: [1193755ac6328ad240ba987e6ec41d5e8baf0680] Merge branch
'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 1193755ac6328ad240ba987e6ec41d5e8baf0680
# good: [51eab603f5c86dd1eae4c525df3e7f7eeab401d6] Merge branch
'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good 51eab603f5c86dd1eae4c525df3e7f7eeab401d6
# bad: [eb36c5873b96e8c7376768d3906da74aae6e3839] new helper:
vm_mmap_pgoff()
git bisect bad eb36c5873b96e8c7376768d3906da74aae6e3839

but I hit a problem (the kernel does not compile) at the next iteration;
I need to dig into git bisect to find out how to select another commit.

Vincent





Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-15 Thread Vincent ETIENEN


On 02/08/2012 23:08, Sunil Mushran wrote:
 On Thu, Aug 2, 2012 at 12:28 PM, Vincent ETIENNE v...@vetienne.net
 mailto:v...@vetienne.net wrote:

 Hi

 Based on current git (commit
 1a9b4993b70fb1884716902774dc9025b457760d)
 and reverting commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1

 commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1
 Author: Al Viro v...@zeniv.linux.org.uk
 mailto:v...@zeniv.linux.org.uk
 Date:   Thu May 3 10:14:29 2012 -0400

 ocfs: simplify symlink handling

 and adding the correction from

 
 https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a

 suppresses the fallocate bug

 and leads to no oops, at least not immediately. Will let it run for a while.



 Apply this change and re-run:

 diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c
 index f1fbb4b..66edce7 100644
 --- a/fs/ocfs2/symlink.c
 +++ b/fs/ocfs2/symlink.c
 @@ -57,7 +57,7 @@
  static int ocfs2_fast_symlink_readpage(struct file *unused, struct page *page)
  {
     struct inode *inode = page->mapping->host;
 -   struct buffer_head *bh;
 +   struct buffer_head *bh = NULL;
     int status = ocfs2_read_inode_block(inode, &bh);
     struct ocfs2_dinode *fe;
     const char *link;

The latest head with only your two changes is working here, thanks a lot.
It has now been running flawlessly for a few hours.

Vincent




Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-03 Thread Sunil Mushran
Thanks for your help.

On Fri, Aug 3, 2012 at 12:22 AM, Vincent ETIENEN v...@vetienne.net wrote:



 On 02/08/2012 23:08, Sunil Mushran wrote:

 On Thu, Aug 2, 2012 at 12:28 PM, Vincent ETIENNE v...@vetienne.net wrote:

 Hi

 Based on current git (commit 1a9b4993b70fb1884716902774dc9025b457760d)
 and reverting commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1

 commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1
 Author: Al Viro v...@zeniv.linux.org.uk
 Date:   Thu May 3 10:14:29 2012 -0400

 ocfs: simplify symlink handling

 and adding the correction from


 https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a

 suppresses the fallocate bug

 and leads to no oops, at least not immediately. Will let it run for a while.



  Apply this change and re-run:

  diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c
 index f1fbb4b..66edce7 100644
 --- a/fs/ocfs2/symlink.c
 +++ b/fs/ocfs2/symlink.c
 @@ -57,7 +57,7 @@
  static int ocfs2_fast_symlink_readpage(struct file *unused, struct page *page)
  {
     struct inode *inode = page->mapping->host;
 -   struct buffer_head *bh;
 +   struct buffer_head *bh = NULL;
     int status = ocfs2_read_inode_block(inode, &bh);
     struct ocfs2_dinode *fe;
     const char *link;

 The latest head with only your two changes is working here, thanks a lot.
 It has now been running flawlessly for a few hours.

 Vincent





Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-08-02 Thread Sunil Mushran
On Thu, Aug 2, 2012 at 12:28 PM, Vincent ETIENNE v...@vetienne.net wrote:

 Hi

 Based on current git (commit 1a9b4993b70fb1884716902774dc9025b457760d)
 and reverting commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1

 commit ea022dfb3c2a4680483b00eb2fecc9fc4f6091d1
 Author: Al Viro v...@zeniv.linux.org.uk
 Date:   Thu May 3 10:14:29 2012 -0400

 ocfs: simplify symlink handling

 and adding the correction from


 https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a

 suppresses the fallocate bug

 and leads to no oops, at least not immediately. Will let it run for a while.



Apply this change and re-run:

diff --git a/fs/ocfs2/symlink.c b/fs/ocfs2/symlink.c
index f1fbb4b..66edce7 100644
--- a/fs/ocfs2/symlink.c
+++ b/fs/ocfs2/symlink.c
@@ -57,7 +57,7 @@
 static int ocfs2_fast_symlink_readpage(struct file *unused, struct page *page)
 {
    struct inode *inode = page->mapping->host;
-   struct buffer_head *bh;
+   struct buffer_head *bh = NULL;
    int status = ocfs2_read_inode_block(inode, &bh);
    struct ocfs2_dinode *fe;
    const char *link;

Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-07-30 Thread Joel Becker
On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
 Hello
 
 Got this on the first write (deliver sending a mail to announce the
 restart of services).
 The home partition (the one receiving the mail) is ocfs2, created
 on a drbd block device in primary/primary mode.
 These drbd devices are based on lvm.
 
 The system is running linux-3.5.0; identical symptoms with linux 3.3 and 3.2,
 but it works with the linux 3.0 kernel.
 
 Reproduced on two machines (so different hardware is involved: on this one,
 software md raid on SATA; on the second one, an Areca hardware raid card),
 but the two machines are the ones sharing this partition (so they share the
 same data).

Hmm.  Any chance you can bisect this further?

 Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here
 ]
 Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at
 fs/buffer.c:2886!

This is:

BUG_ON(!buffer_mapped(bh));

in submit_bh().


 Jul 27 23:41:41 jupiter2 kernel: [  351.170003] Call Trace:
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81327546] ?
 ocfs2_read_blocks+0x176/0x6c0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8114e541] ?
 T.1552+0x91/0x2b0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346ad0] ?
 ocfs2_find_actor+0x120/0x120
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813464f7] ?
 ocfs2_read_inode_block_full+0x37/0x60
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964ff] ?
 ocfs2_fast_symlink_readpage+0x2f/0x160
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8585] ?
 do_read_cache_page+0x85/0x180
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964d0] ?
 ocfs2_fill_super+0x2500/0x2500
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [86d9] ?
 read_cache_page+0x9/0x20
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c705] ?
 page_getlink+0x25/0x80
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c77b] ?
 page_follow_link_light+0x1b/0x30
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8116099b] ?
 path_lookupat+0x38b/0x720
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81160d5c] ?
 do_path_lookup+0x2c/0xd0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346f31] ?
 ocfs2_inode_revalidate+0x71/0x160
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81161c0c] ?
 user_path_at_empty+0x5c/0xb0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8106714a] ?
 do_page_fault+0x1aa/0x3c0
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81156f2d] ?
 cp_new_stat+0x10d/0x120
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81157021] ?
 vfs_fstatat+0x41/0x80
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115715f] ?
 sys_newstat+0x1f/0x50
 Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [817ecee2] ?
 system_call_fastpath+0x16/0x1b

This stack trace is from 3.5, because of the location of the
BUG.  The call path in the trace suggests the code added by Al's ea022d,
but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
from 3.2?

Joel

-- 

Life's Little Instruction Book #139

Never deprive someone of hope; it might be all they have.

http://www.jlbec.org/
jl...@evilplan.org



Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-07-30 Thread Joel Becker
On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote:
 
 Hi,
 
 On 30/07/2012 08:30, Joel Becker wrote:
  On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
  Hello
 
  Got this on the first write (deliver sending a mail to announce the
  restart of services).
  The home partition (the one receiving the mail) is ocfs2, created
  on a drbd block device in primary/primary mode.
  These drbd devices are based on lvm.
 
  The system is running linux-3.5.0; identical symptoms with linux 3.3 and 3.2,
  but it works with the linux 3.0 kernel.
 
  Reproduced on two machines (so different hardware is involved: on this one,
  software md raid on SATA; on the second one, an Areca hardware raid card),
  but the two machines are the ones sharing this partition (so they share the
  same data).
  Hmm.  Any chance you can bisect this further?
 
 Will try to. It will take a few days as the server is in production (but
 only used as a backup, so...)
 
  Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here
  ]
  Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at
  fs/buffer.c:2886!
  This is:
 
  BUG_ON(!buffer_mapped(bh));
 
  in submit_bh().
 
 
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003] Call Trace:
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81327546] ? ocfs2_read_blocks+0x176/0x6c0
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8114e541] ? T.1552+0x91/0x2b0
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346ad0] ? ocfs2_find_actor+0x120/0x120
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813464f7] ? ocfs2_read_inode_block_full+0x37/0x60
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964ff] ? ocfs2_fast_symlink_readpage+0x2f/0x160
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8585] ? do_read_cache_page+0x85/0x180
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964d0] ? ocfs2_fill_super+0x2500/0x2500
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [86d9] ? read_cache_page+0x9/0x20
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c705] ? page_getlink+0x25/0x80
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c77b] ? page_follow_link_light+0x1b/0x30
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8116099b] ? path_lookupat+0x38b/0x720
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81160d5c] ? do_path_lookup+0x2c/0xd0
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346f31] ? ocfs2_inode_revalidate+0x71/0x160
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81161c0c] ? user_path_at_empty+0x5c/0xb0
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8106714a] ? do_page_fault+0x1aa/0x3c0
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81156f2d] ? cp_new_stat+0x10d/0x120
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81157021] ? vfs_fstatat+0x41/0x80
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115715f] ? sys_newstat+0x1f/0x50
  Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [817ecee2] ? system_call_fastpath+0x16/0x1b
  This stack trace is from 3.5, because of the location of the
  BUG.  The call path in the trace suggests the code added by Al's ea022d,
  but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
  from 3.2?
 
 For a 3.2 kernel I get this stack trace. It differs from the 3.5 trace,
 but it happens at exactly the same moment and for the same reasons.
 It seems to be less immediate than with 3.5, but that is more a
 subjective impression than something based on fact (it takes a few
 seconds after deliver is started to hit the bug).

Totally different stack trace.  Not in symlink code, but instead in
fallocate.  Weird.  I wonder if you are hitting two things.  Bisection
will definitely help.

Joel

 [  716.402833] o2dlm: Joining domain B43153ED20B942E291251F2C138ADA9E ( 0 1 ) 2 nodes
 [  716.501511] ocfs2: Mounting device (147,2) on (node 1, slot 0) with ordered data mode.
 [  716.505744] mount.ocfs2 used greatest stack depth: 2936 bytes left
 [  727.133743] deliver used greatest stack depth: 2632 bytes left
 [  764.167029] deliver used greatest stack depth: 1896 bytes left
 [  764.778872] BUG: unable to handle kernel NULL pointer dereference at 0038
 [  764.778897] IP: [8133c51a] __ocfs2_change_file_space+0x75a/0x1690
 [  764.778922] PGD 62697067 PUD 67a81067 PMD 0
 [  764.778939] Oops:  [#1] SMP
 [  764.778953] CPU 0
 [  764.778959] Modules linked in: drbd lru_cache ipv6 [last unloaded: drbd]
 [  764.778986]
 [  764.778993] Pid: 5909, comm: deliver Not tainted 3.2.12-gentoo #2 HP ProLiant ML150 G3/ML150 G3
 [  764.779017] RIP: 0010:[8133c51a]  [8133c51a] __ocfs2_change_file_space+0x75a/0x1690
 [  764.779041] RSP: 0018:880067b2dd98  EFLAGS: 00010246
 [  764.779053] RAX:  RBX: 880067f82000 RCX:
 

Re: [Ocfs2-devel] kernel BUG at fs/buffer.c:2886! Linux 3.5.0

2012-07-30 Thread Sunil Mushran
The fallocate() oops is probably the same one that is fixed by this patch.
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a

It is in the list of patches that are ready to be pushed.
https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=shortlog;h=mw-3.4-mar15

On Mon, Jul 30, 2012 at 12:53 AM, Joel Becker jl...@evilplan.org wrote:

 On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote:
 
  Hi,
 
  On 30/07/2012 08:30, Joel Becker wrote:
   On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote:
   Hello
  
   I get this on the first write (made by deliver sending a mail announcing
   the restart of services).
   The home partition (the one receiving the mail) is ocfs2, created on a
   drbd block device in primary/primary mode.
   These drbd devices are based on LVM.
  
   The system is running linux-3.5.0, with identical symptoms on linux 3.3
   and 3.2, but it works with the linux 3.0 kernel.
  
   Reproduced on two machines (so different hardware is involved: software
   md RAID on SATA on the first one, an Areca hardware RAID card on the
   second one), but the two machines are the ones sharing this partition
   (so they share the same data).
   Hmm.  Any chance you can bisect this further?
 
  I will try to. It will take a few days as the server is in production
  (but it is used as a backup, so...).
 
   Jul 27 23:41:41 jupiter2 kernel: [  351.169213] [ cut here ]
   Jul 27 23:41:41 jupiter2 kernel: [  351.169261] kernel BUG at fs/buffer.c:2886!
   This is:
  
   BUG_ON(!buffer_mapped(bh));
  
   in submit_bh().
  
  
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003] Call Trace:
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81327546] ? ocfs2_read_blocks+0x176/0x6c0
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8114e541] ? T.1552+0x91/0x2b0
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346ad0] ? ocfs2_find_actor+0x120/0x120
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813464f7] ? ocfs2_read_inode_block_full+0x37/0x60
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964ff] ? ocfs2_fast_symlink_readpage+0x2f/0x160
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8585] ? do_read_cache_page+0x85/0x180
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [813964d0] ? ocfs2_fill_super+0x2500/0x2500
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [86d9] ? read_cache_page+0x9/0x20
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c705] ? page_getlink+0x25/0x80
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115c77b] ? page_follow_link_light+0x1b/0x30
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8116099b] ? path_lookupat+0x38b/0x720
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81160d5c] ? do_path_lookup+0x2c/0xd0
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81346f31] ? ocfs2_inode_revalidate+0x71/0x160
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81161c0c] ? user_path_at_empty+0x5c/0xb0
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8106714a] ? do_page_fault+0x1aa/0x3c0
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81156f2d] ? cp_new_stat+0x10d/0x120
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [81157021] ? vfs_fstatat+0x41/0x80
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [8115715f] ? sys_newstat+0x1f/0x50
   Jul 27 23:41:41 jupiter2 kernel: [  351.170003]  [817ecee2] ? system_call_fastpath+0x16/0x1b
   This stack trace is from 3.5, because of the location of the
   BUG.  The call path in the trace suggests the code added by Al's ea022d,
   but you say it breaks in 3.2 and 3.3 as well.  Can you give me a trace
   from 3.2?
 
  For a 3.2 kernel I get this stack trace. It differs from the 3.5 trace,
  but it happens at exactly the same moment and for the same reasons.
  It seems to be less immediate than with 3.5, but that is more a
  subjective impression than something based on fact (it takes a few
  seconds after deliver is started to hit the bug).

 Totally different stack trace.  Not in symlink code, but instead in
 fallocate.  Weird.  I wonder if you are hitting two things.  Bisection
 will definitely help.

 Joel

  [  716.402833] o2dlm: Joining domain B43153ED20B942E291251F2C138ADA9E ( 0 1 ) 2 nodes
  [  716.501511] ocfs2: Mounting device (147,2) on (node 1, slot 0) with ordered data mode.
  [  716.505744] mount.ocfs2 used greatest stack depth: 2936 bytes left
  [  727.133743] deliver used greatest stack depth: 2632 bytes left
  [  764.167029] deliver used greatest stack depth: 1896 bytes left
  [  764.778872] BUG: unable to handle kernel NULL pointer dereference at 0038
  [  764.778897] IP: [8133c51a] __ocfs2_change_file_space+0x75a/0x1690
  [  764.778922] PGD 62697067 PUD