Re: RCU bug with v3.17-rc3 ?

2014-10-19 Thread Olof Johansson
On Sun, Oct 19, 2014 at 8:28 AM, Felipe Balbi  wrote:
> Hi,
>
> On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
>> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
>> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
>> > > As I said, I have a patch in progress, but it seems that there needed
>> > > to be some discussion about exactly which compiler versions are affected.
>> > > It seems that it's not as trivial as looking at the GCC bug entry.
>> >
>> > ... and in any case, it has been a known bug for well over a year now,
>> > and it seems that it doesn't affect _that_ many people.  So taking some
>> > extra time to get it properly correct is the _right_ thing to do.
>>
>> Well, this is just great.  Pushing out the change which blacklists these
>> compilers takes out Olof's kernel build system...
>>
>> Things are not as trivial as they seem.
>
> Maybe Olof just needs to update his compiler. Olof ?

Yep, doing a run with 4.9.1 to see how it looks. In the past, 4.9 has
been really noisy with warnings, maybe most of them have been fixed by
now.


-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RCU bug with v3.17-rc3 ?

2014-10-19 Thread Felipe Balbi
Hi,

On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > > As I said, I have a patch in progress, but it seems that there needed
> > > to be some discussion about exactly which compiler versions are affected.
> > > It seems that it's not as trivial as looking at the GCC bug entry.
> > 
> > ... and in any case, it has been a known bug for well over a year now,
> > and it seems that it doesn't affect _that_ many people.  So taking some
> > extra time to get it properly correct is the _right_ thing to do.
> 
> Well, this is just great.  Pushing out the change which blacklists these
> compilers takes out Olof's kernel build system...
> 
> Things are not as trivial as they seem.

Maybe Olof just needs to update his compiler. Olof ?

-- 
balbi




Re: RCU bug with v3.17-rc3 ?

2014-10-19 Thread Russell King - ARM Linux
On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > As I said, I have a patch in progress, but it seems that there needed
> > to be some discussion about exactly which compiler versions are affected.
> > It seems that it's not as trivial as looking at the GCC bug entry.
> 
> ... and in any case, it has been a known bug for well over a year now,
> and it seems that it doesn't affect _that_ many people.  So taking some
> extra time to get it properly correct is the _right_ thing to do.

Well, this is just great.  Pushing out the change which blacklists these
compilers takes out Olof's kernel build system...

Things are not as trivial as they seem.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-15 Thread Russell King - ARM Linux
On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> As I said, I have a patch in progress, but it seems that there needed
> to be some discussion about exactly which compiler versions are affected.
> It seems that it's not as trivial as looking at the GCC bug entry.

... and in any case, it has been a known bug for well over a year now,
and it seems that it doesn't affect _that_ many people.  So taking some
extra time to get it properly correct is the _right_ thing to do.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-15 Thread Russell King - ARM Linux
On Tue, Oct 14, 2014 at 04:06:40AM +0200, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> > I think the only viable solution here is that:
> > 
> > 1. We blacklist the bad compiler versions outright in the kernel.
> 
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
> 
> > Remember, it's the distro's choice to fix these buggy compilers, so the
> > onus is on _them_ to deal with the mess they've created by doing so.
> 
> I totally agree.
> 
> Is someone going to send this patch, or do I have to write it myself?

As I said, I have a patch in progress, but it seems that there needed
to be some discussion about exactly which compiler versions are affected.
It seems that it's not as trivial as looking at the GCC bug entry.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-14 Thread Peter Hurley
On 10/13/2014 10:06 PM, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
>> On Mon, Oct 13, 2014 at 09:11:34AM +, David Laight wrote:
>>> From: Nathan Lynch
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily.  We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854.  It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Is it possible to compile a small code fragment and check the generated
>>> code for the bug?
>>> Possibly predicated on the broken version number to avoid false positives.
>>
>> I don't see how - it looks like it requires an interrupt to occur at an
>> opportune moment to provoke the function to fail.  The alternative would
>> be to parse the assembly generated by the compiler to determine how it
>> is dealing with the stack.
>>
>> I think the only viable solution here is that:
>>
>> 1. We blacklist the bad compiler versions outright in the kernel.
> 
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
> 
>> Remember, it's the distro's choice to fix these buggy compilers, so the
>> onus is on _them_ to deal with the mess they've created by doing so.
> 
> I totally agree.
> 
> Is someone going to send this patch, or do I have to write it myself?

I did on Friday (arm: Blacklist gcc 4.8.[012] ...) but Russell said he
was doing it himself.

Regards,
Peter Hurley



Re: RCU bug with v3.17-rc3 ?

2014-10-13 Thread Greg KH
On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 13, 2014 at 09:11:34AM +, David Laight wrote:
> > From: Nathan Lynch
> > > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > > >
> > > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > > it seems that this has been known about for some time.)
> > > 
> > > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > > are affected, as well as 4.9.0.
> > > 
> > > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > > 3.3 blacklisted, and it's trivial to add others.  I would want to 
> > > > include
> > > > some proper details about the bug, just like the other existing entries
> > > > we already have in asm-offsets.c, where we name the functions that the
> > > > compiler is known to break where appropriate.
> > > 
> > > Before blacklisting anything, it's worth considering that simple version
> > > checks would break existing pre-4.8.3 compilers that have been patched
> > > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > > the most we can reasonably do without breaking some correctly-behaving
> > > toolchains is to emit a warning.
> > 
> > Is it possible to compile a small code fragment and check the generated
> > code for the bug?
> > Possibly predicated on the broken version number to avoid false positives.
> 
> I don't see how - it looks like it requires an interrupt to occur at an
> opportune moment to provoke the function to fail.  The alternative would
> be to parse the assembly generated by the compiler to determine how it
> is dealing with the stack.
> 
> I think the only viable solution here is that:
> 
> 1. We blacklist the bad compiler versions outright in the kernel.

Yes, please do this, it's what we have done for other buggy compiler
versions, no need to do something different here.

> Remember, it's the distro's choice to fix these buggy compilers, so the
> onus is on _them_ to deal with the mess they've created by doing so.

I totally agree.

Is someone going to send this patch, or do I have to write it myself?

thanks,

greg k-h


Re: RCU bug with v3.17-rc3 ?

2014-10-13 Thread Russell King - ARM Linux
On Mon, Oct 13, 2014 at 09:11:34AM +, David Laight wrote:
> From: Nathan Lynch
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > >
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> > 
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> > 
> > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> > 
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
> 
> Is it possible to compile a small code fragment and check the generated
> code for the bug?
> Possibly predicated on the broken version number to avoid false positives.

I don't see how - it looks like it requires an interrupt to occur at an
opportune moment to provoke the function to fail.  The alternative would
be to parse the assembly generated by the compiler to determine how it
is dealing with the stack.

I think the only viable solution here is that:

1. We blacklist the bad compiler versions outright in the kernel.
2. We /consider/ testing a preprocessor symbol which, when present,
   indicates that these versions are fixed and should not be blacklisted.

The argument for (2) is that /if/ distros want to patch their compilers
to fix the problem, they /also/ have the ability to patch their compilers
to make them identifiable, and that is a far more reliable solution than
trying to parse the assembly output from multiple different GCC versions.

Remember, it's the distro's choice to fix these buggy compilers, so the
onus is on _them_ to deal with the mess they've created by doing so.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


RE: RCU bug with v3.17-rc3 ?

2014-10-13 Thread David Laight
From: Nathan Lynch
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> >
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Is it possible to compile a small code fragment and check the generated
code for the bug?
Possibly predicated on the broken version number to avoid false positives.

David





Re: RCU bug with v3.17-rc3 ?

2014-10-11 Thread Nathan Lynch
On 10/10/2014 08:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.

Correction -- 4.9.0 has this fixed, even though the GCC PR shows it as a
"known to fail" version.



Re: RCU bug with v3.17-rc3 ?

2014-10-11 Thread Peter Hurley
On 10/11/2014 10:51 AM, Otavio Salvador wrote:
> Hello Russell,
> 
> On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
>  wrote:
>> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily.  We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854.  It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Yocto has PR58854 problem patch.
>>>
>>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>>
>> Right, and we can provide links to these in the comments above the #error
>> so people have the right places to do a bit of research into whether their
>> compiler is safe.
>>
>> It is unfortunate that they are indistinguishable from the broken versions,
>> but that's really a distro problem for causing that issue themselves -
>> especially given how serious this bug is.
> 
> What about checking if GCC_PR58854_FIXED is not defined for error? So
> build systems and people could easily define it if they know their GCC
> has the fix applied.

If the distro/build system/individual is capable of patching gcc, then it
seems reasonable that the same distro/build system/individual is capable
of carrying a patch on top of mainline kernel for building with their
"special" compiler.



Re: RCU bug with v3.17-rc3 ?

2014-10-11 Thread Otavio Salvador
Hello Russell,

On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
 wrote:
> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>> > >
>> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> > > it seems that this has been known about for some time.)
>> >
>> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>> > are affected, as well as 4.9.0.
>> >
>> > > We can blacklist these GCC versions quite easily.  We already have GCC
>> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
>> > > some proper details about the bug, just like the other existing entries
>> > > we already have in asm-offsets.c, where we name the functions that the
>> > > compiler is known to break where appropriate.
>> >
>> > Before blacklisting anything, it's worth considering that simple version
>> > checks would break existing pre-4.8.3 compilers that have been patched
>> > for PR58854.  It looks like Yocto and Buildroot issued releases with
>> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>> > the most we can reasonably do without breaking some correctly-behaving
>> > toolchains is to emit a warning.
>>
>> Yocto has PR58854 problem patch.
>>
>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>
> Right, and we can provide links to these in the comments above the #error
> so people have the right places to do a bit of research into whether their
> compiler is safe.
>
> It is unfortunate that they are indistinguishable from the broken versions,
> but that's really a distro problem for causing that issue themselves -
> especially given how serious this bug is.

What about checking if GCC_PR58854_FIXED is not defined for error? So
build systems and people could easily define it if they know their GCC
has the fix applied.
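
A minimal sketch of what such an override could look like, assuming the
check sits in a C file that every ARM build compiles (the thread mentions
asm-offsets.c). GCC_PR58854_FIXED is the name proposed above, and the
4.8.0 to 4.8.2 range follows the discussion in this thread. The helper
pr58854_override_enabled() is hypothetical and exists only so the effect
of the define can be observed. This is an illustration, not the patch
that was eventually merged.

```c
/*
 * Sketch only: refuse to build with GCC 4.8.0-4.8.2 unless the builder
 * asserts that the compiler carries the PR58854 fix.  In arch/arm code
 * the architecture is implied; a standalone file might also test __arm__.
 */
#ifndef GCC_VERSION
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#endif

#if GCC_VERSION >= 40800 && GCC_VERSION < 40803 && !defined(GCC_PR58854_FIXED)
#error "GCC 4.8.0-4.8.2 may miscompile the ARM kernel (PR58854), see http://gcc.gnu.org/PR58854"
#endif

/* Hypothetical helper: reports whether the override was supplied. */
static int pr58854_override_enabled(void)
{
#ifdef GCC_PR58854_FIXED
	return 1;
#else
	return 0;
#endif
}
```

Someone with a patched toolchain would then add -DGCC_PR58854_FIXED to
the compiler flags (for a kernel build, for example through KCFLAGS),
which keeps the default safe for everyone else.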

-- 
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854            Mobile: +1 (347) 903-9750


Re: RCU bug with v3.17-rc3 ?

2014-10-11 Thread Russell King - ARM Linux
On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > > 
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> > 
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> > 
> > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> > 
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
> 
> Yocto has PR58854 problem patch.
> 
> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy

Right, and we can provide links to these in the comments above the #error
so people have the right places to do a bit of research into whether their
compiler is safe.

It is unfortunate that they are indistinguishable from the broken versions,
but that's really a distro problem for causing that issue themselves -
especially given how serious this bug is.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-11 Thread Russell King - ARM Linux
On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

I wish that it was possible to just do the warning thing, but unfortunately
the evidence is that many people ignore compiler warnings, because they see
them appearing from the kernel so often that they have become de-sensitised
to them.

This is pretty obvious from the various nightly build systems which produce
the same warnings for months without any progress on them - some of them
can be quite serious (oops-able) where printf format strings are concerned.

> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
> 
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now.  I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.

It's the FS corruption which swings it in favour of a #error - even if
we have a bunch of compilers around with that version which have the
problem fixed, it's /far/ better to #error out.  Those people who know
definitely that they have a fixed compiler can comment out the test
after checking that they do indeed have a fixed version, or are willing
to take the risk.

What we can't do is have kernels built by people who then run into FS
corruption because of this known issue.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Peter Chen
On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > 
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Yocto has a patch for the PR58854 problem.

http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy

> 
> Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
> release -- since it's a 4.8.3 prerelease from before the fix was
> committed you'll get GCC_VERSION == 40803 but still generate bad code.
> 
> > However, I'm rather annoyed that there are people here who have known
> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
> 
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now.  I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Best Regards,
Peter Chen


Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Peter Hurley
On 10/10/2014 09:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
>> We can blacklist these GCC versions quite easily.  We already have GCC
>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>> some proper details about the bug, just like the other existing entries
>> we already have in asm-offsets.c, where we name the functions that the
>> compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Providing a manual switch to override blacklisting is way more sane
than a build warning that no one's looking at.



Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Nathan Lynch
On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> 
> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> it seems that this has been known about for some time.)

Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
are affected, as well as 4.9.0.

> We can blacklist these GCC versions quite easily.  We already have GCC
> 3.3 blacklisted, and it's trivial to add others.  I would want to include
> some proper details about the bug, just like the other existing entries
> we already have in asm-offsets.c, where we name the functions that the
> compiler is known to break where appropriate.

Before blacklisting anything, it's worth considering that simple version
checks would break existing pre-4.8.3 compilers that have been patched
for PR58854.  It looks like Yocto and Buildroot issued releases with
patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
the most we can reasonably do without breaking some correctly-behaving
toolchains is to emit a warning.

Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
release -- since it's a 4.8.3 prerelease from before the fix was
committed you'll get GCC_VERSION == 40803 but still generate bad code.
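
To make the arithmetic concrete: the kernel derives GCC_VERSION as
major * 10000 + minor * 100 + patchlevel, so a 4.8.3 prerelease reports
the same value as the fixed 4.8.3 release and a plain version check
cannot tell them apart. The sketch below reproduces that calculation.
The helper names are hypothetical, and the 4.8.0 to 4.8.2 range is taken
from this thread rather than from any merged patch.

```c
/* Same arithmetic as the kernel's GCC_VERSION macro, as a function. */
static int gcc_version_code(int major, int minor, int patchlevel)
{
	return major * 10000 + minor * 100 + patchlevel;
}

/*
 * Hypothetical check: nonzero when the version code falls in the
 * 4.8.0-4.8.2 range discussed in this thread.
 */
static int in_pr58854_range(int code)
{
	return code >= 40800 && code < 40803;
}
```

With these helpers, gcc_version_code(4, 8, 3) yields 40803, which
in_pr58854_range() accepts as safe even when the compiler is a prerelease
built before the fix was committed.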

> However, I'm rather annoyed that there are people here who have known
> for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> corruption, and have sat on their backsides doing nothing about getting
> it blacklisted for something like a year.

Mea culpa, although I hadn't drawn the connection to FS corruption
reports until now.  I have known about the issue for some time, but
figured the prevalence of the fix in downstream projects largely
mitigated the issue.



Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Aaro Koskinen
On Fri, Oct 10, 2014 at 05:18:35PM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > >   What GCC version are you using?
> > >   
> > >   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> > >   find_get_entry() crashes with 0x involved smell a lot like the
> > >   earlier reports from kernels built with those compilers:
> > >   
> > >   https://lkml.org/lkml/2014/6/25/456
> > >   https://lkml.org/lkml/2014/6/30/375
> > >   https://lkml.org/lkml/2014/6/30/660
> > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> > >   https://lkml.org/lkml/2014/5/9/330
> > 
> > Is it possible to blacklist those GCC versions on ARM somehow as it
> > seems people are still using them?
> > 
> > This bug also ruined a file system on one of my boxes last year
> > (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).
> 
> Given that, why the fsck (pun intended) did you not shout a little louder
> about getting it blacklisted.  Looking at your marc.info URL, there's
> very little information there which hints at filesystem corruption, and
> it's a thread of only *one* message according to marc.info.
> 
> Even _if_ I did read the message you point to above, that on its own did
> not hint at filesystem corruption.
> 
> So, would you please mind passing on further details about this,
> specifically which function in the ext4 code is affected, so it can
> be properly written up.

I have not done any proper deeper analysis. After I first mailed about
the issue I just downgraded GCC and pretty much forgot about it until
an engineer from some commercial Linux vendor replied privately months
later and kindly pointed me to the needed GCC fix (which I then shared
in the reply). Then I just moved on, using a newer GCC with no issues.
Obviously this was not a widespread problem since no one else
reported the same.

Today I again booted a kernel compiled with GCC 4.8.2 and was still able
to reproduce the issue, and I think the log below shows that at least
ext3 can easily end up in an inconsistent state using these compiler
versions:

0) Run the bad kernel:

~ # dmesg|grep GCC
[0.00] Linux version 3.17.0-mvebu-los_9755+ (aaro@cooljazz) (gcc version 4.8.2 (GCC) ) #1 Fri Oct 10 21:05:20 EEST 2014

1) Start with small ext3 (writeback) fs with gcc tarball:

/mnt/test # ls -l
total 84092
-rw-r--r--1 root root  85999682 Apr 24 21:52 gcc-4.8.2.tar.bz2
drwx--2 root root 16384 Oct 10 10:33 lost+found
/mnt/test # df -h .
FilesystemSize  Used Available Use% Mounted on
/dev/sdc1 3.8G 90.2M  3.5G   2% /mnt/test

2) Extract, delete & crash:

/mnt/test # tar xjf gcc-4.8.2.tar.bz2
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/libgfortran/generated': Directory not empty
rm: can't remove 'gcc-4.8.2/libgfortran': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat/struct-by-value-18a_y.c': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg/result_default_init_1.f90': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
[  960.864433] Unable to handle kernel paging request at virtual address 

[  960.930597] pgd = df6e
[  960.990849] [] *pgd=1fffd831, *pte=, *ppte=
[  961.056512] Internal error: Oops: 1 [#1] ARM
[  961.120063] Modules linked in:
[  961.180974] CPU: 0 PID: 684 Comm: rm Not tainted 3.17.0-mvebu-los_9755+ #1
[  961.247146] task: df447b00 ti: df4de000 task.ti: df4de000
[  961.311524] PC is at find_get_entry+0x28/0x84
[  961.375037] LR is at radix_tree_lookup_slot+0x1c/0x2c
[  961.439061] pc : []lr : []psr: a013
[  961.439061] sp : df4dfc68  ip :   fp : df4dfc7c
[  961.570018] r10: 0001  r9 : c04e3253  r8 : df020b60
[  961.634596] r7 : 0009001a  r6 :   r5 : 0009001a  r4 : df020c90
[  961.700070] r3 :   r2 :   r1 : 0009001a  r0 : 
[  961.764437] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  961.830518] Control: 0005317f  Table: 1f6e  DAC: 0015
[  961.895866] Process rm (pid: 684, stack limit = 0xdf4de1c0)
[  961.960597] Stack: (0xdf4dfc68 to 0xdf4e)
[  962.022968] fc60:   0001 df020c8c df4dfcb4 df4dfc80 
c006eef68 c006e400
[  962.091214] fc80: c00d4e80 c00d4764 1000 0009001a   
df0200b60 df020b60
[  962.159490] fca0: df020bd8 df04e4d8 df4dfd04 df4dfcb8 c00d34c0 c006ef44 
0 df4dfcc8
[  962.226940] fcc0: c00d4e80 c00d4764 1000 0001 df4dfd84 dd1c73f0 
000900306 
[  962.295558] fce0: 00090068 

Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Russell King - ARM Linux
On Fri, Oct 10, 2014 at 08:57:43AM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> > Hi,
> > 
> > On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > > > 
> > > > > Those two undefined instructions on the disassembly caught my attention,
> > > > > perhaps I'm facing a GCC bug ?
> > > > 
> > > > The undefined instructions are just ARM's BUG() implementation.
> > > > 
> > > > But did you see the question I asked you yesterday in your other thread?
> > > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> > > 
> > > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > > else.
> > 
> > seems to be working fine now, thanks. I'll leave test running overnight
> > just in case.
> 
> yup, ran over night without any problems.

Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
it seems that this has been known about for some time.)

We can blacklist these GCC versions quite easily.  We already have GCC
3.3 blacklisted, and it's trivial to add others.  I would want to include
some proper details about the bug, just like the other existing entries
we already have in asm-offsets.c, where we name the functions that the
compiler is known to break where appropriate.

However, I'm rather annoyed that there are people here who have known
for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
corruption, and have sat on their backsides doing nothing about getting
it blacklisted for something like a year.

When people talk about the ARM community being dysfunctional... well,
this kind of irresponsible behaviour just gives them more fodder to
throw at us.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Russell King - ARM Linux
On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> >   What GCC version are you using?
> >   
> >   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> >   find_get_entry() crashes with 0x involved smell a lot like the
> >   earlier reports from kernels built with those compilers:
> >   
> >   https://lkml.org/lkml/2014/6/25/456
> >   https://lkml.org/lkml/2014/6/30/375
> >   https://lkml.org/lkml/2014/6/30/660
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> >   https://lkml.org/lkml/2014/5/9/330
> 
> Is it possible to blacklist those GCC versions on ARM somehow as it
> seems people are still using them?
> 
> This bug also ruined a file system on one of my boxes last year
> (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).

Given that, why the fsck (pun intended) did you not shout a little louder
about getting it blacklisted?  Looking at your marc.info URL, there's
very little information there which hints at filesystem corruption, and
it's a thread of only *one* message according to marc.info.

Even _if_ I did read the message you point to above, that on its own did
not hint at filesystem corruption.

So, would you please mind passing on further details about this,
specifically which function in the ext4 code is affected, so it can
be properly written up.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


Re: RCU bug with v3.17-rc3 ?

2014-10-10 Thread Felipe Balbi
On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > > 
> > > > Those two undefined instructions on the disassembly caught my attention,
> > > > perhaps I'm facing a GCC bug ?
> > > 
> > > The undefined instructions are just ARM's BUG() implementation.
> > > 
> > > But did you see the question I asked you yesterday in your other thread?
> > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> > 
> > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > else.
> 
> seems to be working fine now, thanks. I'll leave test running overnight
> just in case.

yup, ran over night without any problems.

-- 
balbi




Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Aaro Koskinen
Hi,

On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
>   What GCC version are you using?
>   
>   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
>   find_get_entry() crashes with 0x involved smell a lot like the
>   earlier reports from kernels built with those compilers:
>   
>   https://lkml.org/lkml/2014/6/25/456
>   https://lkml.org/lkml/2014/6/30/375
>   https://lkml.org/lkml/2014/6/30/660
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>   https://lkml.org/lkml/2014/5/9/330

Is it possible to blacklist those GCC versions on ARM somehow as it
seems people are still using them?

This bug also ruined a file system on one of my boxes last year
(see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).

A.


Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Felipe Balbi
Hi,

On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > alright, it's pretty deterministic however. Always on the same test, no
> > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > 
> > > Those two undefined instructions on the disassembly caught my attention,
> > > perhaps I'm facing a GCC bug ?
> > 
> > The undefined instructions are just ARM's BUG() implementation.
> > 
> > But did you see the question I asked you yesterday in your other thread?
> > http://www.spinics.net/lists/arm-kernel/msg368634.html
> 
> hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> else.

seems to be working fine now, thanks. I'll leave test running overnight
just in case.

thanks again, and sorry for the noise.

PS: I wonder if we should add a warning message to the build system if
we're building with known broken versions of GCC.

-- 
balbi




Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Felipe Balbi
Hi,

On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > alright, it's pretty deterministic however. Always on the same test, no
> > matter which USB controller, no matter if backing store is RAM or MMC.
> > 
> > Those two undefined instructions on the disassembly caught my attention,
> > perhaps I'm facing a GCC bug ?
> 
> The undefined instructions are just ARM's BUG() implementation.
> 
> But did you see the question I asked you yesterday in your other thread?
> http://www.spinics.net/lists/arm-kernel/msg368634.html

hmm, completely missed that, sorry. I'm using 4.8.2, will try something
else.

-- 
balbi




Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Rabin Vincent
On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
> 
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?

The undefined instructions are just ARM's BUG() implementation.

But did you see the question I asked you yesterday in your other thread?
http://www.spinics.net/lists/arm-kernel/msg368634.html

Here it is again:

  What GCC version are you using?
  
  4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
  find_get_entry() crashes with 0x involved smell a lot like the
  earlier reports from kernels built with those compilers:
  
  https://lkml.org/lkml/2014/6/25/456
  https://lkml.org/lkml/2014/6/30/375
  https://lkml.org/lkml/2014/6/30/660
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
  https://lkml.org/lkml/2014/5/9/330

Also, I didn't see any public email making a definitive link between GCC
PR 58854 that Nathan pointed out in https://lkml.org/lkml/2014/6/30/660
and the earlier find_get_entry() crashes, but I just built GCC 4.8.1 and
an ARM kernel with that, and the GCC bug is clearly seen in
radix_tree_lookup_slot() which returns the pointer which
find_get_entry() is dereferencing:

  :
   e1a0c00d  mov     ip, sp
   e92dd800  push    {fp, ip, lr, pc}
   e24cb004  sub     fp, ip, #4
   e24dd008  sub     sp, sp, #8
   e3a02000  mov     r2, #0
   e24b3010  sub     r3, fp, #16
   ebc5  bl      c0176ab8 <__radix_tree_lookup>
   e24bd00c  sub     sp, fp, #12        <--- sp moved up
   e350  cmp     r0, #0
   151b0010  ldrne   r0, [fp, #-16]     <--- load from under sp
   e89da800  ldm     sp, {fp, sp, pc}

Please check your compiler to make sure it's not the same problem.
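
For readers who do not read ARM assembly, the C shape behind the listing above can be sketched as follows. This is a userspace mock, not the kernel source: mock_lookup() and mock_lookup_slot() are hypothetical names, and a flat array stands in for the radix tree. The point it illustrates is that the wrapper keeps a local variable (at [fp, #-16] in the listing) that the callee fills in through a pointer and that the wrapper reads back after the call. In the miscompiled code that read happens after sp has already been moved above the local, so anything that uses the stack in between, an interrupt for instance, can overwrite the value before it is loaded.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the callee: looks up an entry and, on
 * success, stores the address of its slot through the out-parameter.
 */
static void *mock_lookup(void **slots, size_t nslots, size_t index,
			 void ***slotp)
{
	if (index >= nslots || slots[index] == NULL)
		return NULL;
	if (slotp)
		*slotp = &slots[index];
	return slots[index];
}

/*
 * Hypothetical wrapper with the same shape as the function in the
 * disassembly: "slot" lives in this function's stack frame and is
 * read after the call returns. A correct compiler performs that read
 * before releasing the frame.
 */
void **mock_lookup_slot(void **slots, size_t nslots, size_t index)
{
	void **slot;

	if (!mock_lookup(slots, nslots, index, &slot))
		return NULL;
	return slot;	/* the load that must happen before sp is restored */
}
```

Built with a correct compiler, mock_lookup_slot() returns the address of the populated slot or NULL. Nothing in the C source is wrong, which is why the crash shows up only in the caller that dereferences the returned pointer.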


Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Felipe Balbi
Hi,

On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > I'm thinking it's not the slot pointer itself that's bad, because
> > __radix_tree_lookup() dereferences that to test if it's populated
> > before returning it, and slot life-time is guaranteed by RCU.
> > 
> > That would only leave garbage in the slot itself, crashing during
> > page_cache_get_speculative().
> > 
> > I'll keep staring at this change, but nothing stands out to me yet.
> 
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
> 
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?

no, probably not a GCC bug. Looking at your commit, however: man, it
does quite a lot of things at once. It moves code around, adds new
functions by refactoring (and changing) code, renames things, and changes
int offsets into unsigned ints. It would not be too difficult to miss a
bug in there.

I'll continue digging here.

-- 
balbi




Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Felipe Balbi
Hi Johannes,

On Thu, Oct 09, 2014 at 12:01:38PM -0400, Johannes Weiner wrote:
> On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> > Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > (lib: radix_tree: tree node interface). Here's full bisect log:
> > 
> > git bisect start
> > # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> > git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> > # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> > git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> > # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add 
> > style recommendation to use imperative descriptions
> > git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> > # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 
> > 'staging-3.15-rc1' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> > # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' 
> > of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> > git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> > # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 
> > 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> > git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> > # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' 
> > of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> > git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> > # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant 
> > if clause from PTP work
> > git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> > # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge 
> > git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> > git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> > # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove 
> > redundant comparison
> > git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> > # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some 
> > bootstrap functions as __init
> > git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> > # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use 
> > vma_resv_map() map types
> > git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> > # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one 
> > radix tree lookup when truncating swapped pages
> > git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> > # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow 
> > entries in page cache
> > git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> > # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree 
> > node interface
> > git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash 
> > detection-based file cache sizing
> > git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> > # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: 
> > radix_tree: tree node interface
> > 
> > I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> > that for tomorrow. Meanwhile, adding folks involved with that commit to
> > Cc list and another backtrace for reference:
> > 
> > [  113.696647] Unable to handle kernel paging request at virtual address 
> > 
> > [  113.704370] pgd = c0004000
> > [  113.707276] [] *pgd=9fef6821, *pte=, *ppte=
> > [  113.713998] Internal error: Oops: 17 [#1] SMP ARM
> > [  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage 
> > libcomposite configfs musb_dsps musb_hdrc musb_am335x
> > [  113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 
> > 3.17.0-02899-g748eb79 #239
> > [  113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> > [  113.744060] PC is at find_get_entry+0x64/0x100
> 
> Could you please provide the disassembly of that function?

here you go. It's ARM assembly however:

Dump of assembler code for function find_get_entry:
   0xc011da48 <+0>: mov r12, sp
   0xc011da4c <+4>: push{r4, r5, r6, r7, r8, r9, r11, r12, lr, pc}
   0xc011da50 <+8>: sub r11, r12, #4
   0xc011da54 <+12>:sub sp, sp, #16
   0xc011da58 <+16>:push{lr}; (str lr, [sp, #-4]!)
   0xc011da5c <+20>:bl  0xc000ef00 <__gnu_mcount_nc>
   0xc011da60 <+24>:mov r6, r0
   0xc011da64 <+28>:mov r7, r1
   0xc011da68 <+32>:ldr r2, [pc, #520]  ; 0xc011dc78 

   0xc011da6c <+36>:mov r3, #0
   0xc011da70 <+40>:mov r1, r3
   0xc011da74 <+44>:str r2, [sp, #8]
   0xc011da78 <+48>:str r3, [sp]
   0xc011da7c <+52>:mov r2, r3
   0xc011da80 <+56>:str r3, [sp, #4]
   0xc011da84 <+60>:ldr r0, [p

Re: RCU bug with v3.17-rc3 ?

2014-10-09 Thread Johannes Weiner
Hi Felipe,

On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> (lib: radix_tree: tree node interface). Here's full bisect log:
> 
> git bisect start
> # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add 
> style recommendation to use imperative descriptions
> git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 
> 'staging-3.15-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 
> 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if 
> clause from PTP work
> git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove 
> redundant comparison
> git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some 
> bootstrap functions as __init
> git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use 
> vma_resv_map() map types
> git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix 
> tree lookup when truncating swapped pages
> git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow 
> entries in page cache
> git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node 
> interface
> git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based 
> file cache sizing
> git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: 
> radix_tree: tree node interface
> 
> I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> that for tomorrow. Meanwhile, adding folks involved with that commit to
> Cc list and another backtrace for reference:
> 
> [  113.696647] Unable to handle kernel paging request at virtual address 
> 
> [  113.704370] pgd = c0004000
> [  113.707276] [] *pgd=9fef6821, *pte=, *ppte=
> [  113.713998] Internal error: Oops: 17 [#1] SMP ARM
> [  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage 
> libcomposite configfs musb_dsps musb_hdrc musb_am335x
> [  113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 
> 3.17.0-02899-g748eb79 #239
> [  113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> [  113.744060] PC is at find_get_entry+0x64/0x100

Could you please provide the disassembly of that function?

I'm thinking it's not the slot pointer itself that's bad, because
__radix_tree_lookup() dereferences that to test if it's populated
before returning it, and slot life-time is guaranteed by RCU.

That would only leave garbage in the slot itself, crashing during
page_cache_get_speculative().

I'll keep staring at this change, but nothing stands out to me yet.

Thanks,
Johannes

> [  113.748700] LR is at 0xfffa
> [  113.751978] pc : []lr : []psr: a00f0013
> [  113.751978] sp : dd0bbba0  ip :   fp : dd0bbbd4
> [  113.763962] r10: c0665100  r9 : 1000  r8 : 001a
> [  113.769415] r7 : dd0ee9b8  r6 : 0001  r5 :   r4 : dd0ee880
> [  113.776228] r3 : dd0bbb8c  r2 :   r1 : 001a  r0 : 
> [  113.783044] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> kernel
> [  113.790674] Control: 10c5387d  Table: 9e210019  DAC: 0015
> [  113.796672] Process file-storage (pid: 1368, stack limit = 0xdd0ba248)
> [  113.803486] Stack: (0xdd0bbba0 to 0xdd0bc000)
> [  113.808038] bba0:   c

Re: RCU bug with v3.17-rc3 ?

2014-10-08 Thread Felipe Balbi
Hi,

On Wed, Oct 08, 2014 at 12:57:07PM -0500, Felipe Balbi wrote:

[ snip ]

> > > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > > didn't die during boot, but died with my USB test case. Unfortunately,
> > > > the platform I'm using is pretty new and only goes as far back as v3.16
> > > > (which I had to backport 11 patches to get it to boot good enough for
> > > > this test).
> > > > 
> > > > I wonder if a corrupt file system could cause such problems... I keep
> > > > seeing EXT4 errors every now and again; considering that this dies in a
> > > > path through VFS, I wonder...
> > > 
> > > I recall hearing of similar things in the past, but must defer to the
> > > FS/VFS experts on this one.
> > 
> > resurrecting this thread. I'm facing the same issues with a brand new
> > filesystem mounted through NFS. The way to reproduce is the same though:
> > using g_mass_storage with either tmpfs or mmc as backing store.
> > 
> > However it seems to die much more frequently than before. I can
> > reproduce all the time. It's definitely not a problem with my board as I
> > have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
> > with two different USB peripheral controllers (MUSB and DWC3), using the
> > same rootfs and they die the exact same way no matter if I use tmpfs or
> > MMC as backing store.
> > 
> > Adding a few more folks here.
> 
> alright, first stable kernel with Cortex A8 was v3.14. All other kernel
> versions die starting with v3.15 to today's Linus. I'll start bisecting
> now.

Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
(lib: radix_tree: tree node interface). Here's full bisect log:

git bisect start
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
# bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
# bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
git bisect bad 74a475acea49459721ae4b062d3da68c74259009
# good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
# good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
# good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
# good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
# good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
# good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
# good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
# bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
# good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
git bisect good 4e35f483850ba46b838adfd312b3052416e15204
# good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
# good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
# bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
# good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
# first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface

I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
that for tomorrow. Meanwhile, adding folks involved with that commit to
Cc list and another backtrace for reference:

[  113.696647] Unable to handle kernel paging request at virtual address 

[  113.704370] pgd = c0004000
[  113.707276] [] *pgd=9fef6821, *pte=, *ppte=
[  113.713998] Internal error: Oops: 17 [#1] SMP ARM
[  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage 
libcomposite configfs musb_dsps mu

Re: RCU bug with v3.17-rc3 ?

2014-10-08 Thread Felipe Balbi
Hi,

On Wed, Oct 08, 2014 at 12:13:22PM -0500, Felipe Balbi wrote:
> On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I keep triggering the following Oops with -rc3 when writing to the
> > > > > > mass storage gadget driver:
> > > > > 
> > > > > v3.17-rc3, correct?
> > > > 
> > > > yup, as in subject ;-)
> > > > 
> > > > > I take it that the test passes on some earlier version?
> > > > 
> > > > about to test v3.14.17.
> > > 
> > > couldn't get v3.14 working on this board but at least v3.16 is also
> > > affected, except that now it happened during boot, I didn't even need
> > > to run my test:
> > > 
> > > [   17.438195] Unable to handle kernel paging request at virtual address 
> > > 
> > > [   17.446109] pgd = ec36
> > > [   17.448947] [] *pgd=ae7f6821, *pte=, *ppte=
> > > [   17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > > [   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c 
> > > lis3lv02d input_polldev dwc3_omap matrix_keypad
> > > [   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W 
> > > 3.16.0-5-g8a6cdb4 #811
> > > [   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > > [   17.486405] PC is at find_get_entry+0x7c/0x128
> > > [   17.491070] LR is at 0xfffa
> > > [   17.494364] pc : []lr : []psr: a013
> > > [   17.494364] sp : ec027dc8  ip :   fp : ec027dfc
> > > [   17.506384] r10: c0c6f6bc  r9 : 0005  r8 : ecdf22f8
> > > [   17.511860] r7 : ec026008  r6 : 0001  r5 :   r4 : 
> > > [   17.518705] r3 : ec027db4  r2 :   r1 : 0005  r0 : 
> > > > [   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
> > > > [   17.533007] Control: 10c5387d  Table: ac360059  DAC: 0015
> > > > [   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > > [   17.546151] Stack: (0xec027dc8 to 0xec028000)
> > > > [   17.550710] 7dc0:     c0110ad0 ecdf0b80  ecdf22f4
> > > > [   17.559259] 7de0: ecdf22f4  0005  ec027e34 ec027e00 c0111874 c0110adc
> > > > [   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 0005 ec3ddd00 0001
> > > > [   17.576385] 7e20: ecdf21a0  ec027ebc ec027e38 c0112978 c0111844  c06af938
> > > > [   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 0005 0006 0b80 ecdf0b70
> > > > [   17.593514] 7e60:  c0163264 ec3dddf0 ec027ee8 ec027ed4 0b80 ec027eac ec027e88
> > > > [   17.602087] 7e80: c0178d98 c0356590   0002 5b80  ec027f78
> > > > [   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018  ec027f44 ec027ec0 c0163264 c0112780
> > > > [   17.619202] 7ec0: 0180 0180 ec027efc b6cab018 0180   0180
> > > > [   17.627772] 7ee0: ec027ecc 0001 ec3ddd00    ed716040
> > > > [   17.636371] 7f00:   5b80  0180
> > > > [   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 0180 ec027f74 ec027f48
> > > > [   17.653524] 7f40: c0163a6c c01631cc b6cab018  5b80  ec3ddd03 ec3ddd00
> > > > [   17.662085] 7f60: 0180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 5b80
> > > > [   17.670658] 7f80: be91badc be91ba50 00044a00 0003 c000f044 ec026000  ec027fa8
> > > > [   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 0008 b6cab018 0180 be91ba38
> > > > [   17.687794] 7fc0: be91badc be91ba50 00044a00 0003 be91bbac b6cab008
> > > > [   17.696370] 7fe0: 0020 be91ba40 b6c78e8c b6c78ea8 6010 0008 ae7f6821 ae7f6c21
> > > [   17.704956] [] (find_get_entry) from [] 
> > > (pagecache_get_page+0x3c/0x1f4)
> > > [   17.713687] [] (pagecache_get_page) from [] 
> > > (generic_file_read_iter+0x204/0x794)
> > > [   17.723259] [] (generic_file_read_iter) from [] 
> > > (new_sync_read+0xa4/0xcc)
> > > [   17.732185] [] (new_sync_read) from [] 
> > > (vfs_read+0x98/0x158)
> > > [   17.739945] [] (vfs_read) from [] 
> > > (SyS_read+0x4c/0xa0)
> > > [   17.747149] [] (SyS_read) from [] 
> > > (ret_fast_syscall+0x0/0x48)
> > > [   17.754994] Code: e1a01009 eb08ffa9 e350 0a1f (e5904000) 
> > > [   17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> > > 
> > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > didn't die during boot, but died with my USB test case. Unfortu

Re: RCU bug with v3.17-rc3 ?

2014-10-08 Thread Felipe Balbi
Hi,

On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > Hi,
> > 
> > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > Hi,
> > > > > 
> > > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > > storage gadget driver:
> > > > 
> > > > v3.17-rc3, correct?
> > > 
> > > yup, as in subject ;-)
> > > 
> > > > I take it that the test passes on some earlier version?
> > > 
> > > about to test v3.14.17.
> > 
> > couldn't get v3.14 working on this board, but at least v3.16 is also
> > affected, except that now it happened during boot; I didn't even need
> > to run my test:
> > 
> > [   17.438195] Unable to handle kernel paging request at virtual address 
> > 
> > [   17.446109] pgd = ec36
> > [   17.448947] [] *pgd=ae7f6821, *pte=, *ppte=
> > [   17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > [   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> > [   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W 3.16.0-5-g8a6cdb4 #811
> > [   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > [   17.486405] PC is at find_get_entry+0x7c/0x128
> > [   17.491070] LR is at 0xfffa
> > [   17.494364] pc : []lr : []psr: a013
> > [   17.494364] sp : ec027dc8  ip :   fp : ec027dfc
> > [   17.506384] r10: c0c6f6bc  r9 : 0005  r8 : ecdf22f8
> > [   17.511860] r7 : ec026008  r6 : 0001  r5 :   r4 : 
> > [   17.518705] r3 : ec027db4  r2 :   r1 : 0005  r0 : 
> > [   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
> > [   17.533007] Control: 10c5387d  Table: ac360059  DAC: 0015
> > [   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > [   17.546151] Stack: (0xec027dc8 to 0xec028000)
> > [   17.550710] 7dc0:     c0110ad0 ecdf0b80  ecdf22f4
> > [   17.559259] 7de0: ecdf22f4  0005  ec027e34 ec027e00 c0111874 c0110adc
> > [   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 0005 ec3ddd00 0001
> > [   17.576385] 7e20: ecdf21a0  ec027ebc ec027e38 c0112978 c0111844  c06af938
> > [   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 0005 0006 0b80 ecdf0b70
> > [   17.593514] 7e60:  c0163264 ec3dddf0 ec027ee8 ec027ed4 0b80 ec027eac ec027e88
> > [   17.602087] 7e80: c0178d98 c0356590   0002 5b80  ec027f78
> > [   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018  ec027f44 ec027ec0 c0163264 c0112780
> > [   17.619202] 7ec0: 0180 0180 ec027efc b6cab018 0180   0180
> > [   17.627772] 7ee0: ec027ecc 0001 ec3ddd00    ed716040
> > [   17.636371] 7f00:   5b80  0180
> > [   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 0180 ec027f74 ec027f48
> > [   17.653524] 7f40: c0163a6c c01631cc b6cab018  5b80  ec3ddd03 ec3ddd00
> > [   17.662085] 7f60: 0180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 5b80
> > [   17.670658] 7f80: be91badc be91ba50 00044a00 0003 c000f044 ec026000  ec027fa8
> > [   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 0008 b6cab018 0180 be91ba38
> > [   17.687794] 7fc0: be91badc be91ba50 00044a00 0003 be91bbac b6cab008
> > [   17.696370] 7fe0: 0020 be91ba40 b6c78e8c b6c78ea8 6010 0008 ae7f6821 ae7f6c21
> > [   17.704956] [] (find_get_entry) from [] (pagecache_get_page+0x3c/0x1f4)
> > [   17.713687] [] (pagecache_get_page) from [] (generic_file_read_iter+0x204/0x794)
> > [   17.723259] [] (generic_file_read_iter) from [] (new_sync_read+0xa4/0xcc)
> > [   17.732185] [] (new_sync_read) from [] (vfs_read+0x98/0x158)
> > [   17.739945] [] (vfs_read) from [] (SyS_read+0x4c/0xa0)
> > [   17.747149] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x48)
> > [   17.754994] Code: e1a01009 eb08ffa9 e350 0a1f (e5904000) 
> > [   17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> > 
> > It seems to be a difficult-to-reproduce race though. On a second boot it
> > didn't die during boot, but died with my USB test case. Unfortunately,
> > the platform I'm using is pretty new and only goes as far back as v3.16
> > (to which I had to backport 11 patches to get it to boot well enough for
> > this test).
> > 
> > I wonder if a corrupt file system could cause such problems... I keep
> > seeing EXT4