Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-04 Thread Alexey Dobriyan
On 10/4/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 03, 2007 at 08:11:05AM -0700, Linus Torvalds wrote:
> >...
> > and btw, there is no question what-so-ever about whether your compiler
> > might be doing a legal optimization - the compiler really is wrong, and is
> > total shit. You need to make a gcc bug-report.
> >...
>
> Ingo can't send a gcc bug-report since gcc 4.0 is no longer supported
> upstream and a 4.1.2 compiler was confirmed to work.

Ingo can upgrade to 4.0.4. :-)

> Our only options are to either stop supporting the broken gcc versions
> as compiler for the kernel or to work around this compiler bug in the
> kernel.

A distro can backport a fix for a miscompilation while leaving, say,
__GNUC_MINOR__ intact, so banning version numbers isn't terribly
useful.

Perhaps someone should write a script/test program to check for known
miscompilations, to cure himself of deprecation disease.

Alexey "make cc_check" Dobriyan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-04 Thread Adrian Bunk
On Wed, Oct 03, 2007 at 08:11:05AM -0700, Linus Torvalds wrote:
>...
> and btw, there is no question what-so-ever about whether your compiler 
> might be doing a legal optimization - the compiler really is wrong, and is 
> total shit. You need to make a gcc bug-report.
>...

Ingo can't send a gcc bug-report since gcc 4.0 is no longer supported 
upstream and a 4.1.2 compiler was confirmed to work.

Our only options are to either stop supporting the broken gcc versions 
as compiler for the kernel or to work around this compiler bug in the 
kernel.

>   Linus

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Jan Engelhardt wrote:
>
> >When I'm ruler of the universe, it *will* be illegal. I'm just getting a 
> >bit ahead of myself.
> 
> Any time frame when that will happen?

I'm working on it, I'm working on it. I'm just as frustrated as you are. 
It turns out to be a non-trivial problem. 

Linus


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Jan Engelhardt

On Oct 3 2007 09:09, Linus Torvalds wrote:
>On Wed, 3 Oct 2007, Alan Cox wrote:
>>
>> > and btw, there is no question what-so-ever about whether your compiler 
>> > might be doing a legal optimization - the compiler really is wrong, and is 
>> 
>> Pedant: valid. Almost all optimizations are legal, nobody has yet written
>> laws about compilers. Sorry but I'm forever fixing misuse of the word
>> "illegal" in printks, docs and the like and it gets annoying after a bit.
>
>Heh.
>
>When I'm ruler of the universe, it *will* be illegal. I'm just getting a 
>bit ahead of myself.

Any time frame when that will happen?


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Alan Cox wrote:
>
> > and btw, there is no question what-so-ever about whether your compiler 
> > might be doing a legal optimization - the compiler really is wrong, and is 
> 
> Pedant: valid. Almost all optimizations are legal, nobody has yet written
> laws about compilers. Sorry but I'm forever fixing misuse of the word
> "illegal" in printks, docs and the like and it gets annoying after a bit.

Heh.

When I'm ruler of the universe, it *will* be illegal. I'm just getting a 
bit ahead of myself.

Linus


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > Your compiler generates
> > 
> >     movl    -16(%ebp),%edx
> >     movl    (%edx),%edi     /* this is _totally_ bogus! */
> >     incl    %edx
> >     movl    %edx,-16(%ebp)
> >     movl    %edi,%ecx
> >     testb   %cl,%cl
> >     je      ...
> 
> ah, ok.
> 
> > while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)):
> > 
> >     movl    -16(%ebp), %eax # p,
> >     movzbl  (%eax), %edi    #, c    /* not bogus! */
> >     movl    %edi, %edx      # c,
> >     testb   %dl, %dl        #
> >     je      .L64            #,
> >     incl    %eax            #
> >     movsbl  %dl,%ebx        #, D.12414
> >     movl    %eax, -16(%ebp) #, p
> > 
> > where the difference (apart from doing the increment differently and 
> > different register allocation) is that I have a "movzbl" (correct), 
> > while you have a "movl" (pure and utter crap).
> 
> i'll try with another compiler in a minute.

i just tried:

  gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)

and indeed the crash is gone. So you are completely right, it's a 
compiler bug in 4.0.2 (it's vanilla gcc 4.0.2 built by me, not a distro 
compiler). It should not affect normal kernels too much - this bug needs 
CONFIG_DEBUG_PAGEALLOC. (or it needs a _really_ unlucky allocation being 
at the far upper end of RAM - but those are usually taken up by 
boot-time allocations anyway).

i also just re-tried the other config as well - and crash is gone there 
too. (not surprisingly)

Ingo


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Ingo Molnar wrote:
> 
> >  - and as a result you get an exception on the *next* page:
> > 
> > BUG: unable to handle kernel paging request at virtual address f2a4
> 
> Hm, are you sure? This is a CONFIG_DEBUG_PAGEALLOC=y kernel, so even a 
> slight overrun of a non-NIL terminated string (as suspected by Al) could 
> run into a non-mapped kernel page. (which would indicate not a compiler 
> bug but use-after free)

I am 100% sure. I can look at the disassembly, and point to the fact that 
your Oops happens on code that is simply totally bogus.

That string is NUL-terminated, which is why the access is to f2a3fffe in 
the first place: we explicitly asked d_path() to create us a string at the 
end of the page (it creates them backwards), so the path string has a NUL 
at the end at address f2a3, which is exactly what we'd expect.

Your compiler really does seem to be total crap.

Do a "make fs/seq_file.s" (and make sure you *disable* CONFIG_DEBUG_INFO 
first, otherwise the result will be unreadable crud), and look at 
seq_path(). It's going to be more readable than the disassembly that I got 
through gdb, but I bet it's going to show it even more clearly.

> i just found another config under which i get similar crashes, config 
> attached. One common theme is CONFIG_DEBUG_FS and DEBUG_PAGEALLOC - and 
> CONFIG_MAC80211_DEBUGFS is not enabled in this one so it's off the hook 
> i think. (the crashes are attached below)

.. of *course* DEBUG_PAGEALLOC is going to be implied in the problem. If 
you don't have DEBUG_PAGEALLOC, you'll never see this, because you'll have 
all pages mapped, and the only page that it could happen to is the very 
last page in memory, and you'll never hit that one in practice.

> (my serial log on this box goes back about 6 months, and that alone 
> shows more than 3500 successful kernel bootups on that particular 
> testsystem, each kernel built by this compiler - and there's another 
> testsystem that i use even more frequently. Despite that, a compiler bug 
> is still possible of course.)

It's not about "possible". It's a fact. Send me your "seq_file.s" output 
for that function to be sure - it *could* be memory corruption that 
changes a "movb" into a "movl", and maybe the compiler did a byte move to 
start with, but quite frankly, that is such a remote possibility that I 
don't consider it realistic.

> BUG: unable to handle kernel paging request at virtual address f6207000
>  printing eip:
> c016ecf1
> *pdpt = 3001
> *pde = 00ac1067
> *pte = 36207000
> Oops:  [#1]
> SMP DEBUG_PAGEALLOC
> Modules linked in:
> CPU:1
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010297   (2.6.23-rc9 #20)
> EIP is at seq_path+0x60/0xca
> eax: f6206ffe   ebx: c2de0f50   ecx: 002b   edx: f6206ffe
> esi: f6206007   edi: c2dddfb0   ebp: f6503f18   esp: f6503f00
> ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> Process awk (pid: 1160, ti=f6503000 task=f73a8390 task.ti=f6503000)
> Stack: 0ff9 f6e5cf70 f6206ffe c2dddf80 f6e5cf70 c2dddfb0 f6503f30 
> c016ce40 
>c05d71b5 f6730f38 f6e5cf70 c2dddfb0 f6503f70 c016f05d 0400 
> 08098f18 
>f6730f38 f6e5cf90  0806bc2e 0003 08094320 f6503fb0 
>  
> Call Trace:
>  [] show_trace_log_lvl+0x19/0x2e
>  [] show_stack_log_lvl+0x9d/0xa5
>  [] show_registers+0x1af/0x281
>  [] die+0x11a/0x1e8
>  [] do_page_fault+0x632/0x715
>  [] error_code+0x72/0x80
>  [] show_vfsmnt+0x43/0x120
>  [] seq_read+0xf1/0x269
>  [] vfs_read+0x90/0x10e
>  [] sys_read+0x3f/0x63
>  [] sysenter_past_esp+0x5f/0x89
>  ===
> Code: f0 ff ff 76 77 eb 7a 8b 55 ec 8b 02 89 c2 8b 4d ec 03 51 0c 89 f7 29 c7 
> 89 79 0c 89 f0 29 d0 eb 6c 89 f8 88 06 46 eb 54 8b 55 f0 <8b> 3a 42 89 55 f0 
> 89 f9 84 c9 74 d0 0f be d9 89 da 8b 45 08 e8 

This looks like *exactly* the same thing, except you're in 
"show_vfsmnt()" this time.

Again: the oopsing instruction (8b 3a) is "movl". And again, the address 
is f6206ffe, and it oopses because the (incorrect) 32-bit access will 
touch the next page, so you get a paging request fault on f6207000 - which 
is some *totally* different allocation, and one that isn't mapped because 
it doesn't exist, so DEBUG_PAGEALLOC has removed it.

.. and again: exact same thing.

> EIP: [] seq_path+0x60/0xca SS:ESP 0068:f6503f00
> BUG: unable to handle kernel paging request at virtual address f63d1000
> eax: f63d0ffe   ebx: c2de0f50   ecx: 002c   edx: f63d0ffe
> Code: .. <8b> 3a ..

.. and again:

> EIP: [] seq_path+0x60/0xca SS:ESP 0068:f6367f00
> BUG: unable to handle kernel paging request at virtual address f63d1000
> eax: f63d0ffe   ebx: c2de0f50   ecx: 002c   edx: f63d0ffe
> Code: ..  <8b> 3a ..

And I can even tell you exactly what path it is:

 - it's going to be the first path that shows up in the path list, since 
   the seq_file interface will re-use that page, so if you hit it, you'll 
   hit it on the first entry (unless seq_file h

Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Your compiler generates
> 
>       movl    -16(%ebp),%edx
>       movl    (%edx),%edi     /* this is _totally_ bogus! */
>       incl    %edx
>       movl    %edx,-16(%ebp)
>       movl    %edi,%ecx
>       testb   %cl,%cl
>       je      ...

ah, ok.

> while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)):
> 
>       movl    -16(%ebp), %eax # p,
>       movzbl  (%eax), %edi    #, c    /* not bogus! */
>       movl    %edi, %edx      # c,
>       testb   %dl, %dl        #
>       je      .L64            #,
>       incl    %eax            #
>       movsbl  %dl,%ebx        #, D.12414
>       movl    %eax, -16(%ebp) #, p
> 
> where the difference (apart from doing the increment differently and 
> different register allocation) is that I have a "movzbl" (correct), 
> while you have a "movl" (pure and utter crap).

i'll try with another compiler in a minute.

Ingo


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Linus Torvalds wrote:
> 
>  - the bug happens on this:
> 
>   char c = *p++;
> 
>  - which has been compiled into
> 
>   8b 3a   mov    (%edx),%edi

Btw, this definitely doesn't happen for me, either on x86-64 or plain x86. 
The x86 thing I tested was Fedora 8 testing (ie not even some stable 
setup), so I wonder what experimental compiler you have.

Your compiler generates

        movl    -16(%ebp),%edx
        movl    (%edx),%edi     /* this is _totally_ bogus! */
        incl    %edx
        movl    %edx,-16(%ebp)
        movl    %edi,%ecx
        testb   %cl,%cl
        je      ...

while I get (gcc version 4.1.2 20070925 (Red Hat 4.1.2-28)):

        movl    -16(%ebp), %eax # p,
        movzbl  (%eax), %edi    #, c    /* not bogus! */
        movl    %edi, %edx      # c,
        testb   %dl, %dl        #
        je      .L64            #,
        incl    %eax            #
        movsbl  %dl,%ebx        #, D.12414
        movl    %eax, -16(%ebp) #, p

where the difference (apart from doing the increment differently and 
different register allocation) is that I have a "movzbl" (correct), while 
you have a "movl" (pure and utter crap).

I *suspect* that the compiler bug is along the lines of:
 (a) start off with movzbl
 (b) notice that the higher bits don't matter, because nobody subsequently 
 uses them
 (c) turn the thing into just a byte move. 
 (d) make the totally incorrect optimization of using a full 32-bit move 
 in order to avoid a partial register access stall

and the thing is, that final optimization can actually speed things up 
(although it can also slow things down for any access that crosses a cache 
sector boundary - 8/16 bytes), but it's seriously bogus, exactly because 
it can cause an invalid access to the three next bytes that may not even 
exist.

Linus


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Alan Cox
> and btw, there is no question what-so-ever about whether your compiler 
> might be doing a legal optimization - the compiler really is wrong, and is 

Pedant: valid. Almost all optimizations are legal, nobody has yet written
laws about compilers. Sorry but I'm forever fixing misuse of the word
"illegal" in printks, docs and the like and it gets annoying after a bit.

> total shit. You need to make a gcc bug-report. Because this is not a 
> question of "the standard is ambiguous", 

Agreed - the standard is not ambiguous here. (For reference the standard
says that a valid pointer must point at an object _OR_ one past the end
of the object (in the latter case it is not dereferencable)). So it's a
compiler bug.

Alan




Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Arjan van de Ven
On Wed, 3 Oct 2007 15:26:01 +0100
Al Viro <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 03, 2007 at 04:08:42PM +0200, Ingo Molnar wrote:
> > > Charming...  So we get d_path() either returning junk or we get 
> > > something that isn't NUL-terminated.  Which one it is?  I.e. what
> > > does p look like and what's in s?
> > 
> > could be use-after-free as well, as CONFIG_DEBUG_PAGEALLOC was enabled.
> 
> Umm...  d_path() had just written there, so use-after-free is not too
> likely to trigger page fault on read immediately afterwards - you'd
> need a pretty tight race to hit it.

I suspect we want the following patch out of general principles; Ingo,
can you see if this one helps?
(if not, it's still worth considering; it looks like we're first
destroying the device object (which holds the name of the directory)
before we unregister the directory... if that fails then we have a mess.)

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>


--- linux-2.6.23-rc2/net/wireless/core.c~	2007-10-03 08:04:45.0 -0700
+++ linux-2.6.23-rc2/net/wireless/core.c	2007-10-03 08:04:45.0 -0700
@@ -133,8 +133,8 @@ void wiphy_unregister(struct wiphy *wiph
 	mutex_unlock(&drv->mtx);
 
 	list_del(&drv->list);
-	device_del(&drv->wiphy.dev);
 	debugfs_remove(drv->wiphy.debugfsdir);
+	device_del(&drv->wiphy.dev);
 
 	mutex_unlock(&cfg80211_drv_mutex);
 }


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Linus Torvalds


On Wed, 3 Oct 2007, Ingo Molnar wrote:
> 
> hm, i just triggered the procfs crash below with -rc9 on a testbox. 

You have a terminally buggy piece of shit compiler.

Lookie here:

 - the bug happens on this:

char c = *p++;

 - which has been compiled into

8b 3a   mov    (%edx),%edi

   which is a *word* access.

 - the pointer is at the end of a page (very much on purpose):

edx: f2a3fffe   

 - and as a result you get an exception on the *next* page:

BUG: unable to handle kernel paging request at virtual address f2a4

and btw, there is no question what-so-ever about whether your compiler 
might be doing a legal optimization - the compiler really is wrong, and is 
total shit. You need to make a gcc bug-report. Because this is not a 
question of "the standard is ambiguous", this is a question of "the 
compiler turned good code into code that could SIGSEGV in user space too, 
if 'malloc()' happened to return a pointer at the end of an allocation".

Linus


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Al Viro
On Wed, Oct 03, 2007 at 04:08:42PM +0200, Ingo Molnar wrote:
> > Charming...  So we get d_path() either returning junk or we get 
> > something that isn't NUL-terminated.  Which one it is?  I.e. what does 
> > p look like and what's in s?
> 
> could be use-after-free as well, as CONFIG_DEBUG_PAGEALLOC was enabled.

Umm...  d_path() had just written there, so use-after-free is not too
likely to trigger page fault on read immediately afterwards - you'd
need a pretty tight race to hit it.


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar
* Al Viro <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 03, 2007 at 10:46:07AM +0200, Ingo Molnar wrote:
> > 
> > hm, i just triggered the procfs crash below with -rc9 on a testbox. 
> > Config attached. It's easy to reproduce it via 'service sshd restart'. 
> > The crash site is:
> > 
> >  (gdb) list *0xc017599d
> >  0xc017599d is in seq_path (fs/seq_file.c:354).
> >  349         if (m->count < m->size) {
> >  350                 char *s = m->buf + m->count;
> >  351                 char *p = d_path(dentry, mnt, s, m->size - m->count);
> >  352                 if (!IS_ERR(p)) {
> >  353                         while (s <= p) {
> >  354                                 char c = *p++;
> >  355                                 if (!c) {
> >  356                                         p = m->buf + m->count;
> >  357                                         m->count = s - m->buf;
> >  358                                         return s - p;
> >  (gdb)
> > 
> > any ideas? Fortunately i was able to do an strace of the incident:
> 
> Charming...  So we get d_path() either returning junk or we get 
> something that isn't NUL-terminated.  Which one it is?  I.e. what does 
> p look like and what's in s?

could be use-after-free as well, as CONFIG_DEBUG_PAGEALLOC was enabled.

Ingo


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Al Viro
On Wed, Oct 03, 2007 at 10:46:07AM +0200, Ingo Molnar wrote:
> 
> hm, i just triggered the procfs crash below with -rc9 on a testbox. 
> Config attached. It's easy to reproduce it via 'service sshd restart'. 
> The crash site is:
> 
>  (gdb) list *0xc017599d
>  0xc017599d is in seq_path (fs/seq_file.c:354).
>  349         if (m->count < m->size) {
>  350                 char *s = m->buf + m->count;
>  351                 char *p = d_path(dentry, mnt, s, m->size - m->count);
>  352                 if (!IS_ERR(p)) {
>  353                         while (s <= p) {
>  354                                 char c = *p++;
>  355                                 if (!c) {
>  356                                         p = m->buf + m->count;
>  357                                         m->count = s - m->buf;
>  358                                         return s - p;
>  (gdb)
> 
> any ideas? Fortunately i was able to do an strace of the incident:

Charming...  So we get d_path() either returning junk or we get something
that isn't NUL-terminated.  Which one it is?  I.e. what does p look like
and what's in s?


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> -CONFIG_MAC80211_DEBUGFS=y

it's CONFIG_MAC80211_DEBUGFS=y causing the crash.

Ingo


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

>  nodev /debug debugfs rw 0 0
>  ) = 290
>  read(3, "", 4096)   = 0
>  close(3)= 0
> 
> there's nothing particularly interesting in it. (perhaps debugfs)

disabling debugfs makes the crash go away so it's debugfs related. The 
.config delta is below.

Ingo

--- .config.broken.000  2007-10-03 10:28:14.0 +0200
+++ .config.good.000    2007-10-03 11:11:18.0 +0200
@@ -85,7 +85,7 @@ CONFIG_MODULE_SRCVERSION_ALL=y
 # CONFIG_KMOD is not set
 CONFIG_BLOCK=y
 # CONFIG_LBD is not set
-CONFIG_BLK_DEV_IO_TRACE=y
+# CONFIG_BLK_DEV_IO_TRACE is not set
 CONFIG_LSF=y
 CONFIG_BLK_DEV_BSG=y
 
@@ -631,7 +631,6 @@ CONFIG_CFG80211=y
 CONFIG_WIRELESS_EXT=y
 CONFIG_MAC80211=y
 CONFIG_MAC80211_LEDS=y
-CONFIG_MAC80211_DEBUGFS=y
 # CONFIG_MAC80211_DEBUG is not set
 CONFIG_IEEE80211=m
 CONFIG_IEEE80211_DEBUG=y
@@ -1689,7 +1688,7 @@ CONFIG_TRACE_IRQFLAGS_SUPPORT=y
 CONFIG_ENABLE_MUST_CHECK=y
 # CONFIG_MAGIC_SYSRQ is not set
 CONFIG_UNUSED_SYMBOLS=y
-CONFIG_DEBUG_FS=y
+# CONFIG_DEBUG_FS is not set
 # CONFIG_HEADERS_CHECK is not set
 CONFIG_DEBUG_KERNEL=y
 CONFIG_DEBUG_SHIRQ=y
@@ -1724,8 +1724,6 @@ CONFIG_FAULT_INJECTION=y
 # CONFIG_FAILSLAB is not set
 CONFIG_FAIL_PAGE_ALLOC=y
 CONFIG_FAIL_MAKE_REQUEST=y
-CONFIG_FAULT_INJECTION_DEBUG_FS=y
-CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y
 CONFIG_EARLY_PRINTK=y
 CONFIG_DEBUG_STACKOVERFLOW=y
 # CONFIG_DEBUG_STACK_USAGE is not set


Re: [bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

update: occasionally the reading of /proc/mounts succeeds, and it's:

 open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 read(3, "rootfs / rootfs rw 0 0\n/dev/root"..., 4096) = 290
 write(1, "rootfs / rootfs rw 0 0\n/dev/root"..., 290rootfs / rootfs rw 0 0
 /dev/root / ext3 rw,noatime,nodiratime,data=ordered 0 0
 /proc /proc proc rw 0 0
 /proc/bus/usb /proc/bus/usb usbfs rw 0 0
 /sys /sys sysfs rw 0 0
 /dev/devpts /dev/pts devpts rw 0 0
 /dev/sda2 /home ext3 rw,noatime,nodiratime,data=ordered 0 0
 nodev /debug debugfs rw 0 0
 ) = 290
 read(3, "", 4096)   = 0
 close(3)= 0

there's nothing particularly interesting in it. (perhaps debugfs)

Ingo


[bug] crash when reading /proc/mounts (was: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..)

2007-10-03 Thread Ingo Molnar

hm, i just triggered the procfs crash below with -rc9 on a testbox. 
Config attached. It's easy to reproduce it via 'service sshd restart'. 
The crash site is:

 (gdb) list *0xc017599d
 0xc017599d is in seq_path (fs/seq_file.c:354).
 349         if (m->count < m->size) {
 350                 char *s = m->buf + m->count;
 351                 char *p = d_path(dentry, mnt, s, m->size - m->count);
 352                 if (!IS_ERR(p)) {
 353                         while (s <= p) {
 354                                 char c = *p++;
 355                                 if (!c) {
 356                                         p = m->buf + m->count;
 357                                         m->count = s - m->buf;
 358                                         return s - p;
 (gdb)

any ideas? Fortunately i was able to do an strace of the incident:

 3247  munmap(0xb7f3e000, 4096)  = 0
 3247  open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
 3247  fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
 3247  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f3e000
 3247  read(3,  
 3247  +++ killed by SIGSEGV +++

and doing "cat /proc/mounts" triggers the crash reliably.

Ingo

>
BUG: unable to handle kernel paging request at virtual address f2a4
 printing eip:
c017599d
*pdpt = 1001
*pde = 00aee067
*pte = 32a4
Oops:  [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in:
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010297   (2.6.23-rc9 #89)
EIP is at seq_path+0x60/0xca
eax: f2a3fffe   ebx: c290c8d4   ecx: f6e341f0   edx: f2a3fffe
esi: f2a3f007   edi: c29097f0   ebp: ec5ddf1c   esp: ec5ddf04
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process sshd (pid: 2743, ti=ec5dc000 task=f6e341f0 task.ti=ec5dc000)
Stack: 0ff9 c2bf6b40 f2a3fffe c29097c0 c2bf6b40 c29097f0 ec5ddf34 c0173c41 
   c05ffe64 0400 c2bf6b40 c29097f0 ec5ddf74 c0175d2b 0400 b7fa2000 
   f5277600 c2bf6b60  c0109e99 ec5ddf80 0246 c01555e6  
Call Trace:
 [] show_trace_log_lvl+0x19/0x2e
 [] show_stack_log_lvl+0x9b/0xa3
 [] show_registers+0x1c4/0x2e3
 [] die+0x115/0x1e0
 [] do_page_fault+0x808/0x8e1
 [] error_code+0x6a/0x70
 [] show_vfsmnt+0x44/0x11e
 [] seq_read+0xeb/0x25f
 [] vfs_read+0x87/0xe5
 [] sys_read+0x3d/0x61
 [] sysenter_past_esp+0x6b/0xb5
 ===
Code: 89 45 f0 76 77 eb 7a 8b 55 ec 8b 4d ec 89 f7 8b 02 89 c2 03 51 0c 29 c7 
89 f0 89 79 0c 29 d0 eb 6c 89 f8 88 06 46 eb 54 8b 55 f0 <8b> 3a 42 89 55 f0 89 
f9 84 c9 74 d0 8b 45 08 0f be d9 89 da e8 
EIP: [] seq_path+0x60/0xca SS:ESP 0068:ec5ddf04
BUG: unable to handle kernel paging request at virtual address f2a4
 printing eip:
c017599d
*pdpt = 1001
*pde = 00aee067
*pte = 32a4
Oops:  [#2]
PREEMPT DEBUG_PAGEALLOC
Modules linked in:
CPU:0
EIP:0060:[]Tainted: G  D VLI
EFLAGS: 00010297   (2.6.23-rc9 #89)
EIP is at seq_path+0x60/0xca
eax: f2a3fffe   ebx: c290c8d4   ecx: c02be275   edx: f2a3fffe
esi: f2a3f007   edi: c29097f0   ebp: ef2b7f1c   esp: ef2b7f04
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process sshd (pid: 2744, ti=ef2b6000 task=f6e5cce0 task.ti=ef2b6000)
Stack: 0ff9 c2bf6b40 f2a3fffe c29097c0 c2bf6b40 c29097f0 ef2b7f34 c0173c41 
   c05ffe64 0400 c2bf6b40 c29097f0 ef2b7f74 c0175d2b 0400 b7f09000 
   f7375240 c2bf6b60  0073 ef2b7f80 0246 c01555e6  
Call Trace:
 [] show_trace_log_lvl+0x19/0x2e
 [] show_stack_log_lvl+0x9b/0xa3
 [] show_registers+0x1c4/0x2e3
 [] die+0x115/0x1e0
 [] do_page_fault+0x808/0x8e1
 [] error_code+0x6a/0x70
 [] show_vfsmnt+0x44/0x11e
 [] seq_read+0xeb/0x25f
 [] vfs_read+0x87/0xe5
 [] sys_read+0x3d/0x61
 [] sysenter_past_esp+0x6b/0xb5
 ===
Code: 89 45 f0 76 77 eb 7a 8b 55 ec 8b 4d ec 89 f7 8b 02 89 c2 03 51 0c 29 c7 
89 f0 89 79 0c 29 d0 eb 6c 89 f8 88 06 46 eb 54 8b 55 f0 <8b> 3a 42 89 55 f0 89 
f9 84 c9 74 d0 8b 45 08 0f be d9 89 da e8 
EIP: [] seq_path+0x60/0xca SS:ESP 0068:ef2b7f04

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc9
# Wed Oct  3 10:35:03 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
#