Re: [PATCH] x86: optionally show last exception from/to register contents

2007-08-13 Thread Keith Owens
Andi Kleen (on Mon, 13 Aug 2007 15:08:45 +0200) wrote:
>On Mon, Aug 13, 2007 at 12:33:05PM +0100, Jan Beulich wrote:
>
>
>>  
>>  	if (cpu_has_ds) {
>>  		unsigned int l1;
>> --- linux-2.6.23-rc3/arch/i386/kernel/traps.c	2007-08-13 08:59:45.0 +0200
>> +++ 2.6.23-rc3-x86-ler/arch/i386/kernel/traps.c	2007-08-07 10:42:55.0 +0200
>> @@ -321,6 +321,13 @@ void show_registers(struct pt_regs *regs
>>  	unsigned int code_len = code_bytes;
>>  	unsigned char c;
>>  
>> +	if (__get_cpu_var(ler_msr)) {
>> +		u32 from, to, hi;
>> +
>> +		rdmsr(__get_cpu_var(ler_msr), from, hi);
>> +		rdmsr(__get_cpu_var(ler_msr) + 1, to, hi);
>> +		printk("LER: %08x -> %08x\n", from, to);
>> +	}
>
>This seems racy -- AFAIK the MSR will record the last branch
>before an interrupt too, and the trap handlers enable interrupts
>before coming here. 
>
>Can't think of a good way to avoid that for page fault at least
>without impacting interrupt latency or reading the MSR always.

KDB used to have a "last branch recording" (lbr) feature.  The page
fault handler was modified to disable lbr before entering
do_page_fault().  Nobody seemed to care about the slight slowdown, but
nobody seemed to be using that feature for debugging either; we rarely
get wild branches into the middle of nowhere.  lbr was messy to
maintain for very little gain, so I removed it from KDB at
2.6.17-i386-2.
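For illustration only, the race Andi describes can be modelled in user space.  The sketch assumes, as Andi says, that the last-exception record is overwritten by any interrupt taken after the fault, and compares reading the record late (as the patch does in show_registers()) with freezing recording before interrupts are enabled, roughly what KDB's lbr code did around do_page_fault().  struct ler_model, record_exception() and reported_from() are hypothetical names; real code would use rdmsr() on the LER MSRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the LER from/to MSR pair; not a kernel API. */
struct ler_model {
	uint32_t from, to;
	bool recording;		/* models the record-enable control */
};

/* Each exception or interrupt overwrites the record while recording is on. */
static void record_exception(struct ler_model *m, uint32_t from, uint32_t to)
{
	if (m->recording) {
		m->from = from;
		m->to = to;
	}
}

/*
 * A fault at fault_from -> fault_to, then the trap handler enables
 * interrupts and an interrupt (irq_from -> irq_to) arrives before the
 * oops printout reads the record.  With freeze_on_entry, recording is
 * switched off before interrupts are enabled.  Returns the "from"
 * address the printout would show.
 */
static uint32_t reported_from(bool freeze_on_entry,
			      uint32_t fault_from, uint32_t fault_to,
			      uint32_t irq_from, uint32_t irq_to)
{
	struct ler_model m = { 0, 0, true };

	record_exception(&m, fault_from, fault_to);	/* the fault itself */
	if (freeze_on_entry)
		m.recording = false;			/* before interrupts are enabled */
	record_exception(&m, irq_from, irq_to);		/* the racing interrupt */
	return m.from;					/* what the printout reads */
}
```

Without the freeze the printout reports the interrupt's branch; with it, the faulting branch survives, at the cost of extra work on every trap entry.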

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Unbalanced stack usage in arch/i386/math-emu/wm_sqrt.S

2007-08-13 Thread Keith Owens
Originally sent to the maintainer of the i386 math-emu code
([EMAIL PROTECTED]) but that mail was bounced[1].  Is anybody
maintaining the math-emu code and do we even care about it anymore?

I am doing static code analysis on the kernel and have found a stack
imbalance in arch/i386/math-emu/wm_sqrt.S.  This is in 2.6.23-rc2, but
it has probably been there for a while.

The code starts off with

	pushl	%ebp
	movl	%esp,%ebp
#ifndef NON_REENTRANT_FPU
	subl	$28,%esp
#endif /* NON_REENTRANT_FPU */
	pushl	%esi
	pushl	%edi
	pushl	%ebx

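At this point, the code is using 0x2c bytes of stack space (4 for %ebp,
28 for the locals and 3 * 4 for the saved registers, assuming
NON_REENTRANT_FPU is not defined).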

... do some work

sqrt_stage_2_finish:
	sarl	$1,%ecx		/* divide by 2 */
	rcrl	$1,%eax

	/* Form the new estimate in %esi:%edi */
	movl	%eax,%edi
	addl	%ecx,%esi

	jnz	sqrt_stage_2_done	/* result should be [1..2) */

... still using 0x2c bytes of stack space

#ifdef PARANOID
	/* It should be possible to get here only if the arg is  */
	cmp	$0x,FPU_fsqrt_arg_1
	jnz	sqrt_stage_2_error
#endif /* PARANOID */

	/* The best rounded result. */
	xorl	%eax,%eax
	decl	%eax
	movl	%eax,%edi
	movl	%eax,%esi
	movl	$0x7fff,%eax
	jmp	sqrt_round_result

#ifdef PARANOID
sqrt_stage_2_error:

... 0x2c bytes of stack space

	pushl	EX_INTERNAL|0x213

... 0x30 bytes of stack space

	call	EXCEPTION

... EXCEPTION is FPU_exception, which only aborts if __DEBUG__ is
defined, and __DEBUG__ is not defined.  So FPU_exception will return
and we still have 0x30 bytes of stack in use.  But the code drops
through to sqrt_stage_2_done, which (like the rest of the code) only
expects 0x2c bytes of stack ===>  stack imbalance.

#endif /* PARANOID */ 

sqrt_stage_2_done:


The obvious fix is to add 'pop %eax' after 'call EXCEPTION', which will
remove the extra word from the stack.  Alas, that only fixes the stack
imbalance; does it even make sense for the code to continue after
calling EXCEPTION?
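The stack accounting above is simple enough to check mechanically.  This is a minimal sketch, not the tool used for this analysis: it sums per-instruction stack effects along the two paths that reach sqrt_stage_2_done, assuming NON_REENTRANT_FPU is undefined and that EXCEPTION returns.  struct insn, path_depth() and the path tables are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction and its net effect on stack usage, in bytes. */
struct insn {
	const char *text;
	int delta;		/* bytes of stack added (+) or released (-) */
};

/* Total stack usage after executing a straight-line path. */
static int path_depth(const struct insn *path, size_t n)
{
	int depth = 0;

	for (size_t i = 0; i < n; i++)
		depth += path[i].delta;
	return depth;
}

static const struct insn prologue[] = {
	{ "pushl %ebp",      4 },
	{ "subl $28,%esp",  28 },
	{ "pushl %esi",      4 },
	{ "pushl %edi",      4 },
	{ "pushl %ebx",      4 },
};

/* PARANOID error path, falling through to sqrt_stage_2_done. */
static const struct insn error_path[] = {
	{ "pushl EX_INTERNAL|0x213", 4 },
	{ "call EXCEPTION",          0 },	/* call/ret pair nets to zero */
};

/* The same path with the proposed 'pop %eax' after the call. */
static const struct insn fixed_error_path[] = {
	{ "pushl EX_INTERNAL|0x213", 4 },
	{ "call EXCEPTION",          0 },
	{ "pop %eax",               -4 },
};

#define N(a) (sizeof(a) / sizeof((a)[0]))

/* Depth on arrival at sqrt_stage_2_done via each path. */
static int depth_normal(void)
{
	return path_depth(prologue, N(prologue));
}

static int depth_error(void)
{
	return depth_normal() + path_depth(error_path, N(error_path));
}

static int depth_error_fixed(void)
{
	return depth_normal() + path_depth(fixed_error_path, N(fixed_error_path));
}
```

Every path into a label must arrive with the same depth; the unfixed error path arrives 4 bytes deeper than the normal one, and the extra pop brings them back into line.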


[1] <[EMAIL PROTECTED]>: host suburbia.com.au[203.24.247.1] said: 554
<[EMAIL PROTECTED]>: Recipient address rejected: Access denied (in
reply to RCPT TO command)

The URL listed in MAINTAINERS for FPU Emulator gets a 404 as well.



Re: [patch 09/23] Add cmpxchg_local, cmpxchg64 and cmpxchg64_local to ia64

2007-08-12 Thread Keith Owens
Mathieu Desnoyers (on Sun, 12 Aug 2007 10:54:43 -0400) wrote:
>Add the primitives cmpxchg_local, cmpxchg64 and cmpxchg64_local to ia64. They
>use cmpxchg_acq as underlying macro, just like the already existing ia64
>cmpxchg().
>
>Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
>CC: [EMAIL PROTECTED]
>CC: [EMAIL PROTECTED]
>---
> include/asm-ia64/intrinsics.h |    4 ++++
> 1 file changed, 4 insertions(+)
>
>Index: linux-2.6-lttng/include/asm-ia64/intrinsics.h
>===
>--- linux-2.6-lttng.orig/include/asm-ia64/intrinsics.h	2007-07-20 18:36:09.0 -0400
>+++ linux-2.6-lttng/include/asm-ia64/intrinsics.h	2007-07-20 19:29:17.0 -0400
>@@ -158,6 +158,10 @@ extern long ia64_cmpxchg_called_with_bad
> 
> /* for compatibility with other platforms: */
> #define cmpxchg(ptr,o,n)		cmpxchg_acq(ptr,o,n)
>+#define cmpxchg_local(ptr,o,n)		cmpxchg_acq(ptr,o,n)
>+
>+#define cmpxchg64(ptr,o,n)		cmpxchg_acq(ptr,o,n)
>+#define cmpxchg64_local(ptr,o,n)	cmpxchg_acq(ptr,o,n)

As a matter of coding style, I prefer

#define cmpxchg_local   cmpxchg
#define cmpxchg64_local cmpxchg64

That makes it absolutely clear that they are the same code.  With your
patch, humans have to do a string compare of two defines to see if they
are the same.
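A minimal sketch of why the object-like alias works: after cmpxchg_local is replaced by cmpxchg, the preprocessor rescans it together with the following (ptr,o,n) tokens and expands the function-like cmpxchg macro as usual.  Here cmpxchg_acq is a stand-in built on the GCC/Clang __sync_val_compare_and_swap builtin, not the ia64 implementation, and try_swap_local()/try_swap64_local() are hypothetical helpers.

```c
#include <assert.h>

/* Stand-in for cmpxchg_acq(): compare-and-swap returning the old value. */
#define cmpxchg_acq(ptr, o, n)	__sync_val_compare_and_swap((ptr), (o), (n))

#define cmpxchg(ptr, o, n)	cmpxchg_acq(ptr, o, n)
#define cmpxchg64(ptr, o, n)	cmpxchg_acq(ptr, o, n)

/*
 * Object-like aliases: cmpxchg_local becomes the token cmpxchg, and
 * because the next tokens are "(ptr, o, n)" the rescan then expands
 * the function-like cmpxchg macro.
 */
#define cmpxchg_local	cmpxchg
#define cmpxchg64_local	cmpxchg64

static int try_swap_local(int *p, int expected, int desired)
{
	return cmpxchg_local(p, expected, desired);
}

static long long try_swap64_local(long long *p, long long expected,
				  long long desired)
{
	return cmpxchg64_local(p, expected, desired);
}
```

Anyone reading the aliases sees at once that the _local variants share their implementation with the plain ones, which is the point of the style suggestion.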



Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-08-11 Thread Keith Owens
Casey Schaufler (on Sat, 11 Aug 2007 10:57:31 -0700) wrote:
>Smack is the Simplified Mandatory Access Control Kernel.
>
> [snip]
>
>Smack defines and uses these labels:
>
>"*" - pronounced "star"
>"_" - pronounced "floor"
>"^" - pronounced "hat"
>"?" - pronounced "huh"
>
>The access rules enforced by Smack are, in order:
>
>1. Any access requested by a task labeled "*" is denied.
>2. A read or execute access requested by a task labeled "^"
>   is permitted.
>3. A read or execute access requested on an object labeled "_"
>   is permitted.
>4. Any access requested on an object labeled "*" is permitted.
>5. Any access requested by a task on an object with the same
>   label is permitted.
>6. Any access requested that is explicitly defined in the loaded
>   rule set is permitted.
>7. Any other access is denied.

Some security systems that have the concept of "no default access"
(task labeled "*") also allow access by those tasks but only if there
is an explicit rule giving access to the task.  IOW, rule 6 is applied
before rule 1.  In my experience this simplifies special cases where a
task should only have access to a very small set of resources.  I'm
curious why smack goes the other way?
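To make the two orderings concrete, here is a toy model of the seven rules with labels as plain strings.  It is not Smack's code: struct rule, smack_like_check() and the MAY_* values are hypothetical.  Setting explicit_first applies rule 6 before rule 1, which lets a "*" task reach only what the rule set explicitly grants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Access bits; hypothetical names for this sketch only. */
enum { MAY_READ = 1, MAY_WRITE = 2, MAY_EXEC = 4 };

/* One explicit "subject object access" rule from a loaded rule set. */
struct rule {
	const char *subject;
	const char *object;
	int access;
};

static bool explicit_rule_allows(const struct rule *rules, size_t n,
				 const char *subj, const char *obj, int req)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(rules[i].subject, subj) &&
		    !strcmp(rules[i].object, obj) &&
		    (rules[i].access & req) == req)
			return true;
	return false;
}

/* Rules 1-7 in order; explicit_first checks rule 6 before rule 1. */
static bool smack_like_check(const struct rule *rules, size_t n,
			     const char *subj, const char *obj, int req,
			     bool explicit_first)
{
	bool read_or_exec = (req & ~(MAY_READ | MAY_EXEC)) == 0;

	if (explicit_first && explicit_rule_allows(rules, n, subj, obj, req))
		return true;				/* rule 6, moved up */
	if (!strcmp(subj, "*"))
		return false;				/* rule 1 */
	if (!strcmp(subj, "^") && read_or_exec)
		return true;				/* rule 2 */
	if (!strcmp(obj, "_") && read_or_exec)
		return true;				/* rule 3 */
	if (!strcmp(obj, "*"))
		return true;				/* rule 4 */
	if (!strcmp(subj, obj))
		return true;				/* rule 5 */
	if (explicit_rule_allows(rules, n, subj, obj, req))
		return true;				/* rule 6 */
	return false;					/* rule 7 */
}
```

With a single rule granting "*" read access to one object, the quoted ordering still denies it, while the explicit-first ordering permits exactly that read and nothing more.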



Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-08-11 Thread Keith Owens
Casey Schaufler (on Sat, 11 Aug 2007 12:56:42 -0700 (PDT)) wrote:
>
>--- Arjan van de Ven <[EMAIL PROTECTED]> wrote:
>> > +#include <linux/kernel.h>
>> > +#include <linux/vmalloc.h>
>> > +#include <linux/security.h>
>> > +#include <linux/mutex.h>
>> > +#include <net/netlabel.h>
>> > +#include "../../net/netlabel/netlabel_domainhash.h"
>> 
>> can't you move this header to include/ instead?
>
>Paul Moore, the developer of netlabel, promised to work out
>the right solution for this with me at a future date. He
>doesn't want to move the header, and I respect that.

foo.c has

#include "netlabel_domainhash.h"

Makefile has CFLAGS_foo.o += -I$(srctree)/net/netlabel

I prefer to use -I $(srctree)/net/netlabel for readability, but '-I '
breaks on SuSE builds for some reason that I cannot be bothered working
out.  -I$(srctree)/net/netlabel works.

>> > +	doip = kmalloc(sizeof(struct cipso_v4_doi), GFP_KERNEL);
>> > +	if (doip == NULL)
>> > +		panic("smack:  Failed to initialize cipso DOI.\n");
>> > +	doip->map.std = NULL;
>> > +
>> > +	ndmp = kmalloc(sizeof(struct netlbl_dom_map), GFP_KERNEL);
>> > +	if (ndmp == NULL)
>> > +		panic("smack:  Failed to initialize cipso ndmp.\n");
>> 
>> 
>> is panic() really the right thing here? It's usually considered quite
>> rude ;)
>
>It's really early in start-up and if you're out of memory at that
>point you are not going very far into the future.

Not to mention that you might end up running with an insecure system.
Security must be failsafe.
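A user-space sketch of the fail-closed principle, with hypothetical names throughout: if the policy cannot be set up, it stays marked as not loaded and every access check denies, instead of the system carrying on without enforcement.  The kernel code under review goes further and calls panic(), which stops the machine outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical policy state for this sketch; not Smack's structures. */
struct policy {
	bool loaded;
	size_t entries;
	int *table;		/* non-zero entry: access allowed */
};

/*
 * Set up the policy.  On allocation failure (simulated by simulate_oom)
 * the policy stays "not loaded" and -1 is returned; the caller must
 * treat that as fatal rather than carry on without enforcement.
 */
static int policy_init(struct policy *p, size_t entries, bool simulate_oom)
{
	p->loaded = false;
	p->entries = 0;
	p->table = simulate_oom ? NULL : calloc(entries, sizeof(*p->table));
	if (p->table == NULL)
		return -1;
	p->entries = entries;
	p->loaded = true;
	return 0;
}

/* Fail closed: no policy, or an out-of-range index, means deny. */
static bool policy_allows(const struct policy *p, size_t idx)
{
	if (!p->loaded || idx >= p->entries)
		return false;
	return p->table[idx] != 0;
}
```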



Re: [RFC] Handling kernel stack overflows

2007-08-04 Thread Keith Owens
Eric W. Biederman (on Fri, 03 Aug 2007 06:36:23 -0600) wrote:
>
>Well we currently keep a struct thread_info on the stack
>which while not as bad as task_struct has its own uses
>and implications which may limit what you are trying
>to do.

Not an issue.  We already copy struct thread_info when switching stacks
on i386 with 4K stacks.  x86_64 always has multiple stacks, but uses
thread_info in the first stack, even on a stack switch.

>That said a function like:
>
>int call_on_new_stack(int (*continuation)(void *), void *closure)
>{
>	struct task_struct *tsk;
>	struct thread_info *ti;
>
>	if (plenty_of_stack_space())
>		return continuation(closure);
>
>	tsk = current();
>	ti = alloc_thread_info(tsk);
>	if (!ti)
>		return -ENOMEM;
>
>	setup_extra_thread_info(tsk, ti, continuation, closure);
>	schedule();
>}
>
>Might make sense.  Last I heard the block layer and xfs seemed
>to have largely solved their problems with running short on stack
>space, so I don't know if it is necessary but it certainly sounds
>relatively simple and interesting.

Solved for simple configurations.  But when you start nesting
filesystems, especially if a network filesystem is involved, then the
small amounts of stack used by each subsystem add up fast.

>Running short on stack space is a recurring theme so a function that
>allows us to have a little more when we really need it and be able to
>switch even x86_64 to 4K stacks would be interesting.

4K stacks on x86_64 are not a real option.  If i386 needs 4K then
x86_64 needs twice as much just to handle the doubling of pointers and
saved arguments.

>I'm not quite certain where we could insert calls to call_on_new_stack,

It has to be at a point that can return an error.  The best place is at
the start of the block and VFS layers, if those layers can detect that
they are about to do a nested call.  For example, crypto over loopback
over ext3 over NFS over software raid over SCSI.  The loopback, ext3
and NFS layers are nested filesystem calls; crypto is not.  The SCSI
layer is a nested block layer call; software raid is not.  So loopback,
ext3, NFS and SCSI would switch to a new stack if necessary.
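A sketch of the check a nested layer could make on entry, assuming (as on i386) that the kernel stack is aligned on THREAD_SIZE, grows down, and has struct thread_info at its base.  Something like this could sit behind Eric's plenty_of_stack_space(); the margin and thread_info size below are made-up illustration values, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustration values only, not taken from any real configuration. */
#define THREAD_SIZE		4096UL
#define THREAD_INFO_SIZE	64UL	/* assumed size of thread_info */
#define NESTED_CALL_MARGIN	1024UL	/* assumed headroom per nested layer */

/* Base of the stack containing sp, relying on THREAD_SIZE alignment. */
static uintptr_t stack_base(uintptr_t sp)
{
	return sp & ~(THREAD_SIZE - 1);
}

/* Bytes left between sp and the thread_info at the bottom of the stack.
 * Only meaningful while the stack has not already overflowed. */
static unsigned long stack_left(uintptr_t sp)
{
	return sp - (stack_base(sp) + THREAD_INFO_SIZE);
}

/*
 * Entry check for a nested layer (loopback, ext3, NFS, SCSI): without
 * enough headroom it would switch to a new stack, or return an error
 * if none can be allocated.
 */
static bool needs_new_stack(uintptr_t sp)
{
	return stack_left(sp) < NESTED_CALL_MARGIN;
}
```

The check is cheap (a mask and a subtraction), so it can run at every nested entry point without noticeably slowing the common case.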

>but it looks simple enough that it is worth coding up and playing
>with.

I have just resigned and will be taking a long break away from
computers.  Feel free to play with the idea, otherwise I will look at
it again sometime in October.



Re: [patch] add kdump_after_notifier

2007-08-03 Thread Keith Owens
Andrew Morton (on Thu, 2 Aug 2007 23:25:02 -0700) wrote:
>On Fri, 03 Aug 2007 14:05:47 +1000 Keith Owens <[EMAIL PROTECTED]> wrote:

Switching to [EMAIL PROTECTED]; I just resigned from SGI.

>> I have pretty well given up on RAS code in the Linux kernel.  Everybody
>> has different ideas, there is no overall plan and little interest from
>> Linus in getting RAS tools into the kernel.  We are just thrashing.
>
>Lots of different groups, little commonality in their desired functionality,
>little interest in sharing infrastructure or concepts.  Sometimes people
>need a bit of motivational help.
>
>In this case that motivation would come from the understanding that all the
>RAS tools would be *required* to use such infrastructure if it was merged. 
>Going off and open-coding your own stuff would henceforth not be acceptable.
>If it turns out that it really was unsuitable for a particular group's RAS
>feature, and we merged it anyway, well, that mismatch is that group's
>fault.
>
>It was a sizeable mistake to send those patches to a few obscure mailing
>lists - this is the first I've heard of it, for example.

linux-arch is obscure??  Where else do you send patches that affect
multiple architectures?

>So.  Please, send it all again, copy the correct lists and people, make sure
>that at least one client of the infrastructure is wired up and working
>(ideally, all such in-kernel clients should be wired up) and let's take a look at it.

Already tried that.  The only RAS tool that is currently in the kernel is
kexec/kdump and they insist on doing things their own way.  That makes
it impossible to put a common RAS structure in place, because kexec
will not use it.

Sorry to keep beating on this drum, but kexec insist that their code
must have priority and that they do not trust the rest of the kernel.
Until that changes, there is no point in discussing how to make kexec
coexist with other RAS tools.  If kexec change their mind then we can
look at using a common RAS interface; otherwise it is a waste of time
and I have better things to do with my life.
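For illustration, the ordering Keith argues for elsewhere in this thread (KDB ahead of crash_kexec on the panic notifier chain) amounts to a priority-sorted callback list.  The sketch mirrors the kernel's notifier convention that a higher priority runs earlier and that a handler can stop the chain, but struct handler, chain_register() and chain_call() are hypothetical user-space names, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define CHAIN_DONE	0	/* carry on with later handlers */
#define CHAIN_STOP	1	/* do not call any later handlers */

/* Hypothetical stand-in for a notifier block; not the kernel's struct. */
struct handler {
	int (*call)(void *data);
	int priority;		/* higher values run earlier */
	struct handler *next;
};

/* Keep the list sorted by descending priority; equal priorities keep
 * registration order. */
static void chain_register(struct handler **head, struct handler *h)
{
	while (*head && (*head)->priority >= h->priority)
		head = &(*head)->next;
	h->next = *head;
	*head = h;
}

/* Call handlers in order until one returns CHAIN_STOP; returns how many ran. */
static int chain_call(struct handler *head, void *data)
{
	int ran = 0;

	for (; head; head = head->next) {
		ran++;
		if (head->call(data) == CHAIN_STOP)
			break;
	}
	return ran;
}

/* Example handlers that record the order in which they ran. */
static char order[8];
static size_t order_len;

static int debugger_handler(void *data)
{
	(void)data;
	order[order_len++] = 'D';	/* a KDB-like debugger */
	return CHAIN_DONE;
}

static int dump_handler(void *data)
{
	(void)data;
	order[order_len++] = 'K';	/* kdump-like; modelled as stopping the chain */
	return CHAIN_STOP;
}

static int late_handler(void *data)
{
	(void)data;
	order[order_len++] = 'L';
	return CHAIN_DONE;
}
```

Registration order does not matter: whichever tool has the higher priority runs first, and a dump handler that never returns simply ends the chain.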



[RFC] Handling kernel stack overflows

2007-08-02 Thread Keith Owens
First a bit of background for people who are not familiar with kernel
stack constructs.

* Every process has a dedicated kernel stack.  In this context,
  'process' includes user space processes and threads, plus those
  processes that only exist inside the kernel (e.g. kswapd, xfslogd).

* When a process is sleeping, there is some state in its kernel stack
  to let the scheduler wake it up; that stack does not change until the
  scheduler assigns the task to a cpu.  When a process is running
  and is scheduled on a cpu, it is actively reading and writing its
  stack.

* Kernel stacks are a fixed size.  Unlike user space stacks, they do
  not expand as required.  Also unlike user space stacks, kernel stacks
  are not swappable.

* Each architecture uses different size kernel stacks (defined by
  THREAD_SIZE).  THREAD_SIZE is typically (always?) a multiple of the
  kernel page size.

* When a kernel stack occupies multiple pages, the pages assigned to
  that stack must be physically contiguous and in memory.

* Kernel stacks are typically aligned on a THREAD_SIZE boundary so
  (stack_pointer & ~(THREAD_SIZE - 1)) gives you the start of the
  stack.

* If you overflow the kernel stack for a process then you corrupt the
  next page, with undefined results.  This usually causes very strange
  kernel oopses.

* Historically (up to 2.4/2.5 kernels), 'struct task' for a process was
  embedded in the kernel stack for that process, reducing the usable
  amount of stack space.  Variable 'current' pointed to both the
  'struct task' and the kernel stack.

* In more recent kernels, almost all architectures separate 'struct
  task' from the stack.  'struct task' is created via a slab allocator,
  the stack is created from a page allocator.  'current' now points to
  'struct task' which in turn points to the active kernel stack.

* Separating 'struct task' from the stack means that a single 'struct
  task' can point to different kernel stacks throughout its lifetime.
  This makes it possible to use additional kernel stacks for special
  processing like interrupt handling.

* IA64 is different (isn't it always?).  It is the only architecture
  that still embeds 'struct task' within the kernel stack.  I wish that
  it separated the two; it would make MCA/INIT handling so much easier.
  Alas David Mosberger vetoed that approach for IA64.  On IA64,
  'current' still points to both 'struct task' and the kernel stack,
  making it impossible to use specialized kernel stacks on IA64.

* On i386, kernel stacks were historically 8K in size, with all
  processing being done on that single 8K stack (no additional
  specialized stacks).  On i386 boxes with large numbers of processes,
  it became difficult to kmalloc() a kernel stack when forking a new
  process.  Each new process required two physically contiguous pages,
  starting on an 8K boundary.  i386 boxes would often get into a state
  where there were enough free pages, but they were not contiguous and
  8K aligned.

* Interrupt processing can be separated into hard and soft IRQ
  contexts.  Hard IRQ context is when the kernel is servicing the
  hardware, e.g. talking to a disk controller.  To minimize the amount
  of time that the hardware is disabled, most drivers will grab some
  data from the hardware, store the data in a kernel structure,
  schedule some work to be done later then re-enable the hardware.  When
  the kernel returns from a hard interrupt it checks for any scheduled
  work then runs that work in a soft IRQ context.

* Even when an 8K kernel stack could be allocated, 8K was not always
  enough room to cope with an interrupt arriving while a process was
  active.  Normal processing could be interrupted by a hard IRQ which
  could schedule a soft IRQ which could in turn be interrupted by
  another hard IRQ.  Those three levels of processing all had to fit
  into 8K.

* Enter CONFIG_4KSTACKS for i386.  It recognizes that activity on the
  kernel stack only occurs while the process is running, and that much
  of that activity occurs in response to an interrupt, either in hard
  or soft IRQ context.

* By reducing the per-process stack to 4K, CONFIG_4KSTACKS makes it
  easier to allocate the kernel stack for a new process when the system
  is under heavy memory pressure.

* If an interrupt occurs while a CONFIG_4KSTACKS process is running,
  the kernel retains the current task but switches the stack to a
  separate specialized stack.  There are two additional 4K stacks on
  each cpu, for soft and hard IRQ processing.  The combination of the
  normal process stack plus the soft and hard IRQ stacks gives up to
  12K of stack for an active process, instead of the previous total of
  8K.

* By definition, processes cannot sleep in interrupt context.
  Therefore when a process does sleep, it is guaranteed that the soft
  and hard IRQ stacks on that cpu are not in use.  The process state is
  saved in its dedicated 4K stack and the per cpu IRQ stacks are now
  free 

Re: [patch] add kdump_after_notifier

2007-08-02 Thread Keith Owens
Vivek Goyal (on Thu, 2 Aug 2007 16:58:52 +0530) wrote:
>On Wed, Aug 01, 2007 at 04:00:48AM -0600, Eric W. Biederman wrote:
>> Takenori Nagano <[EMAIL PROTECTED]> writes:
>> 
>> >> No.  The problem with your patch is that it doesn't have a code
>> >> impact.  We need to see who is using this and why.
>> >
>> > My motivation is very simple. I want to use both kdb and kdump, but I
>> > think that is too weak to satisfy the kexec guys, so I brought up the
>> > example of enterprise software. It isn't a lie: I know some drivers
>> > which use panic_notifier. IMHO, they are only used on the major
>> > distributions, and either they have a workaround or they haven't noticed
>> > this problem yet. I think they will be in trouble if all distributions
>> > choose only kdump.
>> 
>> Possibly.
>> 
>> > BTW, I use kdb and lkcd now, but I want to use kdb and kdump. I sent a
>> > patch to the kdb community but it was rejected. kdb maintainer Keith
>> > Owens said,
>> 
>> >> Both KDB and crash_kexec should be using the panic_notifier_chain, with
>> >> KDB having a higher priority than crash_kexec.  The whole point of
>> >> notifier chains is to handle cases like this, so we should not be
>> >> adding more code to the panic routine.
>> >>
>> >> The real problem here is the way that the crash_kexec code is hard coded
>> >> into various places instead of using notifier chains.  The same issue
>> >> exists in arch/ia64/kernel/mca.c because of bad coding practices from
>> >> kexec.
>> 
>> I respectfully disagree with his opinion, as using notifier chains
>> assumes more of the kernel still works.  Although, following its argument
>> to its logical conclusion, we should call crash_kexec as the very
>> first thing inside of panic.  Given how much state something like
>> bust_spinlocks messes up, that might not be a bad idea.
>> 
>> It does make adding an alternative debug mechanism in there difficult.
>> Does anyone know if this also affects kgdb?
>> 
>> > Then I gave up on merging my patch into kdb, and tried sending another
>> > patch to the kexec community. I can understand his opinion, but it is very
>> > difficult to change kdump so that it is called from panic_notifier, because
>> > there is a reason why kdump doesn't use panic_notifier. So, I made this
>> > patch.
>> >
>> > Please do something about this problem.
>> 
>> Hmm.  Tricky.  These appear to be two code bases with a completely different
>> philosophy on what errors are being avoided.
>> 
>> The kexec on panic assumption is that the kernel is broken and we had better
>> not touch it; something horrible has gone wrong.  This is the reason why
>> kexec on panic is replacing lkcd: the strong assumption results in more
>> errors getting captured, with less likelihood of messing up your system.
>> 
>> The kdb assumption appears to be that the kernel is mostly ok, and that
>> there are just some specific things that are wrong.
>> 
>
>Thinking more about it, there are basically two kinds of users. One kind
>believes that even though the kernel has crashed, something meaningful can
>be done. In fact the kernel also thinks so. That's why we have created
>panic_notifier_list, even exported it to modules, and now we have some
>users. Most of the time these users do non-disruptive activities and
>can co-exist.
>
>OTOH, we have kexec on panic, which thinks that once the kernel has crashed
>nothing meaningful can be done; it is disruptive and can't co-exist
>with other users.
>
>Some thoughts on possible solutions for this problem.
>
>- Stop exporting panic_notifier_list to modules. Audit the in-kernel
>  users of panic_notifier_list. Let crash_kexec() run once all other users
>  of panic_notifier_list have been executed. The downside is that this breaks
>  external modules using panic_notifier_list, and even then there is no
>  guarantee that the audited code will not run into these issues.
>
>- Continue with the existing policy. If kdump is configured, panic_notifier_list
>  notifications will not be invoked. Any post-panic action should be executed
>  in the second kernel. There might be one or two odd cases, like an in-kernel
>  debugger, which still need to be invoked in the first kernel. These users
>  should explicitly put hooks in the panic() routine and refrain from using
>  panic_notifier_list.
>
>  One thing to keep in mind: doing things in the second kernel might not be
>  easy, as we have lost all the config data of the first kernel. For
>  example, if one wants to send a kernel crash event over the network to
>  system management software, he might have to pack a lot of software
>  into the second kernel's initrd.
>
>- Let the user decide whether to run panic_notifier_list after the crash,
>  with the help of a /proc option as suggested by Takenori's patch. The
>  downside is: on what basis will an enterprise user decide whether or not
>  to run the notifiers? My gut feeling is that distros will end up setting
>  this parameter to 1 by default, which would mean first running the panic
>  notifiers and then running crash_kexec().
>
>- Make crash_kexec() a user of panic_notifier_list and let it run after all
>  the callback handlers have run. This will invariably reduce the
>  reliability of kdump.

Personally I believe

Re: [PATCH][RFC] getting rid of stupid loop in BUG()

2007-07-24 Thread Keith Owens
Trent Piepho (on Tue, 24 Jul 2007 19:31:36 -0700 (PDT)) wrote:
>Adding __builtin_trap after the
>asm might be an ok fix.  It will emit a spurious int 6, but that won't even be
>reached since the asm doesn't return, and it will probably be less extra code than
>the loop.

int 6 is a two-byte instruction; the loop generates a jmp with an 8-bit
offset, also two bytes.  No change in code size.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: kdb: add rdmsr and wrmsr commands for i386

2007-05-17 Thread Keith Owens
Bernardo Innocenti (on Thu, 17 May 2007 02:36:21 -0400) wrote:
>Keith Owens wrote:
>
>> Before using MSR, you must first check that the cpu supports the
>> instruction, rd/wrmsr cause an oops on 486 or earlier.  Also using an
>> invalid msr number causes an oops, so use rd/wrmsr_safe().
>
>I didn't bother implementing those checks because kdb recovers
>nicely from GPF anyway.

Yes and no.  Yes, kdb will recover from a GPF.  No, because if the
system was already running correctly (i.e. manual entry into kdb), then
taking a GPF and not recovering will flag the rest of the system as
corrupt and can kill a running system.  I try to avoid adding spurious
system corruption.

>It's the valid MSR writes that could
>cause unrecoverable problems! :)

Tell me about it :-(


Re: kdb: add rdmsr and wrmsr commands for i386

2007-05-17 Thread Keith Owens
Bernardo Innocenti (on Tue, 15 May 2007 23:03:55 -0400) wrote:
>Jordan Crouse wrote:
>
>> Can you break this up with a : between the high dword and the low dword?
>> That makes it easier to parse when debugging.
>
>Good idea, but I used "_" instead because it's what AMD uses in their
>documentation and it looks better with a "0x" prefix.
>
>> Also, would it make sense to coordinate the order of the high and low
>> dwords with the order they are specified with 'wrmsr'?  
>
>Yeah, I did it as suggested by Mitch.  Here's a third
>revision of the patch with everything included:
>
>From 1850ca76585306e2484cf5e709434049f1df3c1f Mon Sep 17 00:00:00 2001
>From: Bernardo Innocenti <[EMAIL PROTECTED]>
>Date: Tue, 15 May 2007 15:29:48 -0400
>Subject: [PATCH] kdb: add rdmsr and wrmsr commands for i386 (take 3)
>
>The syntax is:
>  rdmsr <addr>
>  wrmsr <addr> <h> <l>
>
>Signed-off-by: Bernardo Innocenti <[EMAIL PROTECTED]>
>---
> arch/i386/kdb/kdbasupport.c |   47 +++---
> kdb/kdbmain.c   |3 +-

Thanks for the patch, but ...

Before using MSR, you must first check that the cpu supports the
instruction; rd/wrmsr cause an oops on a 486 or earlier.  Also, using an
invalid msr number causes an oops, so use rd/wrmsr_safe().  Finally, kdb
on x86_64 needs the commands as well, so move them to kdb/modules/kdbm_x86.c
(common i386/x86_64 code).  Cleaned up patch + changelogs.

---

 arch/i386/kdb/ChangeLog   |6 
 arch/i386/kdb/kdbasupport.c   |7 ++--
 arch/x86_64/kdb/ChangeLog |6 
 arch/x86_64/kdb/kdbasupport.c |7 ++--
 kdb/ChangeLog |7 
 kdb/kdbmain.c |3 --
 kdb/modules/kdbm_x86.c|   59 ++
 7 files changed, 85 insertions(+), 10 deletions(-)

diff -u linux/arch/i386/kdb/ChangeLog linux/arch/i386/kdb/ChangeLog
--- linux/arch/i386/kdb/ChangeLog
+++ linux/arch/i386/kdb/ChangeLog
@@ -1,3 +1,9 @@
+2007-05-17 Keith Owens  <[EMAIL PROTECTED]>
+
+   * Update dumpregs comments for rdmsr and wrmsr commands.
+ Bernardo Innocenti.
+   * kdb v4.4-2.6.21-i386-3.
+
 2007-05-15 Keith Owens  <[EMAIL PROTECTED]>
 
* Change kdba_late_init to kdba_arch_init so KDB_ENTER() can be used
diff -u linux/arch/i386/kdb/kdbasupport.c linux/arch/i386/kdb/kdbasupport.c
--- linux/arch/i386/kdb/kdbasupport.c
+++ linux/arch/i386/kdb/kdbasupport.c
@@ -474,13 +474,14 @@
  * argument is NULL (struct pt_regs).   The alternate register
  * set types supported by this function:
  *
- * d   Debug registers
+ * d   Debug registers
  * c   Control registers
  * u   User registers at most recent entry to kernel
  * for the process currently selected with "pid" command.
  * Following not yet implemented:
- * m   Model Specific Registers (extra defines register #)
  * r   Memory Type Range Registers (extra defines register)
+ *
+ * MSR on i386/x86_64 are handled by rdmsr/wrmsr commands.
  */
 
 int
@@ -546,8 +547,6 @@
   cr[0], cr[1], cr[2], cr[3], cr[4]);
return 0;
}
-   case 'm':
-   break;
case 'r':
break;
default:
diff -u linux/arch/x86_64/kdb/ChangeLog linux/arch/x86_64/kdb/ChangeLog
--- linux/arch/x86_64/kdb/ChangeLog
+++ linux/arch/x86_64/kdb/ChangeLog
@@ -1,3 +1,9 @@
+2007-05-17 Keith Owens  <[EMAIL PROTECTED]>
+
+   * Update dumpregs comments for rdmsr and wrmsr commands.
+ Bernardo Innocenti.
+   * kdb v4.4-2.6.21-x86_64-3.
+
 2007-05-15 Keith Owens  <[EMAIL PROTECTED]>
 
* Change kdba_late_init to kdba_arch_init so KDB_ENTER() can be used
diff -u linux/arch/x86_64/kdb/kdbasupport.c linux/arch/x86_64/kdb/kdbasupport.c
--- linux/arch/x86_64/kdb/kdbasupport.c
+++ linux/arch/x86_64/kdb/kdbasupport.c
@@ -470,12 +470,13 @@
  * argument is NULL (struct pt_regs).   The alternate register
  * set types supported by this function:
  *
- * d   Debug registers
+ * d   Debug registers
  * c   Control registers
  * u   User registers at most recent entry to kernel
  * Following not yet implemented:
- * m   Model Specific Registers (extra defines register #)
  * r   Memory Type Range Registers (extra defines register)
+ *
+ * MSR on i386/x86_64 are handled by rdmsr/wrmsr commands.
  */
 
 int
@@ -536,8 +537,6 @@
   cr[0], cr[1], cr[2], cr[3], cr[4]);
return 0;
}
-   case 'm':
-   break;
case 'r':
    break;
default:
diff -u linux/kdb/ChangeLog linux/kdb/ChangeLog
--- linux/kdb/ChangeLog
+++ linux/kdb/ChangeLog
@@ -1,3 +1,10 @@
+2007-05-17 Keith Owens  <[EMAIL PROTECTED]>
+
+   * Add rdmsr and wrmsr commands for i386 and x86_64.  Original patch by
+ Bernardo Innocenti for i386, reworked by Keith


Re: [PATCH 1/10] safe_apic_wait_icr_idle - i386

2007-04-25 Thread Keith Owens
Fernando Luis Vázquez Cao (on Wed, 25 Apr 2007 20:13:28 +0900) wrote:
>+static __inline__ unsigned long safe_apic_wait_icr_idle(void)
>+{
>+  unsigned long send_status;
>+  int timeout;
>+
>+  timeout = 0;
>+  do {
>+  udelay(100);
>+  send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
>+  } while (send_status && (timeout++ < 1000));
>+
>+  return send_status;
>+}
>+

safe_apic_wait_icr_idle() as coded guarantees a minimum 100 usec delay
before sending the IPI; this extra delay is unnecessary.  Change it to

	do {
		send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY;
		if (!send_status)
			break;
		udelay(100);
	} while (timeout++ < 1000);


Re: $CHECK can't be overridden

2007-03-21 Thread Keith Owens
Dave Jones (on Thu, 22 Mar 2007 01:37:14 -0400) wrote:
>On Thu, Mar 22, 2007 at 04:26:39PM +1100, Keith Owens wrote:
> > Dave Jones (on Thu, 22 Mar 2007 01:15:25 -0400) wrote:
> > >make help implies that supplying $CHECK on the command line
> > >should override sparse as the checker used when building with C=1
> > >Yet, this doesn't seem to be the case.
> > >
> > >This would be useful for cases where, e.g., sparse isn't in
> > >the $PATH, allowing an explicit path to the executable to be
> > >passed in automated build environments.
> > 
> > Works for me.
> > 
> > # make C=1 CHECK=foo
>
>Ah, my bad. I was thinking it was an environment var rather
>than a makefile var.  I was using 'CHECK=foo make bzImage C=1'

The default for 'make' is that environment variables do _not_ override
variables defined in the Makefiles.  You can change that behaviour with
the -e flag; 'CHECK=foo make -e bzImage C=1' should work.  However, 'info
make' recommends against using -e, because letting the environment override
makefile variables can lead to unexpected side effects.


Re: $CHECK can't be overridden

2007-03-21 Thread Keith Owens
Dave Jones (on Thu, 22 Mar 2007 01:15:25 -0400) wrote:
>make help implies that supplying $CHECK on the command line
>should override sparse as the checker used when building with C=1
>Yet, this doesn't seem to be the case.
>
>This would be useful for cases where, e.g., sparse isn't in
>the $PATH, allowing an explicit path to the executable to be
>passed in automated build environments.

Works for me.

# make C=1 CHECK=foo
Using somedir/linux as source for kernel
GEN someobj/Makefile
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CHK include/linux/compile.h
CHECK   somedir/linux/kdb/modules/kdb_bt2.c
/bin/sh: foo: command not found


Re: PNPACPI probes serial twice, messes up serial console

2007-03-21 Thread Keith Owens
Bjorn Helgaas (on Wed, 21 Mar 2007 10:35:38 -0600) wrote:
>On Tuesday 20 March 2007 08:32, Bjorn Helgaas wrote:
>> On Tuesday 20 March 2007 00:46, Keith Owens wrote:
>> > Booting with 'console=tty console=ttyS0,9600'.  The serial console on
>> > ttyS0 (0x3f8, irq 4) is probed twice, once from serial8250_init() and
>> > again from serial_pnp_probe().
>> 
>> I played with this last summer, but was too timid to finish it
>> and post it.  My plan was to remove the legacy SERIAL_PORT_DFNS,
>> make platform devices for them, and only register the platform
>> devices in the absence of PNP.
>> 
>> My motivation at the time was to prevent 8250 from claiming IRDA
>> devices that happened to live at legacy UART addresses.  I also
>> wanted to make IRDA (smsc-ircc2 in my case) smart enough to use
>> PNP to locate its devices, since 8250 would no longer claim them.
>> 
>> Here's the dusty patch (against 2.6.18-rc1-mm2).  If it seems
>> like a reasonable thing to do, I can update it, polish it up,
>> add a changelog, and post it.
>
>Keith, does this patch help?  Russell didn't complain about it, so
>if it fixes your problem, maybe we could put it in -mm and see if
>it breaks anything else.

The aim of the patch looks sensible, but it will not compile for
2.6.21-rc4.  8250_x86.c tests pnp_platform_devices, which does not
exist.  Also, the combination of CONFIG_SERIAL_8250_X86=y and
CONFIG_SERIAL_8250_PNP=m would result in 8250_x86.o being built into
vmlinux but referring to serial8250_nopnp in module 8250_pnp.o.  Kernel
to module references are tricky: built-in code cannot link against a
symbol that only exists in a module.


Re: [PATCH] Use X86_EFLAGS_IF in irqflags.h, lguest.

2007-03-21 Thread Keith Owens
Rusty Russell (on Thu, 22 Mar 2007 14:52:29 +1100) wrote:
>On Thu, 2007-03-22 at 14:24 +1100, Rusty Russell wrote:
>> Belay this: there's a X86_EFLAGS_IF in asm/processor.h which we should
>> use.  Will send patch.
>
>How's this.  There may be other users, but they're not easy to grep for.

One less set of definitions for KDB to create - Thanks.
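For reference, X86_EFLAGS_IF is the interrupt-enable flag, bit 9 (0x200) of EFLAGS.  A minimal user-space sketch of the kind of test irqflags.h makes on a saved flags word; flags_irqs_disabled() is a hypothetical name modelled on raw_irqs_disabled_flags(), and the constant is defined locally rather than taken from asm/processor.h.

```c
#include <stdbool.h>

/* EFLAGS interrupt-enable flag: bit 9 (defined locally for a user-space build). */
#define X86_EFLAGS_IF 0x00000200UL

/*
 * Hypothetical helper modelled on raw_irqs_disabled_flags():
 * interrupts are disabled when IF is clear in the saved flags word.
 */
static bool flags_irqs_disabled(unsigned long flags)
{
	return !(flags & X86_EFLAGS_IF);
}
```

A flags value of 0x202 (IF set, plus the always-one bit 1) reports interrupts enabled; 0x2 reports them disabled.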


PNPACPI probes serial twice, messes up serial console

2007-03-19 Thread Keith Owens
Dell SC1425 x86_64 running in i386 mode (the problem also occurs in
x86_64 mode).  Kernel 2.6.21-rc4, gcc 4.1.0.  Config extract at end.

Booting with 'console=tty console=ttyS0,9600'.  The serial console on
ttyS0 (0x3f8, irq 4) is probed twice, once from serial8250_init() and
again from serial_pnp_probe().  The serial console output is correct
until the second probe (from PNP) gets to these lines in
serial8250_config_port()

if (flags & UART_CONFIG_TYPE)
autoconfig(up, probeflags);

After the call to autoconfig(), the serial console starts printing NUL
characters instead of the console output.  The number of NUL characters
corresponds closely with the number of characters written to the VT
console, IOW it outputs each serial character as NUL instead of the
correct character.  When the kernel boots /sbin/init, the console
resets to printing normal characters.

AFAICT, the second probe of the UART is doing something nasty to the
hardware.  This is not a recent problem; I can reproduce it on
2.6.16.  Booting with pnpacpi=off removes the problem, but that
suppresses all the PNPACPI code, not just the second probe of the serial
devices.

Should pnpacpi probe and set up the serial devices even when they have
already been set up?  Or is this something strange about the UART in
this particular box?
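One way to picture the question: a guard that lets only the first probe path configure a given I/O base.  This is a hypothetical sketch, not how the 8250 or PNP code is actually structured; the table and the names claim_uart_for_probe() and uart_already_configured() are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table of UART I/O bases that have already been set up. */
#define MAX_UARTS 8

static unsigned long configured_iobase[MAX_UARTS];
static size_t nconfigured;

static bool uart_already_configured(unsigned long iobase)
{
	for (size_t i = 0; i < nconfigured; i++)
		if (configured_iobase[i] == iobase)
			return true;
	return false;
}

/*
 * Returns true (and records the port) if the caller should go on to
 * probe/autoconfig it; returns false for a port that an earlier path has
 * already set up, or if the hypothetical table is full.
 */
static bool claim_uart_for_probe(unsigned long iobase)
{
	if (uart_already_configured(iobase) || nconfigured == MAX_UARTS)
		return false;
	configured_iobase[nconfigured++] = iobase;
	return true;
}
```

Here a first call for 0x3f8 stands in for serial8250_init() setting up ttyS0, and a second call for the same port stands in for serial_pnp_probe(); with such a guard the second path would skip autoconfig() instead of reprogramming the UART the console is using.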

FWIW, the serial console is plugged into a serial to USB converter
(pl2303); my laptop has no serial ports.  That should not make a
difference, but just in case it does ...

Config extract:

X86_32=y
GENERIC_TIME=y
CLOCKSOURCE_WATCHDOG=y
GENERIC_CLOCKEVENTS=y
GENERIC_CLOCKEVENTS_BROADCAST=y
LOCKDEP_SUPPORT=y
STACKTRACE_SUPPORT=y
SEMAPHORE_SLEEPERS=y
X86=y
MMU=y
ZONE_DMA=y
GENERIC_ISA_DMA=y
GENERIC_IOMAP=y
GENERIC_BUG=y
GENERIC_HWEIGHT=y
ARCH_MAY_HAVE_PC_FDC=y
DMI=y
DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
EXPERIMENTAL=y
LOCK_KERNEL=y
INIT_ENV_ARG_LIMIT=32
LOCALVERSION="-i386-kaos"
LOCALVERSION_AUTO=y
SWAP=y
SYSVIPC=y
SYSVIPC_SYSCTL=y
POSIX_MQUEUE=y
IKCONFIG=y
IKCONFIG_PROC=y
SYSFS_DEPRECATED=y
CC_OPTIMIZE_FOR_SIZE=y
SYSCTL=y
EMBEDDED=y
SYSCTL_SYSCALL=y
KALLSYMS=y
KALLSYMS_ALL=y
HOTPLUG=y
PRINTK=y
BUG=y
ELF_CORE=y
BASE_FULL=y
FUTEX=y
EPOLL=y
SHMEM=y
SLAB=y
VM_EVENT_COUNTERS=y
RT_MUTEXES=y
BASE_SMALL=0
MODULES=y
MODULE_UNLOAD=y
KMOD=y
STOP_MACHINE=y
BLOCK=y
LBD=y
LSF=y
IOSCHED_NOOP=y
IOSCHED_AS=y
IOSCHED_DEADLINE=y
IOSCHED_CFQ=y
DEFAULT_DEADLINE=y
DEFAULT_IOSCHED="deadline"
TICK_ONESHOT=y
HIGH_RES_TIMERS=y
SMP=y
X86_PC=y
MPENTIUM4=y
X86_CMPXCHG=y
X86_L1_CACHE_SHIFT=7
RWSEM_XCHGADD_ALGORITHM=y
GENERIC_CALIBRATE_DELAY=y
X86_WP_WORKS_OK=y
X86_INVLPG=y
X86_BSWAP=y
X86_POPAD_OK=y
X86_CMPXCHG64=y
X86_GOOD_APIC=y
X86_INTEL_USERCOPY=y
X86_USE_PPRO_CHECKSUM=y
X86_TSC=y
HPET_TIMER=y
HPET_EMULATE_RTC=y
NR_CPUS=8
SCHED_SMT=y
PREEMPT_NONE=y
X86_LOCAL_APIC=y
X86_IO_APIC=y
X86_MCE=y
X86_MCE_NONFATAL=y
X86_MCE_P4THERMAL=y
MICROCODE=m
MICROCODE_OLD_INTERFACE=y
X86_MSR=m
X86_CPUID=m
HIGHMEM4G=y
VMSPLIT_3G=y
PAGE_OFFSET=0xC0000000
HIGHMEM=y
ARCH_FLATMEM_ENABLE=y
ARCH_SPARSEMEM_ENABLE=y
ARCH_SELECT_MEMORY_MODEL=y
ARCH_POPULATES_NODE_MAP=y
SELECT_MEMORY_MODEL=y
FLATMEM_MANUAL=y
FLATMEM=y
FLAT_NODE_MEM_MAP=y
SPARSEMEM_STATIC=y
SPLIT_PTLOCK_CPUS=4
ZONE_DMA_FLAG=1
MTRR=y
IRQBALANCE=y
HZ_250=y
HZ=250
PHYSICAL_START=0x100000
PHYSICAL_ALIGN=0x100000
COMPAT_VDSO=y
ARCH_ENABLE_MEMORY_HOTPLUG=y
PM=y
ACPI=y
ACPI_PROCFS=y
ACPI_BUTTON=m
ACPI_FAN=m
ACPI_PROCESSOR=m
ACPI_BLACKLIST_YEAR=0
ACPI_EC=y
ACPI_POWER=y
ACPI_SYSTEM=y
PCI=y
PCI_GOANY=y
PCI_BIOS=y
PCI_DIRECT=y
PCI_MMCONFIG=y
PCIEPORTBUS=y
PCIEAER=y
PCI_MSI=y
HT_IRQ=y
ISA_DMA_API=y
BINFMT_ELF=y
BINFMT_MISC=m
NET=y
PACKET=y
PACKET_MMAP=y
UNIX=y
XFRM=y
INET=y
IP_MULTICAST=y
IP_ADVANCED_ROUTER=y
ASK_IP_FIB_HASH=y
IP_FIB_HASH=y
IP_ROUTE_MULTIPATH=y
IP_ROUTE_VERBOSE=y
SYN_COOKIES=y
INET_XFRM_MODE_BEET=y
INET_DIAG=y
INET_TCP_DIAG=y
TCP_CONG_CUBIC=y
DEFAULT_TCP_CONG="cubic"
NETFILTER=y
NETFILTER_NETLINK=m
NETFILTER_NETLINK_LOG=m
NETFILTER_XTABLES=y
NETFILTER_XT_TARGET_CLASSIFY=m
NETFILTER_XT_TARGET_MARK=m
NETFILTER_XT_MATCH_COMMENT=m
NETFILTER_XT_MATCH_DCCP=m
NETFILTER_XT_MATCH_ESP=m
NETFILTER_XT_MATCH_LENGTH=m
NETFILTER_XT_MATCH_LIMIT=m
NETFILTER_XT_MATCH_MAC=m
NETFILTER_XT_MATCH_MARK=m
NETFILTER_XT_MATCH_MULTIPORT=m
NETFILTER_XT_MATCH_PKTTYPE=m
NETFILTER_XT_MATCH_QUOTA=m
NETFILTER_XT_MATCH_REALM=m
NETFILTER_XT_MATCH_SCTP=m
NETFILTER_XT_MATCH_STATISTIC=m
NETFILTER_XT_MATCH_TCPMSS=m
IP_NF_IPTABLES=y
IP_NF_MATCH_IPRANGE=m
IP_NF_MATCH_TOS=m
IP_NF_MATCH_RECENT=m
IP_NF_MATCH_ECN=m
IP_NF_MATCH_AH=m
IP_NF_MATCH_TTL=m
IP_NF_MATCH_OWNER=m
IP_NF_MATCH_ADDRTYPE=m
IP_NF_FILTER=y
IP_NF_TARGET_REJECT=y
IP_NF_TARGET_ULOG=y
VLAN_8021Q=y
NET_CLS_ROUTE=y
STANDALONE=y
PREVENT_FIRMWARE_BUILD=y
FW_LOADER=m
CONNECTOR=m
PNP=y
PNP_DEBUG=y
PNPACPI=y
BLK_DEV_FD=m
BLK_DEV_LOOP=m
IDE=m
IDE_MAX_HWIFS=4
BLK_DEV_IDE=m
BLK_DEV_IDEDISK=m
IDEDISK_MULTI_MODE=y
BLK_DEV_IDECD=m
IDE_TASK_IOCTL=y
BLK_DEV_IDEPCI=y
IDEPCI_SHARE_IRQ=y
BLK_DEV_IDEDMA_PCI=y
IDEDMA_PCI_AUTO=y
BLK_DEV_PIIX=m
BLK_DEV_IDEDMA=y


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-30 Thread Keith Owens
Willy Tarreau (on Fri, 1 Dec 2006 06:26:53 +0100) wrote:
>On Fri, Dec 01, 2006 at 04:14:04PM +1100, Keith Owens wrote:
>> SuSE's SLES10 ships with gcc 4.1.0.  There is nothing to stop a
>> distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
>> this patch would not allow the fixed compiler to build the kernel.
>
>Then maybe replace #error with #warning ? It's too dangerous to let people
>build their kernel with a known broken compiler without being informed.

Agreed.
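A sketch of the #warning variant, assuming gcc's predefined __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ macros: the build continues, but anyone using a compiler that reports 4.1.0 is told.  The macros only report the version number, so they cannot distinguish a vendor 4.1.0 with the 4.1.1 fix backported from an unfixed one, which is why a warning fits better than an error.  is_gcc_4_1_0() is a hypothetical helper that repeats the same test at run time.

```c
#include <stdbool.h>

/*
 * Warn, rather than refuse to build, when the compiler reports itself as
 * gcc 4.1.0.  The version macros cannot tell whether the 4.1.1 fix was
 * backported, so an #error would also reject fixed compilers.
 */
#if defined(__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ == 1 && \
	__GNUC_PATCHLEVEL__ == 0
#warning "gcc-4.1.0 is known to miscompile the kernel unless the 4.1.1 fix was backported"
#endif

/* Hypothetical helper: the same version test as a function. */
static bool is_gcc_4_1_0(int major, int minor, int patchlevel)
{
	return major == 4 && minor == 1 && patchlevel == 0;
}
```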

>I think this shows the limit of backports to known broken versions.
>Providing a full update to 4.1.1 would certainly be cleaner for all
>customers than backporting 4.1.1 to 4.1.0 and calling it 4.1.0.

Agreed, but Enterprise customers expect bug fixes, not wholesale
replacements of critical programs.

>Another solution would be to be able to check gcc for known bugs in the
>makefile, just like we check it for specific options. But I don't know
>how we can check gcc for bad code, especially in cross-compile environments

It is doable, but it is as ugly as hell.  Note the lack of a
signed-off-by :)

---
 arch/i386/kernel/Makefile |   16 
 1 file changed, 16 insertions(+)

Index: linux/arch/i386/kernel/Makefile
===
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -83,3 +83,19 @@ $(obj)/vsyscall-syms.o: $(src)/vsyscall.
 k8-y  += ../../x86_64/kernel/k8.o
 stacktrace-y += ../../x86_64/kernel/stacktrace.o
 
+# Some versions of gcc generate invalid code for hpet_timer, depending
+# on other config options.  Make sure that the generated code is valid.
+# Invalid versions of gcc generate a tight loop in wait_hpet_tick, with
+# no 'cmp' instructions.  Extract the generated object code for
+# wait_hpet_tick, down to the next function then check that the code
+# contains at least one comparison.
+
+ifeq ($(CONFIG_HPET_TIMER),y)
+$(obj)/built-in.o: $(obj)/.tmp_check_gcc_bug
+
+$(obj)/.tmp_check_gcc_bug: $(obj)/time_hpet.o
+	$(Q)[ -n "`$(OBJDUMP) -Sr $< | grep -A40 '<wait_hpet_tick>:' | sed -e '1d; />:$$/,$$d;' | grep -w cmp`" ] || \
+	(echo gcc volatile bug detected, fix your gcc; exit 1)
+	$(Q)touch $@
+
+endif
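The rule above finds the wait_hpet_tick header in objdump output, drops everything from the next function header onwards, and fails unless a cmp instruction is left.  The same logic written out in C, as an illustrative sketch only: function_has_cmp() and line_has_word() are hypothetical names, the disassembly is passed in as a string instead of running $(OBJDUMP), and unlike grep -A40 it scans up to the next function header rather than stopping after 40 lines.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Characters that grep -w treats as part of a word. */
static bool is_word_char(char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	       (c >= '0' && c <= '9') || c == '_';
}

/* Like grep -w: does 'word' occur in line[0..len) as a whole word? */
static bool line_has_word(const char *line, size_t len, const char *word)
{
	size_t wlen = strlen(word);

	for (size_t i = 0; i + wlen <= len; i++) {
		if (memcmp(line + i, word, wlen) != 0)
			continue;
		bool left_ok = (i == 0) || !is_word_char(line[i - 1]);
		bool right_ok = (i + wlen == len) || !is_word_char(line[i + wlen]);
		if (left_ok && right_ok)
			return true;
	}
	return false;
}

/*
 * Scan objdump disassembly for the body of 'func'.  Returns 1 if a
 * whole-word "cmp" appears, 0 if not, -1 if the header is missing.
 */
static int function_has_cmp(const char *text, const char *func)
{
	char header[128];
	const char *p;

	/* objdump prints each function header as "0000001a <name>:". */
	snprintf(header, sizeof(header), "<%s>:", func);
	p = strstr(text, header);
	if (!p)
		return -1;

	/* Skip the header line itself (the sed '1d'). */
	p = strchr(p, '\n');
	while (p) {
		const char *line = p + 1;
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);

		/* The next function header ends in ">:" (the sed '/>:$/,$d'). */
		if (len >= 2 && line[len - 2] == '>' && line[len - 1] == ':')
			break;
		if (line_has_word(line, len, "cmp"))
			return 1;
		p = nl;
	}
	return 0;
}
```

Fed the bad gcc 4.1.0 output from the original report (push, mov, then a jmp to itself), it returns 0.  Like grep -w cmp, it would not count a compare that objdump prints as cmpl, so a valid comparison written that way could still fail the check.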


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-30 Thread Keith Owens
Andrew Morton (on Thu, 30 Nov 2006 21:05:51 -0800) wrote:
>On Wed, 29 Nov 2006 21:14:10 +0100
>Willy Tarreau <[EMAIL PROTECTED]> wrote:
>
>> Then why not simply check for gcc 4.1.0 in compiler.h and refuse to build
>> with 4.1.0 if it's known to produce bad code ?
>
>Think so.  I'll queue this and see how many howls it causes.
>
>--- a/init/main.c~gcc-4-1-0-is-bust
>+++ a/init/main.c
>@@ -75,6 +75,10 @@
> #error Sorry, your GCC is too old. It builds incorrect kernels.
> #endif
> 
>+#if __GNUC__ == 4 && __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ == 0
>+#error gcc-4.1.0 is known to miscompile the kernel.  Please use a different compiler version.
>+#endif
>+
> static int init(void *);
> 
> extern void init_IRQ(void);

SuSE's SLES10 ships with gcc 4.1.0.  There is nothing to stop a
distributor from backporting the bug fix from gcc 4.1.1 to 4.1.0, but
this patch would not allow the fixed compiler to build the kernel.


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Keith Owens
David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
>From: Keith Owens 
>Date: Wed, 29 Nov 2006 14:56:20 +1100
>
>> Secondly, I believe that this is a separate problem from bug 22278.
>> hpet_readl() is correctly using volatile internally, but its result is
>> being assigned to a pair of normal integers (not declared as volatile).
>> In the context of wait_hpet_tick, all the variables are unqualified so
>> gcc is allowed to optimize the comparison away.
>> 
>> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
>> where the return value from hpet_readl() is assigned to a normal
>> variable.  Nothing in the C standard says that those unqualified
>> variables should be magically treated as volatile, just because the
>> original code that extracted the value used volatile.  IOW, time_hpet.c
>> needs to declare any variables that hold the result of hpet_readl() as
>> being volatile variables.
>
>I disagree with this.
>
>readl() returns values from an opaque source, and it is declared
>as such to show this to GCC.  It's like a function that GCC
>cannot see the implementation of, which it cannot determine
>anything about wrt. return values.
>
>The volatile'ness does not simply disappear the moment you
>assign the result to some local variable which is not volatile.
>
>Half of our drivers would break if this were true.

This is definitely a gcc bug; 4.1.0 is doing something weird.  Compile
with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears; with
CONFIG_CC_OPTIMIZE_FOR_SIZE=y there is no problem.

Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
below and the problem disappears.

Index: linux/arch/i386/kernel/time_hpet.c
===
--- linux.orig/arch/i386/kernel/time_hpet.c 2006-11-29 13:51:33.900462088 +1100
+++ linux/arch/i386/kernel/time_hpet.c  2006-11-29 15:25:47.853245938 +1100
@@ -35,7 +35,8 @@ static void __iomem * hpet_virt_address;
 
 int hpet_readl(unsigned long a)
 {
-   return readl(hpet_virt_address + a);
+   volatile int v = readl(hpet_virt_address + a);
+   return v;
 }
 
 static void hpet_writel(unsigned long d, unsigned long a)


Index: linux-2.6/arch/i386/kernel/time_hpet.c
===
--- linux-2.6.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6/arch/i386/kernel/time_hpet.c
@@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
  */
 static void __devinit wait_hpet_tick(void)
 {
-   unsigned int start_cmp_val, end_cmp_val;
+   unsigned volatile int start_cmp_val, end_cmp_val;
 
start_cmp_val = hpet_readl(HPET_T0_CMP);
do {
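The loop being discussed has a simple shape: read the comparator once, then keep re-reading until it changes.  The user-space sketch below keeps that shape but is bounded and uses a simulated register; fake_t0_cmp, fake_readl(), hw_step() and wait_tick() are hypothetical stand-ins for HPET_T0_CMP, readl()/hpet_readl() and wait_hpet_tick().  Each fake_readl() is a volatile-qualified load, so the compiler must perform it on every iteration; the gcc 4.1.0 output in the original report (a jmp to itself) is what the loop looks like once those re-reads are gone.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the HPET_T0_CMP register (user-space model). */
static volatile uint32_t fake_t0_cmp = 7;

/* Number of hw_step() calls so far; used only by the hardware model. */
static unsigned hw_steps;

/*
 * Hypothetical hardware model: the comparator advances by one on every
 * fourth step.  Real hardware changes the register on its own; here the
 * polling loop calls this hook so the example is deterministic.
 */
static void hw_step(void)
{
	if (++hw_steps % 4 == 0)
		fake_t0_cmp = fake_t0_cmp + 1;
}

/* readl()-style accessor: the volatile load is performed on every call. */
static inline uint32_t fake_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/*
 * Same shape as wait_hpet_tick(): remember the current value, then re-read
 * until it changes.  Unlike the kernel loop it is bounded by max_polls and
 * returns the number of polls taken, or -1 if the value never changed.
 */
static long wait_tick(const volatile uint32_t *reg, long max_polls,
		      void (*step)(void))
{
	uint32_t start = fake_readl(reg);
	long polls;

	for (polls = 1; polls <= max_polls; polls++) {
		if (step)
			step();
		if (fake_readl(reg) != start)
			return polls;
	}
	return -1;
}
```

hw_step() only exists to make the example deterministic; on real hardware the register changes without any help from the polling loop.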


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Keith Owens
Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
>On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
>> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
>> wait_hpet_tick is optimized away to a never ending loop and the kernel
>> hangs on boot in timer setup.
>> 
>> 0000001a <wait_hpet_tick>:
>>   1a:   55                      push   %ebp
>>   1b:   89 e5                   mov    %esp,%ebp
>>   1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>
>> 
>> This is not a problem with gcc 3.3.5.  Adding barrier() calls to
>> wait_hpet_tick does not help, making the variables volatile does.
>> 
>> Signed-off-by: Keith Owens 
>> 
>> ---
>>  arch/i386/kernel/time_hpet.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> Index: linux-2.6/arch/i386/kernel/time_hpet.c
>> ===
>> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
>> +++ linux-2.6/arch/i386/kernel/time_hpet.c
>> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
>>   */
>>  static void __devinit wait_hpet_tick(void)
>>  {
>> -unsigned int start_cmp_val, end_cmp_val;
>> +unsigned volatile int start_cmp_val, end_cmp_val;
>>  
>>  start_cmp_val = hpet_readl(HPET_T0_CMP);
>>  do {
>
>When you examine the inlined functions involved, this looks an awful lot
>like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
>
>Perhaps SUSE should fix their gcc instead of working around compiler
>problems in the kernel?

Firstly, the fix for 22278 is included in gcc 4.1.0.

Secondly, I believe that this is a separate problem from bug 22278.
hpet_readl() is correctly using volatile internally, but its result is
being assigned to a pair of normal integers (not declared as volatile).
In the context of wait_hpet_tick, all the variables are unqualified so
gcc is allowed to optimize the comparison away.

The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
where the return value from hpet_readl() is assigned to a normal
variable.  Nothing in the C standard says that those unqualified
variables should be magically treated as volatile, just because the
original code that extracted the value used volatile.  IOW, time_hpet.c
needs to declare any variables that hold the result of hpet_readl() as
being volatile variables.


[patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Keith Owens
Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
wait_hpet_tick is optimized into a never-ending loop and the kernel
hangs on boot in timer setup.

0000001a <wait_hpet_tick>:
  1a:   55                      push   %ebp
  1b:   89 e5                   mov    %esp,%ebp
  1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>

This is not a problem with gcc 3.3.5.  Adding barrier() calls to
wait_hpet_tick does not help; making the variables volatile does.

Signed-off-by: Keith Owens 

---
 arch/i386/kernel/time_hpet.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/i386/kernel/time_hpet.c
===
--- linux-2.6.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6/arch/i386/kernel/time_hpet.c
@@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
  */
 static void __devinit wait_hpet_tick(void)
 {
-   unsigned int start_cmp_val, end_cmp_val;
+   unsigned volatile int start_cmp_val, end_cmp_val;
 
start_cmp_val = hpet_readl(HPET_T0_CMP);
do {

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Keith Owens
Nicholas Miell (on Tue, 28 Nov 2006 19:08:25 -0800) wrote:
>On Wed, 2006-11-29 at 13:22 +1100, Keith Owens wrote:
>> Compiling 2.6.19-rc6 with gcc version 4.1.0 (SUSE Linux),
>> wait_hpet_tick is optimized away to a never ending loop and the kernel
>> hangs on boot in timer setup.
>>
>> 0000001a <wait_hpet_tick>:
>>   1a:   55                      push   %ebp
>>   1b:   89 e5                   mov    %esp,%ebp
>>   1d:   eb fe                   jmp    1d <wait_hpet_tick+0x3>
>>
>> This is not a problem with gcc 3.3.5.  Adding barrier() calls to
>> wait_hpet_tick does not help, making the variables volatile does.
>>
>> Signed-off-by: Keith Owens <kaos@ocs.com.au>
>>
>> ---
>>  arch/i386/kernel/time_hpet.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: linux-2.6/arch/i386/kernel/time_hpet.c
>> ===
>> --- linux-2.6.orig/arch/i386/kernel/time_hpet.c
>> +++ linux-2.6/arch/i386/kernel/time_hpet.c
>> @@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
>>   */
>>  static void __devinit wait_hpet_tick(void)
>>  {
>> -   unsigned int start_cmp_val, end_cmp_val;
>> +   unsigned volatile int start_cmp_val, end_cmp_val;
>>
>>     start_cmp_val = hpet_readl(HPET_T0_CMP);
>>     do {
>
>When you examine the inlined functions involved, this looks an awful lot
>like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278
>
>Perhaps SUSE should fix their gcc instead of working around compiler
>problems in the kernel?

Firstly, the fix for 22278 is included in gcc 4.1.0.

Secondly, I believe that this is a separate problem from bug 22278.
hpet_readl() is correctly using volatile internally, but its result is
being assigned to a pair of normal integers (not declared as volatile).
In the context of wait_hpet_tick, all the variables are unqualified so
gcc is allowed to optimize the comparison away.

The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
where the return value from hpet_readl() is assigned to a normal
variable.  Nothing in the C standard says that those unqualified
variables should be magically treated as volatile, just because the
original code that extracted the value used volatile.  IOW, time_hpet.c
needs to declare any variables that hold the result of hpet_readl() as
being volatile variables.


Re: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away

2006-11-28 Thread Keith Owens
David Miller (on Tue, 28 Nov 2006 20:04:53 -0800 (PST)) wrote:
>From: Keith Owens <kaos@ocs.com.au>
>Date: Wed, 29 Nov 2006 14:56:20 +1100
>
>> Secondly, I believe that this is a separate problem from bug 22278.
>> hpet_readl() is correctly using volatile internally, but its result is
>> being assigned to a pair of normal integers (not declared as volatile).
>> In the context of wait_hpet_tick, all the variables are unqualified so
>> gcc is allowed to optimize the comparison away.
>>
>> The same problem may exist in other parts of arch/i386/kernel/time_hpet.c,
>> where the return value from hpet_readl() is assigned to a normal
>> variable.  Nothing in the C standard says that those unqualified
>> variables should be magically treated as volatile, just because the
>> original code that extracted the value used volatile.  IOW, time_hpet.c
>> needs to declare any variables that hold the result of hpet_readl() as
>> being volatile variables.
>
>I disagree with this.
>
>readl() returns values from an opaque source, and it is declared
>as such to show this to GCC.  It's like a function that GCC
>cannot see the implementation of, which it cannot determine
>anything about wrt. return values.
>
>The volatile'ness does not simply disappear the moment you
>assign the result to some local variable which is not volatile.
>
>Half of our drivers would break if this were true.

This is definitely a gcc bug; 4.1.0 is doing something weird.  Compile
with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and the bug appears;
CONFIG_CC_OPTIMIZE_FOR_SIZE=y has no problem.

Compile with CONFIG_CC_OPTIMIZE_FOR_SIZE=n and _either_ of the patches
below and the problem disappears.

Index: linux/arch/i386/kernel/time_hpet.c
===
--- linux.orig/arch/i386/kernel/time_hpet.c 2006-11-29 13:51:33.900462088 +1100
+++ linux/arch/i386/kernel/time_hpet.c  2006-11-29 15:25:47.853245938 +1100
@@ -35,7 +35,8 @@ static void __iomem * hpet_virt_address;
 
 int hpet_readl(unsigned long a)
 {
-   return readl(hpet_virt_address + a);
+   volatile int v = readl(hpet_virt_address + a);
+   return v;
 }
 
 static void hpet_writel(unsigned long d, unsigned long a)


Index: linux-2.6/arch/i386/kernel/time_hpet.c
===
--- linux-2.6.orig/arch/i386/kernel/time_hpet.c
+++ linux-2.6/arch/i386/kernel/time_hpet.c
@@ -51,7 +51,7 @@ static void hpet_writel(unsigned long d,
  */
 static void __devinit wait_hpet_tick(void)
 {
-   unsigned int start_cmp_val, end_cmp_val;
+   unsigned volatile int start_cmp_val, end_cmp_val;
 
start_cmp_val = hpet_readl(HPET_T0_CMP);
do {


[RFC] Scheduler hooks to support separate ia64 MCA/INIT stacks

2005-09-08 Thread Keith Owens
The new ia64 MCA/INIT handlers[1] (think of them as super NMI) run on
separate stacks.  99% of the changes for these new handlers are ia64-only
code; however, they need a couple of scheduler hooks to support these
extra stacks.  The complete patch set will be coming through the ia64
tree; this RFC covers just the scheduler changes, so that they do not
come as a surprise when the ia64 tree is rolled up.

[1] http://marc.theaimsgroup.com/?l=linux-ia64&m=112537827113545&w=2
and the following patches.

IA64 MCA/INIT are completely asynchronous events.  They can be
delivered even when normal interrupts are disabled.  The cpu can be in
user context, in kernel context, or in prom called from the kernel; it
can even be in physical addressing mode with no valid stack pointers
when MCA/INIT is delivered.

Because of all the modes in which these events can occur, it is not
safe to use the normal kernel stacks, mainly because we cannot always
tell what state the kernel stacks are in when MCA/INIT is delivered.
The new MCA/INIT handlers define some additional per-cpu stacks.  When
MCA/INIT is delivered, the cpu starts using the relevant per-cpu stack,
effectively running as a new process.  If the original kernel stack can
be identified and verified then the process that was interrupted is
modified to look like a blocked task.

IA64 backtrace has two distinct starting points: a task can either be
running or it can be blocked.  The ia64 unwinder starts with two
different states, depending on the task state.  Since MCA/INIT
backtrace is done using one cpu to list all the tasks, there has to be
a way for the backtrace code to determine if a task is running on
another cpu or is blocked.  In 2.4 we used to have a flag to say that a
task was running on a cpu.  In 2.6, that data is stored in
cpu_curr(cpu), which needs to be exposed to support ia64 MCA/INIT
handlers.
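A rough sketch of the running-versus-blocked decision described above. It assumes a hypothetical NR_CPUS of 4 and a minimal struct task in place of task_t; the array stands in for cpu_curr(cpu). A task that is current on some cpu would be unwound as running, anything else as blocked. This is an illustration under those assumptions, not the ia64 unwinder code.

```c
#define NR_CPUS 4                  /* hypothetical cpu count */

/* Hypothetical minimal task descriptor standing in for task_t. */
struct task {
    int pid;
};

/* Stand-in for the scheduler's per-cpu current-task slots. */
static struct task *cpu_curr_slot[NR_CPUS];

/* Analogue of the proposed curr_task(): read one cpu's current task. */
static struct task *curr_task(int cpu)
{
    return cpu_curr_slot[cpu];
}

/* Return the cpu that p is running on, or -1 if it is not current on
 * any cpu, in which case a backtrace should start from its saved
 * (blocked) state instead. */
static int task_running_on(const struct task *p)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (curr_task(cpu) == p)
            return cpu;
    return -1;
}
```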

The new MCA/INIT handlers are the equivalent of an asynchronous context
switch.  Because MCA/INIT can be delivered at any time, the handlers
cannot trust the state of any spinlock: MCA/INIT can occur after
spin_lock_irq has been issued, i.e. in the middle of critical code.
Therefore it is not safe to call the normal scheduler functions which
update the runqueues.

This patch adds two small hooks that can be safely called from MCA/INIT
context.  If other architectures want to support NMI on separate stacks
then they can also use these functions.


Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h  2005-09-08 16:47:05.668290545 +1000
+++ linux/include/linux/sched.h 2005-09-08 16:47:08.165015793 +1000
@@ -883,6 +883,8 @@ extern int task_curr(const task_t *p);
 extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, struct sched_param *);
 extern task_t *idle_task(int cpu);
+extern task_t *curr_task(int cpu);
+extern void set_curr_task(int cpu, task_t *p);
 
 void yield(void);
 
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c 2005-09-08 16:47:05.669266973 +1000
+++ linux/kernel/sched.c    2005-09-09 11:36:53.017356186 +1000
@@ -3471,6 +3471,32 @@ task_t *idle_task(int cpu)
 }
 
 /**
+ * curr_task - return the current task for a given cpu.
+ * @cpu: the processor in question.
+ */
+task_t *curr_task(int cpu)
+{
+   return cpu_curr(cpu);
+}
+
+/**
+ * set_curr_task - set the current task for a given cpu.
+ * @cpu: the processor in question.
+ * @p: the task pointer to set.
+ *
+ * Description: This function must only be used when non-maskable interrupts
+ * are serviced on a separate stack.  It allows the architecture to switch the
+ * notion of the current task on a cpu in a non-blocking manner.  This function
+ * must be called with interrupts disabled, the caller must save the original
+ * value of the current task (see curr_task() above) and restore that value
+ * before reenabling interrupts.
+ */
+void set_curr_task(int cpu, task_t *p)
+{
+   cpu_curr(cpu) = p;
+}
+
+/**
  * find_process_by_pid - find a process with a matching PID value.
  * @pid: the pid in question.
  */

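The save/switch/restore sequence that the set_curr_task() comment asks for can be sketched in userspace as follows. The per-cpu array, struct task, mca_task[] and mca_handler() are hypothetical stand-ins (the real ia64 handlers are not part of this RFC), and the "interrupts disabled" requirement is only noted in a comment because userspace cannot model it.

```c
#define NR_CPUS 4                  /* hypothetical cpu count */

/* Hypothetical minimal task descriptor standing in for task_t. */
struct task {
    int pid;
};

static struct task *cpu_curr_slot[NR_CPUS];   /* stand-in for cpu_curr(cpu) */

static struct task *curr_task(int cpu)
{
    return cpu_curr_slot[cpu];
}

static void set_curr_task(int cpu, struct task *p)
{
    cpu_curr_slot[cpu] = p;
}

/* Hypothetical per-cpu task representing the MCA/INIT stack. */
static struct task mca_task[NR_CPUS];

/* Records what curr_task() reported while the handler ran (for checking). */
static struct task *seen_in_handler;

/* Sketch of an MCA/INIT handler.  The real handler would already be
 * running with interrupts disabled; nothing here touches runqueues or
 * takes spinlocks, which is the point of the two hooks. */
static void mca_handler(int cpu)
{
    struct task *saved = curr_task(cpu);     /* 1. save the interrupted task */

    set_curr_task(cpu, &mca_task[cpu]);      /* 2. switch to the MCA stack's task */
    seen_in_handler = curr_task(cpu);        /*    ... handler work goes here ... */
    set_curr_task(cpu, saved);               /* 3. restore before returning */
}
```

Restoring the saved pointer before returning is what lets the normal scheduler carry on as if the MCA/INIT event had never switched tasks.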



Re: [patch 16/16] Add hardware breakpoint support for i386

2005-08-29 Thread Keith Owens
On Mon, 29 Aug 2005 09:12:08 -0700, 
Tom Rini <[EMAIL PROTECTED]> wrote:
>
>This adds hardware breakpoint support for i386.  This is not as well tested as
>software breakpoints, but in some minimal testing appears to be functional.

Hardware breakpoints must be per cpu, not global.  Also you will
conflict with applications that are using gdb, because gdb uses the same
registers.  KDB has never really supported kernel hardware breakpoints;
they are hard to do without stamping on user space.

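To illustrate why these registers are a contended per-cpu resource, here is a small sketch that builds a DR7 value for one of the four x86 breakpoint slots. The bit positions follow the architectural DR7 layout (local enable bit at 2n, R/W field at 16+4n, LEN field at 18+4n); the helper name and enums are hypothetical. A value like this would have to be loaded into DR7 on every cpu that should see the breakpoint, and gdb programs the same DR0-DR7 registers for user tasks through ptrace.

```c
#include <stdint.h>

/* R/W field encodings: break on execute, data write, or data read/write. */
enum bp_type { BP_EXEC = 0x0, BP_WRITE = 0x1, BP_RW = 0x3 };

/* LEN field encodings: 1, 2 or 4 bytes.  Execute breakpoints must use BP_LEN1. */
enum bp_len { BP_LEN1 = 0x0, BP_LEN2 = 0x1, BP_LEN4 = 0x3 };

/* Return dr7 with breakpoint slot (0-3, caller must check) locally
 * enabled for the given condition and length.  Any previous R/W and
 * LEN settings for that slot are cleared first. */
static uint32_t dr7_set_slot(uint32_t dr7, unsigned int slot,
                             enum bp_type type, enum bp_len len)
{
    unsigned int shift = 16 + 4 * slot;

    dr7 &= ~(UINT32_C(0xF) << shift);                       /* clear R/W and LEN */
    dr7 |= ((uint32_t)type | ((uint32_t)len << 2)) << shift;
    dr7 |= UINT32_C(1) << (2 * slot);                       /* Ln: local enable */
    return dr7;
}
```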

Announce: kdb v4.4 is available for kernel 2.6.13

2005-08-28 Thread Keith Owens
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

KDB (Linux Kernel Debugger) has been updated.

ftp://oss.sgi.com/projects/kdb/download/v4.4/
ftp://ftp.ocs.com.au/pub/mirrors/oss.sgi.com/projects/kdb/download/v4.4/

Note:  Due to a spam attack, the kdb@oss.sgi.com mailing list is now
subscriber only.  If you reply to this mail, you may wish to trim
kdb@oss.sgi.com from the cc: list.

Current versions are :-

  kdb-v4.4-2.6.13-common-1.bz2
  kdb-v4.4-2.6.13-i386-1.bz2
  kdb-v4.4-2.6.13-ia64-1.bz2
  kdb-v4.4-2.6.11-x86-64-2.bz2 (may or may not work with 2.6.13).


Changelog extract since kdb-v4.4-2.6.12-common-1.

2005-08-29 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-common-1.

2005-08-24 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc7-common-1.

2005-08-08 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc6-common-1.

2005-08-02 Keith Owens  <[EMAIL PROTECTED]>

* Print more fields from filp, dentry.
* Add kdb=on-nokey to suppress kdb entry from the keyboard.
* kdb v4.4-2.6.13-rc5-common-1.

2005-07-30 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc4-common-1.

2005-07-26 Keith Owens  <[EMAIL PROTECTED]>

* Fix compile problem with CONFIG_USB_KBD.
* kdb v4.4-2.6.13-rc3-common-3.

2005-07-22 Keith Owens  <[EMAIL PROTECTED]>

* The asmlinkage kdb() patch was lost during packaging.  Reinstate it.
* kdb v4.4-2.6.13-rc3-common-2.

2005-07-19 Keith Owens  <[EMAIL PROTECTED]>

* Add support for USB keyboard (OHCI only).  Aaron Young, SGI.
* kdb v4.4-2.6.13-rc3-common-1.

2005-07-08 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc2-common-1.

2005-07-01 Keith Owens  <[EMAIL PROTECTED]>

* Make kdb() asmlinkage to avoid problems with CONFIG_REGPARM.
* Change some uses of smp_processor_id() to be preempt safe.
* Use DEFINE_SPINLOCK().
* kdb v4.4-2.6.13-rc1-common-1.


Changelog extract since kdb-v4.4-2.6.12-i386-1.

2005-08-29 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-i386-1.

2005-08-24 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc7-i386-1.

2005-08-08 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc6-i386-1.

2005-08-02 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc5-i386-1.

2005-07-30 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc4-i386-1.

2005-07-22 Keith Owens  <[EMAIL PROTECTED]>

* Compile fix for kprobes.
* kdb v4.4-2.6.13-rc3-i386-2.

2005-07-19 Keith Owens  <[EMAIL PROTECTED]>

* Add support for USB keyboard (OHCI only).  Aaron Young, SGI.
* kdb v4.4-2.6.13-rc3-i386-1.

2005-07-08 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc2-i386-1.

2005-07-01 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc1-i386-1.

2005-06-19 Keith Owens  <[EMAIL PROTECTED]>

* gcc 4 compile fix, remove extern kdb_hardbreaks.  Steve Lord.
* kdb v4.4-2.6.12-i386-2.


Changelog extract since kdb v4.4-2.6.12-ia64-1.

2005-08-29 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-ia64-1.

2005-08-24 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc7-ia64-1.

2005-08-08 Keith Owens  <[EMAIL PROTECTED]>

* Add minstate command.
* kdb v4.4-2.6.13-rc6-ia64-1.

2005-08-02 Keith Owens  <[EMAIL PROTECTED]>

* Replace hard coded kdb declarations with #include <asm/sections.h>.
* kdb v4.4-2.6.13-rc5-ia64-1.

2005-07-30 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc4-ia64-1.

2005-07-22 Keith Owens  <[EMAIL PROTECTED]>

* Handle INIT delivered while in physical mode.
* kdb v4.4-2.6.13-rc3-ia64-2.

2005-07-19 Keith Owens  <[EMAIL PROTECTED]>

* Add support for USB keyboard (OHCI only).  Aaron Young, SGI.
* kdb v4.4-2.6.13-rc3-ia64-1.

2005-07-08 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc2-ia64-1.

2005-07-01 Keith Owens  <[EMAIL PROTECTED]>

* kdb v4.4-2.6.13-rc1-ia64-1.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh version 2.1.1 10/15/1999

iD8DBQFDEm8Si4UHNye0ZOoRAuu5AKCFIqHBB2+F9ttZlbKs4nObW+88oQCfT4IE
jA9tECuXxeB+Rwd7Giqkj4k=
=SnMr
-----END PGP SIGNATURE-----


Re: 2.6.13-rc7 qla2xxx unaligned accesses

2005-08-24 Thread Keith Owens
On Wed, 24 Aug 2005 11:22:52 -0700, 
Andrew Vasquez <[EMAIL PROTECTED]> wrote:
>On Wed, 24 Aug 2005, Keith Owens wrote:
>
>> 2.6.13-rc7 + kdb on ia64.  The qla2xxx drivers are getting unaligned
>> accesses at startup.
>> 
>> qla2300 :01:02.0: Found an ISP2312, irq 66, iobase 0xc0080f30
>> qla2300 :01:02.0: Configuring PCI space...
>> PCI: slot :01:02.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
>> qla2300 :01:02.0: Configure NVRAM parameters...
>> qla2300 :01:02.0: Verifying loaded RISC code...
>> qla2300 :01:02.0: Waiting for LIP to complete...
>> qla2300 :01:02.0: Cable is unplugged...
>> scsi1 : qla2xxx
>> kernel unaligned access to 0xe0300667800c, ip=0xa001005cd0b1
>
>Yes, I have a fix for this in my patch-queue.  I'll attach it here for
>reference.  I'll forward onto linux-scsi post 2.6.13.
>
>--
>av
>
>---
>
>On some platforms the hard-casting of the 8 byte node_name
>and port_name arrays to an u64 would cause unaligned-access
>warnings.  Generalize the conversions with consistent
>shifting of WWN bytes.
>
>Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
>---
>
> drivers/scsi/qla2xxx/qla_attr.c |   27 +++++++++++++++++----------
> 1 files changed, 17 insertions(+), 10 deletions(-)
>
>24e16c86578498fd71a3e33bebbd8be7323a03c6
>diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
>--- a/drivers/scsi/qla2xxx/qla_attr.c
>+++ b/drivers/scsi/qla2xxx/qla_attr.c
>@@ -345,6 +345,15 @@ struct class_device_attribute *qla2x00_h
> 
> /* Host attributes. */
> 
>+static u64
>+wwn_to_u64(uint8_t *wwn)
>+{
>+  return (u64)wwn[0] << 56 | (u64)wwn[1] << 48 |
>+  (u64)wwn[2] << 40 | (u64)wwn[3] << 32 |
>+  (u64)wwn[4] << 24 | (u64)wwn[5] << 16 |
>+  (u64)wwn[6] <<  8 | (u64)wwn[7];
>+}
>+

Any reason you defined your own function instead of using the standard
get_unaligned()?
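For comparison, a userspace sketch of the two approaches. wwn_to_u64() mirrors the patch's byte-at-a-time conversion. load_be64_unaligned() is a hypothetical analogue of be64_to_cpu(get_unaligned(...)): memcpy() performs an alignment-safe load in host byte order, followed by a byte swap on little-endian hosts. __builtin_bswap64 is a GCC/Clang builtin, not ISO C, and the 8-byte name used to exercise the functions is made up.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time conversion, same approach as the patch's wwn_to_u64():
 * needs no alignment and does not depend on host endianness. */
static uint64_t wwn_to_u64(const uint8_t *wwn)
{
    return (uint64_t)wwn[0] << 56 | (uint64_t)wwn[1] << 48 |
           (uint64_t)wwn[2] << 40 | (uint64_t)wwn[3] << 32 |
           (uint64_t)wwn[4] << 24 | (uint64_t)wwn[5] << 16 |
           (uint64_t)wwn[6] <<  8 | (uint64_t)wwn[7];
}

/* Hypothetical userspace analogue of be64_to_cpu(get_unaligned(p)). */
static uint64_t load_be64_unaligned(const uint8_t *p)
{
    uint64_t raw;
    const uint16_t probe = 1;
    uint8_t first_byte;

    memcpy(&raw, p, sizeof(raw));      /* alignment-safe load, host order */
    memcpy(&first_byte, &probe, 1);    /* 1 on a little-endian host */
    return first_byte ? __builtin_bswap64(raw) : raw;
}
```

Both return the same value when given a pointer one byte into a buffer, i.e. deliberately misaligned, without performing an unaligned 8-byte load.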


2.6.13-rc7 qla2xxx unaligned accesses

2005-08-24 Thread Keith Owens
2.6.13-rc7 + kdb on ia64.  The qla2xxx drivers are getting unaligned
accesses at startup.

qla2300 :01:02.0: Found an ISP2312, irq 66, iobase 0xc0080f30
qla2300 :01:02.0: Configuring PCI space...
PCI: slot :01:02.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
qla2300 :01:02.0: Configure NVRAM parameters...
qla2300 :01:02.0: Verifying loaded RISC code...
qla2300 :01:02.0: Waiting for LIP to complete...
qla2300 :01:02.0: Cable is unplugged...
scsi1 : qla2xxx
kernel unaligned access to 0xe0300667800c, ip=0xa001005cd0b1
qla2300 :01:02.0:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI (66 MHz) @ :01:02.0 hdma+, host#=1, fw=3.03.15 IPX
ACPI: PCI Interrupt :01:02.1[B]: no GSI
qla2300 :01:02.1: Found an ISP2312, irq 67, iobase 0xc0080f301000
qla2300 :01:02.1: Configuring PCI space...
PCI: slot :01:02.1 has incorrect PCI cache line size of 0 bytes, correcting to 128
qla2300 :01:02.1: Configure NVRAM parameters...
qla2300 :01:02.1: Verifying loaded RISC code...
qla2300 :01:02.1: Waiting for LIP to complete...
qla2300 :01:02.1: Cable is unplugged...
scsi2 : qla2xxx
kernel unaligned access to 0xe030066a400c, ip=0xa001005cd0b1
qla2300 :01:02.1:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI (66 MHz) @ :01:02.1 hdma+, host#=2, fw=3.03.15 IPX
ACPI: PCI Interrupt :02:02.0[A]: no GSI
qla2300 :02:02.0: Found an ISP2312, irq 63, iobase 0xc0080fa4
qla2300 :02:02.0: Configuring PCI space...
qla2300 :02:02.0: Configure NVRAM parameters...
qla2300 :02:02.0: Verifying loaded RISC code...
qla2300 :02:02.0: Waiting for LIP to complete...
qla2300 :02:02.0: Cable is unplugged...
scsi3 : qla2xxx
kernel unaligned access to 0xe030066c000c, ip=0xa001005cd0b1
qla2300 :02:02.0:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI-X (133 MHz) @ :02:02.0 hdma+, host#=3, fw=3.03.15 IPX
ACPI: PCI Interrupt :02:02.1[B]: no GSI
qla2300 :02:02.1: Found an ISP2312, irq 64, iobase 0xc0080fa41000
qla2300 :02:02.1: Configuring PCI space...
qla2300 :02:02.1: Configure NVRAM parameters...
qla2300 :02:02.1: Verifying loaded RISC code...
qla2300 :02:02.1: Waiting for LIP to complete...
qla2300 :02:02.1: Cable is unplugged...
scsi4 : qla2xxx
kernel unaligned access to 0xe030066d000c, ip=0xa001005cd0b1

0xa001005cd0b1 is qla2x00_init_host_attr+0x71.

void
qla2x00_init_host_attr(scsi_qla_host_t *ha)
{
	fc_host_node_name(ha->host) =
	    be64_to_cpu(*(uint64_t *)ha->init_cb->node_name);
	fc_host_port_name(ha->host) =
	    be64_to_cpu(*(uint64_t *)ha->init_cb->port_name);
}

The unaligned access appears to be to ha->init_cb->port_name.



[patch 2.6.13-rc7] Export pcibios_bus_to_resource

2005-08-24 Thread Keith Owens
pcibios_bus_to_resource is exported on all architectures except ia64
and sparc.  Add exports for the two missing architectures.  Needed when
Yenta socket support is compiled as a module.

Signed-off-by: Keith Owens <[EMAIL PROTECTED]>

Index: linux/arch/ia64/pci/pci.c
===
--- linux.orig/arch/ia64/pci/pci.c  2005-08-08 21:57:47.415210784 +1000
+++ linux/arch/ia64/pci/pci.c   2005-08-10 22:08:01.218842356 +1000
@@ -380,6 +380,7 @@ void pcibios_bus_to_resource(struct pci_
res->start = region->start + offset;
res->end = region->end + offset;
 }
+EXPORT_SYMBOL(pcibios_bus_to_resource);
 
 static int __devinit is_valid_resource(struct pci_dev *dev, int idx)
 {
Index: linux/arch/sparc64/kernel/pci.c
===
--- linux.orig/arch/sparc64/kernel/pci.c    2005-08-10 13:57:47.295579310 +1000
+++ linux/arch/sparc64/kernel/pci.c 2005-08-10 22:09:23.573376709 +1000
@@ -540,6 +540,7 @@ void pcibios_bus_to_resource(struct pci_
 
pbm->parent->resource_adjust(pdev, res, root);
 }
+EXPORT_SYMBOL(pcibios_bus_to_resource);
 
 char * __init pcibios_setup(char *str)
 {


Re: 2.6.13-rc7 qla2xxx unaligned accesses

2005-08-24 Thread Keith Owens
On Wed, 24 Aug 2005 11:22:52 -0700, 
Andrew Vasquez <[EMAIL PROTECTED]> wrote:
>On Wed, 24 Aug 2005, Keith Owens wrote:
>
>> 2.6.13-rc7 + kdb on ia64.  The qla2xxx drivers are getting unaligned
>> accesses at startup.
>> 
>> qla2300 :01:02.0: Found an ISP2312, irq 66, iobase 0xc0080f30
>> qla2300 :01:02.0: Configuring PCI space...
>> PCI: slot :01:02.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
>> qla2300 :01:02.0: Configure NVRAM parameters...
>> qla2300 :01:02.0: Verifying loaded RISC code...
>> qla2300 :01:02.0: Waiting for LIP to complete...
>> qla2300 :01:02.0: Cable is unplugged...
>> scsi1 : qla2xxx
>> kernel unaligned access to 0xe0300667800c, ip=0xa001005cd0b1
>
>Yes, I have a fix for this in my patch-queue.  I'll attach it here for
>reference.  I'll forward onto linux-scsi post 2.6.13.
>
>--
>av
>
>---
>
>On some platforms the hard-casting of the 8 byte node_name
>and port_name arrays to an u64 would cause unaligned-access
>warnings.  Generalize the conversions with consistent
>shifting of WWN bytes.
>
>Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
>---
>
> drivers/scsi/qla2xxx/qla_attr.c |   27 +--
> 1 files changed, 17 insertions(+), 10 deletions(-)
>
>24e16c86578498fd71a3e33bebbd8be7323a03c6
>diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
>--- a/drivers/scsi/qla2xxx/qla_attr.c
>+++ b/drivers/scsi/qla2xxx/qla_attr.c
>@@ -345,6 +345,15 @@ struct class_device_attribute *qla2x00_h
> 
> /* Host attributes. */
> 
>+static u64
>+wwn_to_u64(uint8_t *wwn)
>+{
>+  return (u64)wwn[0] << 56 | (u64)wwn[1] << 48 |
>+  (u64)wwn[2] << 40 | (u64)wwn[3] << 32 |
>+  (u64)wwn[4] << 24 | (u64)wwn[5] << 16 |
>+  (u64)wwn[6] <<  8 | (u64)wwn[7];
>+}
>+

Any reason you defined your own function instead of using the standard
get_unaligned()?
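For comparison, here is a userspace sketch of both approaches. wwn_shift_be64() follows the byte-shifting idea from the patch. wwn_memcpy_be64() stands in for the in-kernel be64_to_cpu(get_unaligned((u64 *)wwn)), with memcpy() playing the role of the portable unaligned load. Both function names are made up for illustration, and the byte swap assumes GCC or Clang (__builtin_bswap64 and __BYTE_ORDER__).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time big-endian decode, as wwn_to_u64() in the patch does:
 * only single-byte loads are issued, so alignment never matters. */
static uint64_t wwn_shift_be64(const uint8_t *wwn)
{
	uint64_t v = 0;

	assert(wwn != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | wwn[i];
	return v;
}

/* Convert a big-endian 64-bit value to host order (GCC/Clang builtins). */
static uint64_t be64_to_host(uint64_t x)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap64(x);
#else
	return x;
#endif
}

/* Userspace stand-in for be64_to_cpu(get_unaligned((u64 *)wwn)):
 * memcpy() into a local is the portable spelling of an unaligned load. */
static uint64_t wwn_memcpy_be64(const uint8_t *wwn)
{
	uint64_t raw;

	assert(wwn != NULL);
	memcpy(&raw, wwn, sizeof(raw));
	return be64_to_host(raw);
}
```

Both return 0x200000e08b010203 for the bytes 20 00 00 e0 8b 01 02 03, regardless of how the buffer is aligned.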



2.6.13-rc7 qla2xxx unaligned accesses

2005-08-24 Thread Keith Owens
2.6.13-rc7 + kdb on ia64.  The qla2xxx drivers are getting unaligned
accesses at startup.

qla2300 :01:02.0: Found an ISP2312, irq 66, iobase 0xc0080f30
qla2300 :01:02.0: Configuring PCI space...
PCI: slot :01:02.0 has incorrect PCI cache line size of 0 bytes, correcting to 128
qla2300 :01:02.0: Configure NVRAM parameters...
qla2300 :01:02.0: Verifying loaded RISC code...
qla2300 :01:02.0: Waiting for LIP to complete...
qla2300 :01:02.0: Cable is unplugged...
scsi1 : qla2xxx
kernel unaligned access to 0xe0300667800c, ip=0xa001005cd0b1
qla2300 :01:02.0:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI (66 MHz) @ :01:02.0 hdma+, host#=1, fw=3.03.15 IPX
ACPI: PCI Interrupt :01:02.1[B]: no GSI
qla2300 :01:02.1: Found an ISP2312, irq 67, iobase 0xc0080f301000
qla2300 :01:02.1: Configuring PCI space...
PCI: slot :01:02.1 has incorrect PCI cache line size of 0 bytes, correcting to 128
qla2300 :01:02.1: Configure NVRAM parameters...
qla2300 :01:02.1: Verifying loaded RISC code...
qla2300 :01:02.1: Waiting for LIP to complete...
qla2300 :01:02.1: Cable is unplugged...
scsi2 : qla2xxx
kernel unaligned access to 0xe030066a400c, ip=0xa001005cd0b1
qla2300 :01:02.1:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI (66 MHz) @ :01:02.1 hdma+, host#=2, fw=3.03.15 IPX
ACPI: PCI Interrupt :02:02.0[A]: no GSI
qla2300 :02:02.0: Found an ISP2312, irq 63, iobase 0xc0080fa4
qla2300 :02:02.0: Configuring PCI space...
qla2300 :02:02.0: Configure NVRAM parameters...
qla2300 :02:02.0: Verifying loaded RISC code...
qla2300 :02:02.0: Waiting for LIP to complete...
qla2300 :02:02.0: Cable is unplugged...
scsi3 : qla2xxx
kernel unaligned access to 0xe030066c000c, ip=0xa001005cd0b1
qla2300 :02:02.0:
 QLogic Fibre Channel HBA Driver: 8.01.00b5-k
  QLogic QLA2342 -
  ISP2312: PCI-X (133 MHz) @ :02:02.0 hdma+, host#=3, fw=3.03.15 IPX
ACPI: PCI Interrupt :02:02.1[B]: no GSI
qla2300 :02:02.1: Found an ISP2312, irq 64, iobase 0xc0080fa41000
qla2300 :02:02.1: Configuring PCI space...
qla2300 :02:02.1: Configure NVRAM parameters...
qla2300 :02:02.1: Verifying loaded RISC code...
qla2300 :02:02.1: Waiting for LIP to complete...
qla2300 :02:02.1: Cable is unplugged...
scsi4 : qla2xxx
kernel unaligned access to 0xe030066d000c, ip=0xa001005cd0b1

0xa001005cd0b1 is qla2x00_init_host_attr+0x71.

void
qla2x00_init_host_attr(scsi_qla_host_t *ha)
{
	fc_host_node_name(ha->host) =
	    be64_to_cpu(*(uint64_t *)ha->init_cb->node_name);
	fc_host_port_name(ha->host) =
	    be64_to_cpu(*(uint64_t *)ha->init_cb->port_name);
}

It looks like ha->init_cb->port_name.

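Every faulting address in the log ends in 0xc, which is 4 bytes past an 8-byte boundary. An 8-byte load through a (uint64_t *) cast at such an address is therefore misaligned; ia64 traps on it, the kernel emulates the load, and that produces the "kernel unaligned access" lines. Below is a minimal sketch of the alignment arithmetic, plus the memcpy() form that never asks for a misaligned load. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes past the previous multiple of `width`; width must be a power of two. */
static size_t misalignment(uintptr_t addr, size_t width)
{
	assert(width != 0 && (width & (width - 1)) == 0);
	return (size_t)(addr & (width - 1));
}

/* Load 8 bytes from any address without requesting a misaligned access:
 * memcpy() lets the compiler choose a sequence that is safe. */
static uint64_t load_u64_any(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```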



[patch 2.6.13-rc6] Export pcibios_bus_to_resource on ia64 and sparc64

2005-08-10 Thread Keith Owens
IA64 gets *** Warning: "pcibios_bus_to_resource"
[drivers/pcmcia/yenta_socket.ko] undefined!.  Trivial fix, export
pcibios_bus_to_resource.  Also export it on sparc64, which is the only
other architecture that defines pcibios_bus_to_resource but does not
export it.

Signed-off-by: Keith Owens <[EMAIL PROTECTED]>

Index: linux/arch/ia64/pci/pci.c
===
--- linux.orig/arch/ia64/pci/pci.c  2005-08-08 21:57:47.415210784 +1000
+++ linux/arch/ia64/pci/pci.c   2005-08-10 22:08:01.218842356 +1000
@@ -380,6 +380,7 @@ void pcibios_bus_to_resource(struct pci_
res->start = region->start + offset;
res->end = region->end + offset;
 }
+EXPORT_SYMBOL(pcibios_bus_to_resource);
 
 static int __devinit is_valid_resource(struct pci_dev *dev, int idx)
 {
Index: linux/arch/sparc64/kernel/pci.c
===
--- linux.orig/arch/sparc64/kernel/pci.c  2005-08-10 13:57:47.295579310 +1000
+++ linux/arch/sparc64/kernel/pci.c 2005-08-10 22:09:23.573376709 +1000
@@ -540,6 +540,7 @@ void pcibios_bus_to_resource(struct pci_
 
pbm->parent->resource_adjust(pdev, res, root);
 }
+EXPORT_SYMBOL(pcibios_bus_to_resource);
 
 char * __init pcibios_setup(char *str)
 {
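The exported function does simple arithmetic. Here is a userspace sketch of what the ia64 hunk computes: the CPU-visible resource is the bus region shifted by an offset. The offset is passed in directly, so the sketch leaves out how each architecture obtains it (sparc64 calls pbm->parent->resource_adjust() instead, as the second hunk shows). The struct and function names are made up for illustration. A module such as yenta_socket.ko can only call the real function once it is exported with EXPORT_SYMBOL, which is what the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pci_bus_region and struct resource. */
struct bus_region { uint64_t start, end; };
struct cpu_range  { uint64_t start, end; };

/* Same arithmetic as the ia64 hunk: CPU address = bus address + offset. */
static void bus_to_resource(struct cpu_range *res,
			    const struct bus_region *region, uint64_t offset)
{
	assert(region->start <= region->end);
	res->start = region->start + offset;
	res->end = region->end + offset;
}
```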



Re: 2.6.13-rc4 use after free in class_device_attr_show

2005-08-10 Thread Keith Owens
FYI, the intermittent use after free in sysfs is still there in
2.6.13-rc6.



Re: [patch 1/1] x86_64: Rename KDB_VECTOR to DEBUGGER_VECTOR

2005-08-08 Thread Keith Owens
On Tue, 9 Aug 2005 01:16:37 +0200, 
Andi Kleen <[EMAIL PROTECTED]> wrote:
>On Tue, Aug 09, 2005 at 09:14:52AM +1000, Keith Owens wrote:
>> On Mon, 8 Aug 2005 21:28:50 +0200, 
>> Andi Kleen <[EMAIL PROTECTED]> wrote:
>> >On Mon, Aug 08, 2005 at 12:27:10PM -0700, Tom Rini wrote:
>> >>  {
>> >>   unsigned int icr =  APIC_DM_FIXED | shortcut | vector | dest;
>> >> - if (vector == KDB_VECTOR)
>> >> + if (vector == NMI_VECTOR)
>> >>   icr = (icr & (~APIC_VECTOR_MASK)) | APIC_DM_NMI;
>> >
>> >That if () should be removed since it's useless.
>> >Can you do that please?
>> 
>> Why is 'if ()' useless?  Remove the if test and all ipis get sent as
>> NMI, we definitely do not want that.
>
>The if () with its following line. The same result can be gotten
>by passing suitable arguments.

Arguments to what?  The path for sending the NMI_VECTOR is
send_IPI_allbutself -> {cluster,flat,physflat}_send_IPI_allbutself ->
{__send_IPI_shortcut, physflat_send_IPI_mask, cluster_send_IPI_mask} ->
send_IPI_mask_sequence -> __prepare_ICR.

Pushing the check for NMI_VECTOR any higher than __prepare_ICR needs
multiple tests for NMI_VECTOR.
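The following userspace sketch shows why one test at the bottom of that chain is enough. It is an ICR-building helper in the style of __prepare_ICR: every send path funnels into it, so it is the only place that needs to know about NMI_VECTOR. The constant values are assumed from the x86 APIC definitions of this era (vector in the low 8 bits, APIC_DM_NMI at 0x400), and the NMI_VECTOR value is only a placeholder for the sketch. If the `if` were dropped and the assignment kept, every IPI would get NMI delivery mode, which is the objection raised in the earlier message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, after the x86 APIC definitions of this era. */
#define APIC_VECTOR_MASK 0x000FFu
#define APIC_DM_FIXED    0x00000u
#define APIC_DM_NMI      0x00400u
#define NMI_VECTOR       0x02u     /* placeholder value for this sketch */

/* Build the low ICR word for an IPI.  Only NMI_VECTOR is switched to
 * NMI delivery mode; every other vector keeps fixed delivery. */
static uint32_t prepare_icr(uint32_t shortcut, uint32_t vector, uint32_t dest)
{
	uint32_t icr = APIC_DM_FIXED | shortcut | vector | dest;

	assert(vector <= APIC_VECTOR_MASK);
	if (vector == NMI_VECTOR)
		icr = (icr & ~APIC_VECTOR_MASK) | APIC_DM_NMI;
	return icr;
}
```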



Re: [patch 1/1] x86_64: Rename KDB_VECTOR to DEBUGGER_VECTOR

2005-08-08 Thread Keith Owens
On Mon, 8 Aug 2005 21:28:50 +0200, 
Andi Kleen <[EMAIL PROTECTED]> wrote:
>On Mon, Aug 08, 2005 at 12:27:10PM -0700, Tom Rini wrote:
>>  {
>>  unsigned int icr =  APIC_DM_FIXED | shortcut | vector | dest;
>> -if (vector == KDB_VECTOR)
>> +if (vector == NMI_VECTOR)
>>  icr = (icr & (~APIC_VECTOR_MASK)) | APIC_DM_NMI;
>
>That if () should be removed since it's useless.
>Can you do that please?

Why is 'if ()' useless?  Remove the if test and all ipis get sent as
NMI, we definitely do not want that.



Re: OOPS in 2.6.13-rc1-mm1 -- EIP is at sysfs_release+0x49/0xb0

2005-08-08 Thread Keith Owens
On Mon, 8 Aug 2005 10:44:04 -0700, 
Andrew Morton <[EMAIL PROTECTED]> wrote:
>Sonny Rao <[EMAIL PROTECTED]> wrote:
>> Modules linked in: cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave
>> cpufreq_ondemand cpufreq_conservative ipv6 video thermal processor hotkey fan
>> container button battery ac nfs lockd sunrpc af_packet tg3 ohci_hcd usbcore
>> generic serverworks i2c_piix4 i2c_core sworks_agp agpgart pcspkr rtc floppy tsdev
>> dm_mod parport_pc lp parport ide_generic ide_disk ide_cd cdrom ide_core unix
>> CPU:0
>> EIP:0060:[<c01a8bcc>]Not tainted VLI
>> EFLAGS: 00010246   (2.6.13-rc4-mm1) 
>> EIP is at sysfs_release+0x4c/0xb0
>> eax: 762f7373   ebx: 762f7373   ecx: 00000001   edx: ef3c5000
>> esi: f596a188   edi: f21fecc0   ebp: ef3c5f3c   esp: ef3c5f2c
>> ds: 007b   es: 007b   ss: 0068
>> Process udev (pid: 11843, threadinfo=ef3c5000 task=ef78e550)
>> Stack: f596a188 00000010 f762d580 c21bc944 ef3c5f68 c0166cea c21bc944 f762d580 
>>        c2137980 ec7e9748 f762d580 dcae7300  ef3c5f78 
>>        c0166aeb f762d580 f762d580 ef3c5f94 c01650ab f762d580 dcae7300 dcae7300 
>> Call Trace:
>>  [<c010401f>] show_stack+0x7f/0xa0
>>  [<c01041d4>] show_registers+0x164/0x1d0
>>  [<c0104422>] die+0x122/0x1c0
>>  [<c030db1e>] do_page_fault+0x2ce/0x600
>>  [<c0103ccb>] error_code+0x4f/0x54
>>  [<c0166cea>] __fput+0x1da/0x1f0
>>  [<c0166aeb>] fput+0x2b/0x50
>>  [<c01650ab>] filp_close+0x4b/0x80
>>  [<c016514e>] sys_close+0x6e/0x90
>>  [<c010312f>] sysenter_past_esp+0x54/0x75
>> Code: 85 f6 8b 40 14 8b 58 04 74 08 89 34 24 e8 0d 97 04 00 85 db 74 38 b8 01 00
>>  00 00 e8 af 18 f7 ff e8 4a e5 04 00 c1 e0 07 8d 04 18 <ff> 88 00 01 00 00 83 3b
>>  02 74 49 b8 01 00 00 00 e8 cf 18 f7 ff 
>>  <6>note: udev[11843] exited with preempt_count 1
>> Using generic hotkey driver
>> ibm_acpi: acpi_evalf(DHKC, d, ...) failed: 4097
>> ibm_acpi: `enable,0x' invalid for parameter `hotkey'
>> toshiba_acpi: Unknown parameter `hotkeys_over_acpi'
>> apm: BIOS not found.
>> 
>> Let me see if I can reproduce this on either 2.6.13-rc4 or  2.6.13-rc6 
>> 
>> Machine is an IBM x335 (dual P4), and I'm not using any framebuffer
>> stuff. 
>> 
>
>Keith, does this look like the use-after-free which you've been hitting?

It is certainly in the same place, freeing the data that is chained off
sd->s_element.  This oops does not show any memory poisoning, but I am
guessing that the kernel was not compiled with slab debugging.  On
balance, it looks like the same problem.
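With slab debugging enabled, freed objects are overwritten with a poison byte (0x6b, POISON_FREE in the slab debugging code). A pointer read from a freed object then shows up as 0x6b6b6b6b and is recognisable at a glance. Without slab debugging, the oops just shows whatever stale data happened to be there. The following userspace sketch illustrates the idea; it is not the slab allocator's code, and the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b   /* byte written over freed slab objects */

/* What slab debugging does at free time: overwrite the whole object. */
static void poison_on_free(void *obj, size_t size)
{
	memset(obj, POISON_FREE, size);
}

/* True if every byte still holds the free poison.  A pointer loaded from
 * such an object reads as 0x6b6b6b6b on 32-bit, easy to spot in an oops. */
static bool still_poisoned(const void *obj, size_t size)
{
	const unsigned char *p = obj;

	assert(size > 0);
	for (size_t i = 0; i < size; i++)
		if (p[i] != POISON_FREE)
			return false;
	return true;
}
```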



Re: [patch 07/15] Basic x86_64 support

2005-08-06 Thread Keith Owens
On Thu, 4 Aug 2005 14:39:00 +0200, 
Andi Kleen <[EMAIL PROTECTED]> wrote:
>> > That doesn't make much sense here. tasklet will only run when interrupts
>> > are enabled, and that is much later. You could move it to there.
>> 
>> Where?  Keep in mind it's really only x86_64 that isn't able to break
>> sooner.
>
>The local_irq_enable() call in init/main.c:start_kernel()
>
>If you want to run gdb earlier you need to do it without a tasklet.
>
>> > > --- linux-2.6.13-rc3/include/asm-x86_64/hw_irq.h~x86_64-lite  2005-07-29 13:19:10.0 -0700
>> > > +++ linux-2.6.13-rc3-trini/include/asm-x86_64/hw_irq.h  2005-07-29 13:19:10.0 -0700
>> > > @@ -55,6 +55,7 @@ struct hw_interrupt_type;
>> > >  #define TASK_MIGRATION_VECTOR   0xfb
>> > >  #define CALL_FUNCTION_VECTOR    0xfa
>> > >  #define KDB_VECTOR              0xf9
>> > > +#define KGDB_VECTOR             0xf8
>> > 
>> > I already allocated these vectors for something else.
>> 
>> Is there another we can use?  Just following what looked to be the
>> logical order.
>
>How about you use KDB_VECTOR and rename it to DEBUG_VECTOR
>and then just check if kgdb is currently active? 
>
>KDB can do the same.
>
>I changed the assignment in my tree like this:
>
>#define SPURIOUS_APIC_VECTOR        0xff
>#define ERROR_APIC_VECTOR           0xfe
>#define RESCHEDULE_VECTOR           0xfd
>#define CALL_FUNCTION_VECTOR        0xfc
>#define KDB_VECTOR                  0xfb    /* reserved for KDB */
>#define THERMAL_APIC_VECTOR         0xfa
>/* 0xf9 free */
>#define INVALIDATE_TLB_VECTOR_END   0xf8
>#define INVALIDATE_TLB_VECTOR_START 0xf0    /* f0-f8 used for TLB flush */

Don't call it {KDB,KGDB,DEBUG}_VECTOR, call it NMI_VECTOR, which is
what it really is.  default_do_nmi() determines if the nmi is due to a
debugger or some other event.  That requires the debuggers to record if
they are expecting their own nmi, putting all the load on the
debuggers, where it belongs.

IOW, add NMI_VECTOR to the base code, then add debugger support on top of
NMI_VECTOR.
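Here is a rough sketch of that split. It assumes a per-CPU flag that a debugger sets just before sending its own NMI and that the generic NMI path consumes, so the base code only asks "did a debugger claim this one?". All names are hypothetical. A real implementation would use per-CPU data and careful memory ordering, and default_do_nmi() does much more than this.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8   /* hypothetical bound for the sketch */

/* Set by a debugger right before it sends its own NMI to a CPU. */
static bool debugger_expects_nmi[MAX_CPUS];

enum nmi_source { NMI_FROM_DEBUGGER, NMI_OTHER };

static void debugger_arm_nmi(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	debugger_expects_nmi[cpu] = true;
}

/* Generic NMI path: consume the debugger's claim if there is one,
 * otherwise treat it as an ordinary NMI (watchdog, parity, ...). */
static enum nmi_source classify_nmi(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	if (debugger_expects_nmi[cpu]) {
		debugger_expects_nmi[cpu] = false;   /* one-shot claim */
		return NMI_FROM_DEBUGGER;
	}
	return NMI_OTHER;
}
```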


Re: question on memory map of process on i386

2005-08-03 Thread Keith Owens
On Wed, 03 Aug 2005 17:28:38 -0600, 
"Christopher Friesen" <[EMAIL PROTECTED]> wrote:
>
>On i386, /proc/<pid>/maps shows the following entry:
>
>ffffe000-fffff000 ---p 00000000 00:00 0
>
>This page of memory is way up above TASK_SIZE (which is 0xc0000000), so 
>how is it visible to userspace?
>
>Just to complicate things,  I seem to find the vma for this page using 
>find_vma_prev().
>
>Can anyone explain what's going on?

The gate page is a section of code that is generated as part of the
kernel build.  At run time, the gate page is mapped into all the user
space processes.  There is also a virtual dynamic .so (vdso) file that
is created by the kernel and picked up by the linker; the vdso maps the
kernel entries in the gate page.  Run this command and look for "gate".

  ldd -v `which cat`

Once all the dots are joined by the linker, a program can use the vdso
to access the gate page directly, even though the vdso and the
underlying page belong to the kernel.  This direct access does not
incur any of the overhead associated with a syscall, so it can be very
fast.

What is in the gate page varies from one architecture to another; glibc
hides the arch differences from the program.  Some sample uses for the
gate page -

i386: select between int 0x80 or sysenter to enter the kernel.
ia64: select between break 0x10 or epc to enter the kernel, epc is
  significantly faster.  On ia64, the gate page also contains the
  signal delivery trampoline.
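On a current Linux/glibc system, the same kernel-supplied mapping can be observed from inside a process. The kernel passes the vdso's load address in the auxiliary vector under AT_SYSINFO_EHDR, and glibc's getauxval() (glibc 2.16 and later) returns it. The sketch below assumes a Linux target with <elf.h> and <sys/auxv.h>. It only shows that the page is an ELF image supplied by the kernel, not how glibc resolves symbols in it.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base of the vDSO image the kernel mapped into this process, taken from
 * the AT_SYSINFO_EHDR auxiliary-vector entry; 0 if none was supplied. */
static uintptr_t vdso_base(void)
{
	return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The vDSO is a genuine ELF shared object, so it starts with "\177ELF". */
static int vdso_is_elf(void)
{
	uintptr_t base = vdso_base();

	if (base == 0)
		return 0;
	return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

On a typical system vdso_base() is non-zero and vdso_is_elf() returns 1. On newer kernels the same address appears on the [vdso] line of /proc/self/maps.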



Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01

2005-08-02 Thread Keith Owens
On Tue, 02 Aug 2005 18:12:27 -0700, 
George Anzinger  wrote:
>How about something like:
>   if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)

current points to the current struct task, regs points to the kernel
stack.  Those two data areas can be completely separate, as they are on
i386.  Also i386 uses a separate kernel stack for interrupts.
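One way to measure remaining stack that does not mix the two areas is to work purely from the stack pointer. Suppose a stack is THREAD_SIZE bytes, aligned to THREAD_SIZE, and grows down; then sp & (THREAD_SIZE - 1) is the room left below sp. The sketch below works under those assumptions. The function name and the `reserved` parameter (for whatever sits at the bottom of the stack, such as thread_info on i386) are illustrative. Because the sketch only looks at sp, it measures whichever stack sp is currently on, including a separate interrupt stack, provided that stack is aligned the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes between sp and the bottom of its THREAD_SIZE-aligned stack, less
 * `reserved` bytes kept at the bottom.  Assumes a downward-growing stack
 * whose base is aligned to thread_size (a power of two). */
static size_t stack_bytes_left(uintptr_t sp, size_t thread_size,
			       size_t reserved)
{
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
	size_t from_base = (size_t)(sp & (thread_size - 1));

	return from_base > reserved ? from_base - reserved : 0;
}
```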



Re: Need help regarding kernel threads

2005-08-02 Thread Keith Owens
On Tue, 2 Aug 2005 09:57:51 +0100 (BST), 
vinay hegde <[EMAIL PROTECTED]> wrote:
>How to differentiate kernel threads from normal
>processes inside the Linux kernel code? 

The Linux Kernel Debugger (ftp://oss.sgi.com/projects/kdb/download/v4.4)
distinguishes between idle tasks, sleeping system daemons and the rest
(typically user tasks).  An idle task has pid 0; a sleeping system
daemon has a NULL mm field and is in S state; everything else is

Download the latest kdb common patch and look at function
kdb_task_state_char.
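Here is the rule above written out as a small C sketch. The struct is a hypothetical cut-down view of the task fields involved (pid, mm, and the one-letter state that ps shows); this is not kdb_task_state_char() itself. A NULL mm on its own is the usual sign of a kernel thread; the S-state check is what narrows the class to sleeping daemons.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down view of the task fields the rule uses. */
struct task_view {
	int pid;
	const void *mm;   /* NULL for kernel threads */
	char state;       /* 'R', 'S', 'D', ... as shown by ps */
};

enum task_kind { TASK_IDLE, TASK_SLEEPING_DAEMON, TASK_NORMAL };

/* pid 0 is idle; no mm and sleeping is a sleeping system daemon;
 * everything else is treated as a normal task. */
static enum task_kind classify_task(const struct task_view *t)
{
	assert(t != NULL);
	if (t->pid == 0)
		return TASK_IDLE;
	if (t->mm == NULL && t->state == 'S')
		return TASK_SLEEPING_DAEMON;
	return TASK_NORMAL;
}
```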



Re: 2.6.13-rc4 use after free in class_device_attr_show

2005-08-01 Thread Keith Owens
On Tue, 02 Aug 2005 13:05:50 +1000, 
Keith Owens <[EMAIL PROTECTED]> wrote:
>The vcsnn value varies.  I traced the dentry parent chain for the
>latest event.  From bottom to top the d_name entries are
>
>  dev, vcs16, vc, class, /.
>
>That makes no sense, why is dev a child of vcs16?  Raw data at the end.

Ignore that bit, I was confusing /dev and dev as a subdir of a sysfs
entry.  The parent chain is right.



Re: 2.6.13-rc4 use after free in class_device_attr_show

2005-08-01 Thread Keith Owens
On Mon, 1 Aug 2005 12:03:21 -0700,
Andrew Morton <[EMAIL PROTECTED]> wrote:
>Keith Owens <[EMAIL PROTECTED]> wrote:
>>
>> On Sat, 30 Jul 2005 02:29:55 -0700,
>> Andrew Morton <[EMAIL PROTECTED]> wrote:
>> >Keith Owens <[EMAIL PROTECTED]> wrote:
>> >>
>> >> 2.6.13-rc4 + kdb, with lots of CONFIG_DEBUG options.  There is an
>> >>  intermittent use after free in class_device_attr_show.  Reboot with no
>> >>  changes and the problem does not always recur.
>> >> ...
>> >>  ip is at class_device_attr_show+0x50/0xa0
>> >> ...
>> >
>> >It might help to know which file is being read from here.
>> >
>> >The below patch will record the name of the most-recently-opened sysfs
>> >file.  You can print last_sysfs_file[] in the debugger or add the
>> >appropriate printk to the ia64 code?
>>
>> No need for a patch.  It is /dev/vcsa2.
>
>You mean /sys/class/vc/vcsa2?

The vcsnn value varies.  I traced the dentry parent chain for the
latest event.  From bottom to top the d_name entries are

  dev, vcs16, vc, class, /.

That makes no sense, why is dev a child of vcs16?  Raw data at the end.
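Reading a path off a dentry means walking d_parent upward until a dentry is its own parent, then emitting the names in reverse order. The following userspace sketch uses a hypothetical two-field dentry; the kernel's own d_path() is the real implementation and also deals with mounts, locking and renames. For the chain listed above (dev, vcs16, vc, class, /), it yields /class/vc/vcs16/dev, a path relative to the sysfs mount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical minimal dentry: a name and a parent.  As with real
 * dentries, the root is its own parent. */
struct mini_dentry {
	const char *name;
	const struct mini_dentry *parent;
};

/* Write the path of d into buf ("/a/b/c").  Returns 0 on success,
 * -1 if the chain is too deep or buf is too small. */
static int mini_dentry_path(const struct mini_dentry *d, char *buf, size_t len)
{
	const struct mini_dentry *chain[MAX_DEPTH];
	int depth = 0;
	size_t pos = 0;

	assert(buf != NULL && len > 0);
	while (d->parent != d) {               /* collect leaf .. top */
		if (depth == MAX_DEPTH)
			return -1;
		chain[depth++] = d;
		d = d->parent;
	}
	if (depth == 0) {                      /* d is the root itself */
		if (len < 2)
			return -1;
		strcpy(buf, "/");
		return 0;
	}
	for (int i = depth - 1; i >= 0; i--) { /* emit top .. leaf */
		int n = snprintf(buf + pos, len - pos, "/%s", chain[i]->name);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return 0;
}
```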

>That appears to be using generic code...
>
>Can you please summarise what you currently know about this bug?  What is
>being accessed after free in class_device_attr_show()?  class_dev_attr?
>cd?

IA64, compiled for both SMP and uni-processor.  Lots of debug configs,
including slab poisoning.

The problem was first noticed at 2.6.13-rc3, it has also been seen in
-rc4.  It is very intermittent, so -rc3 may not be the starting point.

Failures have been seen in two sysfs routines,
sysfs_read_file()->class_device_attr_show() and
sysfs_release()->module_put(owner).

The common denominator in the failures is that sd->s_element points to
poisoned data.

Raw data, from the failure in sysfs_release:

kdb> filp 0xe0301eeae8d0
name.name 0xe0301d171384  name.len  3
File Pointer at 0xe0301eeae8d0
 f_list.nxt = 0xe0301eeaea08 f_list.prv = 0xe03003e5aeb8
 f_dentry = 0xe0301d1712a0 f_op = 0xa00100a615c8
 f_count = 0 f_flags = 0x8000 f_mode = 0xd
 f_pos = 5

dentry parent chain.  /class/vc/vcs16/dev WTF?

kdb> dentry 0xe0301d1712a0
Dentry at 0xe0301d1712a0
 d_name.len = 3 d_name.name = 0xe0301d171384 dev
 d_count = 1 d_flags = 0x18 d_inode = 0xe0b476b32df0
 d_parent = 0xe0301d171a80
 d_hash.nxt = 0x d_hash.prv = 0x00200200
 d_lru.nxt = 0xe0301d1712f8 d_lru.prv = 0xe0301d1712f8
 d_child.nxt = 0xe0301d171af8 d_child.prv = 0xe0301d171af8
 d_subdirs.nxt = 0xe0301d171318 d_subdirs.prv = 0xe0301d171318
 d_alias.nxt = 0xe0b476b32e20 d_alias.prv = 0xe0b476b32e20
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58

kdb> dentry 0xe0301d171a80
Dentry at 0xe0301d171a80
 d_name.len = 5 d_name.name = 0xe0301d171b64 vcs16
 d_count = 2 d_flags = 0x10 d_inode = 0xe0301986cac0
 d_parent = 0xe0347b87c880
 d_hash.nxt = 0x d_hash.prv = 0x00200200
 d_lru.nxt = 0xe0301d171ad8 d_lru.prv = 0xe0301d171ad8
 d_child.nxt = 0xe03011ba9ae8 d_child.prv = 0xe03019f974c8
 d_subdirs.nxt = 0xe0301d171308 d_subdirs.prv = 0xe0301d171308
 d_alias.nxt = 0xe0301986caf0 d_alias.prv = 0xe0301986caf0
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58

kdb> dentry 0xe0347b87c880
Dentry at 0xe0347b87c880
 d_name.len = 2 d_name.name = 0xe0347b87c964 vc
 d_count = 8 d_flags = 0x0 d_inode = 0xe0b47a5dad70
 d_parent = 0xe0b47a404760
 d_hash.nxt = 0x d_hash.prv = 0xa0020079d000
 d_lru.nxt = 0xe0347b87c8d8 d_lru.prv = 0xe0347b87c8d8
 d_child.nxt = 0xe0b47a445668 d_child.prv = 0xe0347b921548
 d_subdirs.nxt = 0xe0301a1fd788 d_subdirs.prv = 0xe0347b87c7c8
 d_alias.nxt = 0xe0b47a5dada0 d_alias.prv = 0xe0b47a5dada0
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58

kdb> dentry 0xe0b47a404760
Dentry at 0xe0b47a404760
 d_name.len = 5 d_name.name = 0xe0b47a404844 class
 d_count = 20 d_flags = 0x0 d_inode = 0xe0347bc95c18
 d_parent = 0xe0b47a405180
 d_hash.nxt = 0x d_hash.prv = 0xa002002d4bc8
 d_lru.nxt = 0xe0b47a4047b8 d_lru.prv = 0xe0b47a4047b8
 d_child.nxt = 0xe0b47a4048e8 d_child.prv = 0xe0b47a4046a8
 d_subdirs.nxt = 0xe03013818d68 d_subdirs.prv = 0xe0b47a405d28
 d_alias.nxt = 0xe0347bc95c48 d_alias.prv = 0xe0347bc95c48
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58

kdb> dentry 0xe0b47a405180
Dentry at 0xe0b47a405180
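 d_name.len = 1 d_name.name = 0xe0b47a405264 /
 d_count = 11 d_flags = 0x10 d_inode = 0xe0347bc97460
 d_parent = 0xe0b47a405180
 d_hash.nxt = 0x d_hash.prv = 0x
 d_lru.nxt = 0xe0b47a4051d8 d_lru.prv = 0xe0b47a4051d8
 d_child.nxt = 0xe0b47a4051e8 d_child.prv = 0xe0b47a4051e8
 d_subdirs.nxt = 0xe0b47a446ce8 d_subdirs.prv = 0xe0b47a404a08
 d_alias.nxt = 0xe0347bc97490 d_alias.prv = 0xe0347bc97490
 d_op = 0x d_sb = 0xe03003e5ad58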

Re: 2.6.13-rc4 use after free in class_device_attr_show

2005-08-01 Thread Keith Owens
Another (different) manifestation of use after free in sysfs.  It broke
on module_put(owner) in sysfs_release().  FWIW this ia64 build is
uni-processor, so there is a lot more context switching than normally
occurs on udev.

fill_kobj_path: path = '/class/vc/vcs2'
kobject_hotplug: /sbin/hotplug vc seq=1809 HOME=/ 
PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/class/vc/vcs2 
SUBSYSTEM=vc
kobject vcsa2: registering. parent: vc, set: class_obj
kobject_hotplug
fill_kobj_path: path = '/class/vc/vcsa2'
kobject_hotplug: /sbin/hotplug vc seq=1810 HOME=/ 
PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add DEVPATH=/class/vc/vcsa2 
SUBSYSTEM=vc
kobject_hotplug
fill_kobj_path: path = '/class/vc/vcs1'
kobject_hotplug: /sbin/hotplug vc seq=1811 HOME=/ 
PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=remove DEVPATH=/class/vc/vcs1 
SUBSYSTEM=vc
kobject vcs1: cleaning up
kobject_hotplug
fill_kobj_path: path = '/class/vc/vcsa1'
kobject_hotplug: /sbin/hotplug vc seq=1812 HOME=/ 
PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=remove DEVPATH=/class/vc/vcsa1 
SUBSYSTEM=vc
kobject vcsa1: cleaning up
kobject vcs16: cleaning up
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6cf3
udev[24414]: Oops 8821862825984 [1]
Modules linked in: md5 ipv6 usbcore raid0 md_mod nls_iso8859_1 nls_cp437 dm_mod 
sg st osst

Pid: 24414, CPU 0, comm: udev
psr : 1010081a6018 ifs : 8308 ip  : [<a0010025c010>]    Not tainted
ip is at sysfs_release+0xf0/0x1c0
unat:  pfs : 0308 rsc : 0003
rnat:  bsps:  pr  : 00158659
ldrs:  ccv :  fpsr: 0009804c8270033f
csd :  ssd : 
b0  : a0010025bff0 b6  : a001e8c0 b7  : a0010057ff00
f6  : 1003e6b6b6b6b6b6b6b6b f7  : 0ffe58bbeff7b8000
f8  : 1003e0578 f9  : 1003e0005
f10 : 100019803b6e3 f11 : 1003e0005
r1  : a00100ddf690 r2  : 0001 r3  : e0b078360da0
r8  :  r9  : a00100be0a40 r10 : 00f4
r11 : 0001 r12 : e0b078367e30 r13 : e0b07836
r14 : 6b6b6b6b6b6b6cf3 r15 : 0001 r16 : e0b078360da0
r17 :  r18 : 054cd124 r19 : a0007fff62138000
r20 : a0007fff8c7a r21 : 0010 r22 : 4000
r23 : 6b6b6b6b6b6b6b6b r24 :  r25 : e0347bff0758
r26 : 0090 r27 : e030752f0728 r28 : e030752f0720
r29 : e030752f0738 r30 :  r31 : 0001

kdb> r s
 r32: e0b476b32df0  r33: e0b472417380  r34: 6b6b6b6b6b6b6b6b 
 r35: a0010019a060  r36: 0610  r37: 0610 
 r38: a0010025bff0  r39: 0308 

kdb> bt
Stack traceback for pid 24414
0xe0b078362441424400  10   R  0xe0b078360300 *udev
0xa0010025c010 sysfs_release+0xf0
args (0xe0b476b32df0, 0xe0b472417380, 0x6b6b6b6b6b6b6b6b, 
0xa0010019a060, 0x610)
0xa0010019a060 __fput+0x3c0
args (0xe0301eeae8d0, 0xe0301eeae8f0, 0xe0b476b32df0, 
0xe0301eeae8e0, 0xe0347bc91200)
0xa0010019a0c0 fput+0x40
args (0xe0301eeae8d0, 0xa00100191d60, 0x308, 0xe0b476b32df0)
0xa00100191d60 filp_close+0xc0
args (0xe0301eeae8d0, 0xe0b4720d5230, 0x0, 0xa001001920d0, 
0x919)
0xa001001920d0 sys_close+0x2f0
args (0x6, 0x60058210, 0x4000, 0x280, 0x0)
0xa001b520 ia64_ret_from_syscall
args (0x6, 0x60058210, 0x4000)
0xa0010640 __kernel_syscall_via_break
args (0x6, 0x60058210, 0x4000)

kdb> inode 0xe0b476b32df0
struct inode at  0xe0b476b32df0
 i_ino = 34192 i_count = 1 i_size 16384
 i_mode = 0100444  i_nlink = 0  i_rdev = 0x0
 i_hash.nxt = 0x i_hash.pprev = 0x
 i_list.nxt = 0xe0b472084d40 i_list.prv = 0xe0b476b31c98
 i_dentry.nxt = 0xe0301d1712a0 i_dentry.prv = 0xe0301d1712a0
 i_sb = 0xe03003e5ad58 i_op = 0xa00100a61488 i_data = 
0xe0b476b32f98 nrpages = 0
 i_fop= 0xa00100a615c8 i_flock = 0x i_mapping = 
0xe0b476b32f98
 i_flags 0x0 i_state 0x0 []  fs specific info @ 0xe0b476b33148

kdb> dentry 0xe0301d1712a0
Dentry at 0xe0301d1712a0
 d_name.len = 3 d_name.name = 0xe0301d171384 dev
 d_count = 1 d_flags = 0x18 d_inode = 0xe0b476b32df0
 d_parent = 0xe0301d171a80
 d_hash.nxt = 0x d_hash.prv = 0x00200200
 d_lru.nxt = 0xe0301d1712f8 d_lru.prv = 0xe0301d1712f8
 d_child.nxt = 0xe0301d171af8 d_child.prv = 0xe0301d171af8
 d_subdirs.nxt = 0xe0301d171318 d_subdirs.prv = 0xe0301d171318
 d_alias.nxt = 0xe0b476b32e20 d_alias.prv = 0xe0b476b32e20
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58


Re: 2.6.13-rc4 use after free in class_device_attr_show

2005-08-01 Thread Keith Owens
On Sat, 30 Jul 2005 02:29:55 -0700,
Andrew Morton <[EMAIL PROTECTED]> wrote:
>Keith Owens <[EMAIL PROTECTED]> wrote:
>>
>> 2.6.13-rc4 + kdb, with lots of CONFIG_DEBUG options.  There is an
>>  intermittent use after free in class_device_attr_show.  Reboot with no
>>  changes and the problem does not always recur.
>> ...
>>  ip is at class_device_attr_show+0x50/0xa0
>> ...
>
>It might help to know which file is being read from here.
>
>The below patch will record the name of the most-recently-opened sysfs
>file.  You can print last_sysfs_file[] in the debugger or add the
>appropriate printk to the ia64 code?

No need for a patch.  It is /dev/vcsa2.

kdb> bt
Stack traceback for pid 23066
0xe0301abf80002306623051  10   R  0xe0301abf8300 *udev
0xa0010057f850 class_device_attr_show+0x50
args (0xe0b006abb900, 0x6b6b6b6b6b6b6b6b, 0xe030186d4000, 
0xa0010025cf90, 0x711)
0xa0010025cf90 sysfs_read_file+0x2b0
args (0xe03073917110, 0x60058210, 0x4000, 
0xe0301abffe38, 0xe0307a98d668)
0xa00100197ae0 vfs_read+0x1c0
args (0xe0301b2ba8d0, 0x60058210, 0x4000, 
0xe0301abffe38, 0xe0301b2ba8f0)
0xa00100197e20 sys_read+0x80
args (0x6, 0x60058210, 0x4000, 0x280, 0x0)
0xa001b520 ia64_ret_from_syscall
args (0x6, 0x60058210, 0x4000)
0xa0010640 __kernel_syscall_via_break
args (0x6, 0x60058210, 0x4000)

kdb> r
 psr: 0x1010081a6018   ifs: 0x8308   ip: 0xa0010057f850
unat: 0x   pfs: 0x0711   rsc: 0x0003
rnat: 0xa00100abbb40  bsps: 0x036a   pr: 0x00159659
ldrs: 0x   ccv: 0x  fpsr: 0x0009804c0270033f
  b0: 0xa0010025cf90   b6: 0xa001e8c0   b7: 0xa0010057f800
  r1: 0xa00100ddf690   r2: 0xe03073917128   r3: 0xe03003018498
  r8: 0x   r9: 0xa00100bfbbe8   r10: 0xe030186d4000
 r11: 0x00c061b5   r12: 0xe0301abffe20   r13: 0xe0301abf8000
 r14: 0xa0010057f800   r15: 0xe030186d4000   r16: 0x6db6db6db6db6db7
 r17: 0x2a155f98   r18: 0xa0007fff62138000   r19: 0xe030030102c8
 r20: 0xe0300301   r21: 0xfffefcf1   r22: 0x0010
 r23: 0xa00100d3f1a0   r24: 0xa00100bfbbe8   r25: 0x0542abf3
 r26: 0xa00100995718   r27: 0xe03003015208   r28: 0x
 r29: 0xe0300301   r30: 0xa00100d3f1a0   r31: 0x
regs = e0301abffc60

kdb> r s
 r32: e0b006abb900  r33: 6b6b6b6b6b6b6b6b  r34: e030186d4000
 r35: a0010025cf90  r36: 0711  r37: a00100ddf690
 r38: e0b006abb8f0  r39: e030186d4000

kdb> filp 0xe0301b2ba8d0
name.name 0xe0301aced384  name.len  3
File Pointer at 0xe0301b2ba8d0
 f_list.nxt = 0xe0301b2bb9e0 f_list.prv = 0xe03003e5aeb8
 f_dentry = 0xe0301aced2a0 f_op = 0xa00100a615c8
 f_count = 1 f_flags = 0x8000 f_mode = 0xd
 f_pos = 0

kdb> dentry 0xe0301aced2a0
Dentry at 0xe0301aced2a0
 d_name.len = 3 d_name.name = 0xe0301aced384 dev
 d_count = 1 d_flags = 0x18 d_inode = 0xe0b4796a87c8
 d_parent = 0xe0b0044a6020
 d_hash.nxt = 0x d_hash.prv = 0x00200200
 d_lru.nxt = 0xe0301aced2f8 d_lru.prv = 0xe0301aced2f8
 d_child.nxt = 0xe0b0044a6098 d_child.prv = 0xe0b0044a6098
 d_subdirs.nxt = 0xe0301aced318 d_subdirs.prv = 0xe0301aced318
 d_alias.nxt = 0xe0b4796a87f8 d_alias.prv = 0xe0b4796a87f8
 d_op = 0xa00100a61870 d_sb = 0xe03003e5ad58

kdb> inode 0xe0b4796a87c8
struct inode at  0xe0b4796a87c8
 i_ino = 33036 i_count = 1 i_size 16384
 i_mode = 0100444  i_nlink = 0  i_rdev = 0x0
 i_hash.nxt = 0x i_hash.pprev = 0x
 i_list.nxt = 0xe0301b10d2f0 i_list.prv = 0xe0b474f1bb50
 i_dentry.nxt = 0xe0301aced2a0 i_dentry.prv = 0xe0301aced2a0
 i_sb = 0xe03003e5ad58 i_op = 0xa00100a61488 i_data = 
0xe0b4796a8970 nrpages = 0
 i_fop= 0xa00100a615c8 i_flock = 0x i_mapping = 
0xe0b4796a8970
 i_flags 0x0 i_state 0x0 []  fs specific info @ 0xe0b4796a8b20

kdb> kobject 0xe0b006abb900
kobject at 0xe0b006abb900
 k_name 0xe0b006abb908 'vcsa2'
 kref.refcount 1'
 entry.next = 0xe0b006abb920 entry.prev = 0xe0b006abb920
 parent = 0xe0347b877528 kset = 0xa00100abba30 ktype = 
0x dentry = 0xe0b0044a6020

The attr is passed to class_device_attr_show in r33 but gcc has reused
that register by the time of the oops.

kdb> id class_device_attr_show
0xa0010057f800 class_device_attr_show[MII]   alloc r36=ar.pfs,8,6,0
0xa0010057f806 class_device_attr_show+0x6    mov r8=r0;;
0xa0010057f80c class_device_attr_show+0xc    adds r2=24,r33

0xa0010057f810 class_device_attr_show+0x10[MMI]   mov r37=r1

2.6.13-rc4 use after free in class_device_attr_show

2005-07-29 Thread Keith Owens
2.6.13-rc4 + kdb, with lots of CONFIG_DEBUG options.  There is an
intermittent use after free in class_device_attr_show.  Reboot with no
changes and the problem does not always recur.

Starting SSH daemon   done
Starting sound driver done
Starting cupsddone
loading ACPI modules () Starting powersaved   done
Try to get initial date and time via NTP from  ntp0   done
Starting network time protocol daemon (NTPD)  done
Starting kernel based NFS server  done
Starting service automounter  done
udev[21369]: Oops 8813272891392 [1]

udev[21369]: Oops 8813272891392 [1]
Modules linked in: md5 ipv6 usbcore raid0 md_mod nls_iso8859_1 nls_cp437 dm_mod 
sg st osst

Pid: 21369, CPU 0, comm: udev
psr : 1010081a6018 ifs : 8308 ip  : []Not 
tainted
ip is at class_device_attr_show+0x50/0xa0
unat:  pfs : 0711 rsc : 0003
rnat: a00100abbae0 bsps: 01fb pr  : 00159659
ldrs:  ccv :  fpsr: 0009804c0270033f
csd :  ssd : 
b0  : a0010025def0 b6  : a001e8c0 b7  : a00100580760
f6  : 1003e6db6db6db6db6db7 f7  : 1003e00c1d6ca
f8  : 1003e00c1d6ca f9  : 1003e054cdf86
f10 : 0 f11 : 0
r1  : a00100ddf0a0 r2  : e03072ab7288 r3  : e0300301c498
r8  :  r9  : a00100bfb5d0 r10 : e03075b28000
r11 : 00c1d6ca r12 : e03076ce7e20 r13 : e03076ce
r14 : a00100580760 r15 : e03075b28000 r16 : 6db6db6db6db6db7
r17 : 2a66fc30 r18 : a0007fff62138000 r19 : e030030102c8
r20 : e0300301 r21 : fffefcf1 r22 : 0010
r23 : a00100d46c50 r24 : a00100bfb5d0 r25 : 054cdf86
r26 : a001009968c8 r27 : e03003015208 r28 : 
r29 : e0300301 r30 : a00100d46c50 r31 : 

Call Trace:
 [] show_stack+0x80/0xa0
sp=e03076ce79c0 bsp=e03076ce10d8
 [] show_regs+0x850/0x880
sp=e03076ce7b90 bsp=e03076ce1078
 [] die+0x280/0x4a0
sp=e03076ce7ba0 bsp=e03076ce1028
 [] ia64_do_page_fault+0x650/0xba0
sp=e03076ce7ba0 bsp=e03076ce0fb8
 [] ia64_leave_kernel+0x0/0x290
sp=e03076ce7c50 bsp=e03076ce0fb8
 [] class_device_attr_show+0x50/0xa0
sp=e03076ce7e20 bsp=e03076ce0f78
 [] sysfs_read_file+0x2b0/0x360
sp=e03076ce7e20 bsp=e03076ce0f08
 [] vfs_read+0x1c0/0x360
sp=e03076ce7e20 bsp=e03076ce0eb0
 [] sys_read+0x80/0xe0
sp=e03076ce7e20 bsp=e03076ce0e38
 [] ia64_ret_from_syscall+0x0/0x20
sp=e03076ce7e30 bsp=e03076ce0e38
kdb> id class_device_attr_show
0xa00100580760 class_device_attr_show[MII]   alloc r36=ar.pfs,8,6,0
0xa00100580766 class_device_attr_show+0x6    mov r8=r0;;
0xa0010058076c class_device_attr_show+0xc    adds r2=24,r33

0xa00100580770 class_device_attr_show+0x10[MMI]   mov r37=r1
0xa00100580776 class_device_attr_show+0x16   mov r39=r34
0xa0010058077c class_device_attr_show+0x1c   adds r38=-16,r32

0xa00100580780 class_device_attr_show+0x20[MII]   nop.m 0x0
0xa00100580786 class_device_attr_show+0x26   mov r35=b0;;
0xa0010058078c class_device_attr_show+0x2c   mov.i ar.pfs=r36

0xa00100580790 class_device_attr_show+0x30[MII]   ld8 r33=[r2]
0xa00100580796 class_device_attr_show+0x36   mov b0=r35;;
0xa0010058079c class_device_attr_show+0x3c   cmp.eq p8,p9=0,r33

0xa001005807a0 class_device_attr_show+0x40[MBB]   nop.m 0x0
0xa001005807a6 class_device_attr_show+0x46   (p09) br.cond.dpnt.few 0xa001005807b0 class_device_attr_show+0x50
0xa001005807ac class_device_attr_show+0x4c   br.ret.sptk.many b0

0xa001005807b0 class_device_attr_show+0x50[MMI]   ld8 r8=[r33],8;;
0xa001005807b6 class_device_attr_show+0x56   ld8 r1=[r33],-8
0xa001005807bc class_device_attr_show+0x5c   mov b7=r8

0xa001005807c0 class_device_attr_show+0x60[MIB]   nop.m 0x0
0xa001005807c6 class_device_attr_show+0x66   nop.i 0x0
0xa001005807cc class_device_attr_show+0x6c   br.call.sptk.many b0=b7;;

0xa001005807d0 class_device_attr_show+0x70[MII]   mov r1=r37
0xa001005807d6 class_device_attr_show+0x76   mov.i
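The disassembly suggests why a freed attribute faults exactly at +0x50: `adds r2=24,r33` and `ld8 r33=[r2]` load a function pointer stored 24 bytes into the attribute (overwriting the incoming attr argument in r33), `cmp.eq p8,p9=0,r33` only tests that pointer against NULL, and `ld8 r8=[r33],8` at +0x50 then reads through it (on ia64 a function pointer addresses a descriptor holding the entry point and gp). The userspace mock below illustrates the pattern under an assumed struct layout, chosen so the pointer sits at offset 24 on an LP64 build; it is not the kernel's definition of these structures. Once the object is poisoned with 0x6b, the pointer reads as 0x6b6b6b6b6b6b6b6b, which is non-NULL, so the guard passes and the call goes ahead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mock of an attribute; the field order is an assumption
 * chosen so that show sits 24 bytes in on an LP64 build. */
struct mock_attribute {
	const char *name;
	void *owner;
	unsigned int mode;
};

struct mock_class_dev_attr {
	struct mock_attribute attr;
	long (*show)(void *cd, char *buf);
};

/* The guard seen in the disassembly: a non-NULL show pointer is called. */
static int mock_would_call_show(const struct mock_class_dev_attr *a)
{
	return a->show != NULL;
}

/* Raw bits of the show pointer, read without calling through it. */
static uintptr_t mock_show_bits(const struct mock_class_dev_attr *a)
{
	uintptr_t bits;

	assert(sizeof(a->show) == sizeof(bits));
	memcpy(&bits, &a->show, sizeof(bits));
	return bits;
}

/* What slab poisoning does to an object when it is freed. */
static void mock_poison(void *obj, size_t size)
{
	memset(obj, 0x6b, size);
}
```

A NULL `show` is skipped, but after `mock_poison()` the same check returns 1 even though calling through the pointer would fault.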

Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger  wrote:
>   This patch adds a notify to the die_nmi notify that the system
>   is about to be taken down.  If the notify is handled with a
>   NOTIFY_STOP return, the system is given a new lease on life.
> 
> void die_nmi (struct pt_regs *regs, const char *msg)
> {
>+  if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
>+ 0, 0, SIGINT) == NOTIFY_STOP)
>+  return;
>+
>   spin_lock(&nmi_print_lock);
>   /*
>   * We are in trouble anyway, lets at least try

Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.
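A userspace sketch of that suggestion. The notifier machinery is replaced by hypothetical stand-ins: `mock_notify_die()`, a single handler slot, the `MOCK_NOTIFY_*` values and the message strings in the usage below are all made up for illustration. The only change from the patch's shape is that `die_nmi()` hands its `msg` argument to the notifier instead of the fixed string "nmi_watchdog", so a handler can decide based on which kind of NMI fired.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for NOTIFY_DONE / NOTIFY_STOP. */
#define MOCK_NOTIFY_DONE 0
#define MOCK_NOTIFY_STOP 1

typedef int (*mock_die_handler)(const char *str);

static mock_die_handler registered_handler;	/* a one-entry "chain" */
static const char *last_seen;			/* message the handler got */
static int died;				/* set if die_nmi carried on */

static int mock_notify_die(const char *str)
{
	return registered_handler ? registered_handler(str) : MOCK_NOTIFY_DONE;
}

/* die_nmi() shape from the patch, but forwarding the caller's message. */
static void mock_die_nmi(const char *msg)
{
	assert(msg != NULL);
	if (mock_notify_die(msg) == MOCK_NOTIFY_STOP)
		return;		/* handler claimed it: new lease on life */
	died = 1;		/* stands in for the print-and-die path */
}

/* Example handler that tells NMI types apart by their message. */
static int claim_watchdog_only(const char *str)
{
	last_seen = str;
	return strstr(str, "watchdog") ? MOCK_NOTIFY_STOP : MOCK_NOTIFY_DONE;
}
```

With `claim_watchdog_only` registered, a message containing "watchdog" returns early, while any other message falls through to the die path.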

Also, your patch adds trailing whitespace to the call to notify_die().


Re: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 00:22:43 -0700, 
"Chen, Kenneth W" <[EMAIL PROTECTED]> wrote:
>On ia64, we have two kernel stacks, one for outgoing task, and one for
>incoming task.  for outgoing task, we haven't called switch_to() yet.
>So the switch stack structure for 'current' will be allocated immediately
>below current 'sp' pointer. For the incoming task, it was fully ctx'ed out
>previously, so switch stack structure is immediate above kernel_stack(next).
>It would be beneficial to prefetch both stacks.

struct switch_stack for current is all write data; no reading is done.
Is it worth doing prefetchw() for current?  IOW, is there any
measurable performance gain?
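For comparison purposes, a userspace sketch of a write-intent prefetch over a region such as the outgoing task's switch_stack. GCC's `__builtin_prefetch(addr, rw, locality)` requests a prefetch for writing when `rw` is 1; the generic `prefetchw()` fallback in include/linux/prefetch.h uses the same builtin, while ia64 supplies its own. The 64-byte line size and the helper name are assumptions, and whether this gains anything measurable is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64	/* assumed cache-line size */

/*
 * Issue one write-intent prefetch per cache line covering
 * [addr, addr + len).  Returns the number of prefetches issued.
 * Addresses are handled as integers so that rounding down to a line
 * boundary never forms an out-of-bounds pointer.
 */
static size_t prefetchw_range_sketch(const void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
	uintptr_t end = (uintptr_t)addr + len;
	size_t n = 0;

	assert(addr != NULL || len == 0);
	for (; p < end; p += SKETCH_CACHE_LINE, n++)
		__builtin_prefetch((const void *)p, 1, 3);	/* 1 = for write */
	return n;
}
```

In the scheduler the region would be the switch_stack area just below the current sp; timing a context-switch benchmark with and without such a call is one way to see whether it pays off.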


Re: Add prefetch switch stack hook in scheduler function

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 09:04:48 +0200, 
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>ok, how about the additional patch below? Does this do the trick on 
>ia64? It makes complete sense on every architecture to prefetch from 
>below the current kernel stack, in the expectation of the next task 
>touching the stack. The only difference is that for ia64 the 'expected 
>minimum stack footprint' is larger, due to the switch_stack.
>...
>Index: linux/kernel/sched.c
>===
>--- linux.orig/kernel/sched.c
>+++ linux/kernel/sched.c
>@@ -2869,7 +2869,14 @@ go_idle:
>* its thread_info, its kernel stack and mm:
>*/
>   prefetch(next->thread_info);
>-  prefetch(kernel_stack(next));
>+  /*
>+   * Prefetch (at least) a cacheline below the current
>+   * kernel stack (in expectation of any new task touching
>+   * the stack at least minimally), and a cacheline above
>+   * the stack:
>+   */
>+  prefetch_range(kernel_stack(next) - MIN_KERNEL_STACK_FOOTPRINT,
>+ MIN_KERNEL_STACK_FOOTPRINT + L1_CACHE_BYTES);
>   prefetch(next->mm);
> 
>   if (!rt_task(next) && next->activated > 0) {

Surely the prefetch range has to depend on which direction the stack
grows.  For stacks that grow down, we want esp/ksp upwards:

	prefetch_range(kernel_stack(next),
		       MIN_KERNEL_STACK_FOOTPRINT + L1_CACHE_BYTES);

For stacks that grow up, we want esp/ksp downwards:

	prefetch_range(kernel_stack(next) - MIN_KERNEL_STACK_FOOTPRINT,
		       MIN_KERNEL_STACK_FOOTPRINT + L1_CACHE_BYTES);

BTW, for ia64 you may as well prefetch pt_regs too; it is also quite
large.

#define MIN_KERNEL_STACK_FOOTPRINT (IA64_SWITCH_STACK_SIZE + IA64_PT_REGS_SIZE)
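The direction-dependent choice of start address can be sketched in user space as follows. This is a minimal illustration, not kernel code: prefetch_range() here is a local analogue of the kernel helper built on __builtin_prefetch(), and L1_CACHE_BYTES = 64, PREFETCH_STRIDE, stack_prefetch_start() and prefetch_next_stack() are assumed or hypothetical names. The length is footprint plus one cache line in both cases, matching the two calls above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones are per-arch. */
#define L1_CACHE_BYTES  64
#define PREFETCH_STRIDE L1_CACHE_BYTES

/* User-space analogue of prefetch_range(): hint every PREFETCH_STRIDE
 * bytes of [addr, addr + len).  Prefetches never fault. */
static inline void prefetch_range(const void *addr, size_t len)
{
	const char *cp = addr;
	const char *end = cp + len;

	for (; cp < end; cp += PREFETCH_STRIDE)
		__builtin_prefetch(cp, 0, 3);
}

/* Hypothetical helper.  Stack grows down: saved state sits at and above
 * ksp, so start at ksp.  Stack grows up: saved state sits below ksp, so
 * start one footprint below it. */
static inline uintptr_t stack_prefetch_start(uintptr_t ksp, size_t footprint,
					     int stack_grows_up)
{
	assert(!stack_grows_up || ksp >= footprint);
	return stack_grows_up ? ksp - footprint : ksp;
}

/* Hypothetical caller mirroring the two prefetch_range() calls above. */
static inline void prefetch_next_stack(uintptr_t ksp, size_t footprint,
				       int stack_grows_up)
{
	uintptr_t start = stack_prefetch_start(ksp, footprint, stack_grows_up);

	prefetch_range((const void *)start, footprint + L1_CACHE_BYTES);
}
```

Keeping the direction test in one helper means the scheduler call site stays identical on every architecture; only the footprint constant differs.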



Re: [PATCH] NMI watch dog notify patch

2005-07-29 Thread Keith Owens
On Fri, 29 Jul 2005 13:55:23 -0700, 
George Anzinger <george@mvista.com> wrote:
>   This patch adds a notify to the die_nmi notify that the system
>   is about to be taken down.  If the notify is handled with a
>   NOTIFY_STOP return, the system is given a new lease on life.
> 
> void die_nmi (struct pt_regs *regs, const char *msg)
> {
>+	if (notify_die(DIE_NMIWATCHDOG, "nmi_watchdog", regs, 
>+			0, 0, SIGINT) == NOTIFY_STOP)
>+		return;
>+
> 	spin_lock(&nmi_print_lock);
> 	/*
> 	* We are in trouble anyway, lets at least try

Minor nitpick.  die_nmi() already gets a message passed in to
distinguish between different types of nmi.  Pass that message to
notify_die(), on the off chance that the notified routines can use that
difference.

Also, your patch adds trailing whitespace to the notify_die() call.
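To show why passing msg helps, here is a small user-space model of the notifier walk. It is a sketch under stated assumptions: the NOTIFY_* values match <linux/notifier.h>, but MODEL_DIE_NMIWATCHDOG, model_notify_die(), debugger_hook() and model_die_nmi() are hypothetical names, and the chain is a plain array rather than the kernel's notifier_block list. The hook claims only watchdog lockups because it can see the message die_nmi() was given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return codes with the same values as <linux/notifier.h>. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical event code and callback type for this model only. */
#define MODEL_DIE_NMIWATCHDOG 1
typedef int (*model_notifier_fn)(unsigned long val, const char *str);

/* Call each hook in turn until one sets NOTIFY_STOP_MASK. */
static int model_notify_die(model_notifier_fn *chain, size_t n,
			    unsigned long val, const char *str)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](val, str);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Example debugger hook: claims watchdog lockups only, which it can tell
 * apart from other NMIs because die_nmi() passes msg through. */
static int debugger_hook(unsigned long val, const char *str)
{
	if (val == MODEL_DIE_NMIWATCHDOG && strstr(str, "LOCKUP") != NULL)
		return NOTIFY_STOP;
	return NOTIFY_DONE;
}

/* Returns 1 if the system survives, 0 if it would go on to die. */
static int model_die_nmi(model_notifier_fn *chain, size_t n, const char *msg)
{
	assert(msg != NULL);
	if (model_notify_die(chain, n, MODEL_DIE_NMIWATCHDOG, msg) == NOTIFY_STOP)
		return 1;
	/* The real die_nmi() would print the oops and kill the task here. */
	return 0;
}
```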



Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 21:16:56 -0700, 
George Anzinger  wrote:
>Keith Owens wrote:
>> On Thu, 28 Jul 2005 13:31:58 -0700, 
>> George Anzinger  wrote:
>> 
>>>I have been doing some work on kgdb to pull a few of it "fingers" out of 
>>>various places in the kernel.  This is the final location where we have 
>>>a kgdb intercept not covered by a notify.
>> 
>> 
>> I like the idea, but the hook should be in die_nmi(), not in the
>> watchdog, using the reason that is already passed into die_nmi.
>> die_nmi() is also called for a real NMI.
>> 
>I had though that too, but it does not allow recovery (i.e. lets reset 
>the watchdog and try again).

With the notify hook, die_nmi() can return to nmi_watchdog_tick(), which
then resets the counter and continues.  Patch below.

>Hmm.. just looked at traps.c.  Seems die_nmi is NOT called from the nmi 
>trap, only from the watchdog.  Also, there is a notify in the path to 
>the other nmi stuff.

I was looking at unknown_nmi_panic_callback(), which also calls
die_nmi().

traps.c already has several notify_die() calls; nmi.c has none.  It is
cleaner to keep all the notification in traps.c, with this small change
to nmi.c to cope with die_nmi() returning.

Index: linux/arch/i386/kernel/nmi.c
===
--- linux.orig/arch/i386/kernel/nmi.c   2005-07-28 17:22:06.735038510 +1000
+++ linux/arch/i386/kernel/nmi.c	2005-07-29 15:19:00.371196596 +1000
@@ -494,8 +494,10 @@ void nmi_watchdog_tick (struct pt_regs *
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		alert_counter[cpu]++;
-		if (alert_counter[cpu] == 5*nmi_hz)
+		if (alert_counter[cpu] == 5*nmi_hz) {
 			die_nmi(regs, "NMI Watchdog detected LOCKUP");
+			alert_counter[cpu] = 0;
+		}
 	} else {
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
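The effect of the reset can be checked with a simplified user-space model of the tick logic. It is a sketch only: struct wd_state, model_die_nmi_returns() and model_watchdog_tick() are hypothetical, and the "touched" and per-CPU details of the real nmi_watchdog_tick() are omitted. Because the test is ==, without the reset alert_counter would keep counting past 5*nmi_hz after a recovered lockup, so the watchdog would not fire again for that CPU (short of the counter wrapping).

```c
#include <assert.h>

/* Hypothetical state for one CPU in this model. */
struct wd_state {
	unsigned int last_irq_sum;
	unsigned int alert_counter;
	unsigned int die_calls;	/* times die_nmi() was reached and returned */
};

/* Stand-in for die_nmi() returning after a notifier said NOTIFY_STOP. */
static void model_die_nmi_returns(struct wd_state *s)
{
	s->die_calls++;
}

/* Simplified nmi_watchdog_tick(): irq_sum is this CPU's interrupt count. */
static void model_watchdog_tick(struct wd_state *s, unsigned int irq_sum,
				unsigned int nmi_hz)
{
	assert(nmi_hz > 0);
	if (irq_sum == s->last_irq_sum) {
		/* No interrupts since the last tick: the CPU looks stuck. */
		s->alert_counter++;
		if (s->alert_counter == 5 * nmi_hz) {
			model_die_nmi_returns(s);
			s->alert_counter = 0;	/* re-arm, as in the patch */
		}
	} else {
		/* Progress was made: remember the count and clear the alert. */
		s->last_irq_sum = irq_sum;
		s->alert_counter = 0;
	}
}
```

With nmi_hz = 1, ten stuck ticks reach die_nmi() twice; remove the re-arm line and the same ten ticks reach it only once.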



Re: [PATCH] NMI watch dog notify patch

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 13:31:58 -0700, 
George Anzinger  wrote:
>I have been doing some work on kgdb to pull a few of it "fingers" out of 
>various places in the kernel.  This is the final location where we have 
>a kgdb intercept not covered by a notify.

I like the idea, but the hook should be in die_nmi(), not in the
watchdog, using the reason that is already passed into die_nmi.
die_nmi() is also called for a real NMI.



Re: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Keith Owens
On Thu, 28 Jul 2005 09:41:18 +0200,
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>* david mosberger <[EMAIL PROTECTED]> wrote:
>
>> Also, should this be called prefetch_stack() or perhaps even just
>> prefetch_task()?  Not every architecture defines a switch_stack
>> structure.
>
>yeah. I'd too suggest to call it prefetch_stack(), and not make it a
>macro & hook but something defined on all arches, with for now only ia64
>having any real code in the inline function.
>
>i'm wondering, is the switch_stack at the same/similar place as
>next->thread_info? If yes then we could simply do a
>prefetch(next->thread_info).

No, they can be up to 30K apart.  See include/asm-ia64/ptrace.h.
thread_info is at ~0xda0, depending on the config.  The switch_stack
can be as high as 0x7bd0 in the kernel stack, depending on why the task
is sleeping.
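As a worked check of the offsets above: 0x7bd0 - 0xda0 = 0x6e30 = 28208 bytes (about 27.5 KiB), so a prefetch aimed at thread_info lands hundreds of cache lines away from the switch_stack. The sketch below just does that arithmetic; the two offsets are the example values from this message (they vary with config and with why the task sleeps), and the 128-byte and 64-byte line sizes in the usage are assumptions, not ia64 specifics.

```c
#include <assert.h>
#include <stddef.h>

/* Example offsets quoted above; both vary with config and sleep reason. */
#define EXAMPLE_THREAD_INFO_OFF  0xda0UL
#define EXAMPLE_SWITCH_STACK_OFF 0x7bd0UL

/* Number of cache lines needed to span gap bytes, rounding up. */
static size_t lines_to_span(size_t gap, size_t line_size)
{
	assert(line_size > 0);
	return (gap + line_size - 1) / line_size;
}
```

With an assumed 128-byte line, lines_to_span(28208, 128) is 221; with 64-byte lines it is 441.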



Re: 2.6.13-rc3 udev/hotplug use memory after free

2005-07-26 Thread Keith Owens
On Mon, 25 Jul 2005 15:01:19 -0700, 
Andrew Morton <[EMAIL PROTECTED]> wrote:
>Keith Owens <[EMAIL PROTECTED]> wrote:
>>
>> 2.6.13-rc3 + kdb (which does not touch udev/hotplug) on IA64 (Altix).
>>  gcc version 3.3.3 (SuSE Linux).  Compiled with DEBUG_SLAB,
>>  DEBUG_PREEMPT, DEBUG_SPINLOCK, DEBUG_SPINLOCK_SLEEP, DEBUG_KOBJECT.
>> 
>>  There is a use after free somewhere above class_device_attr_show.
>
>Can we obtain a backtrace for this one, Keith?  The function itself is
>pretty innocuous and is used by many callers.  I'd be suspectng a bug in
>the caller.

I no longer have the backtrace.  This 2.6.13-rc3 system has been booted
50+ times (ia64 MCA testing) and only once did it break.  If it recurs, I'll
do some more digging.



Re: Why build empty object files in drivers/media?

2005-07-24 Thread Keith Owens
On Fri, 22 Jul 2005 19:46:00 +, 
Sam Ravnborg <[EMAIL PROTECTED]> wrote:
>On Thu, Jul 21, 2005 at 11:06:21PM -0400, Chuck Ebbert wrote:
>> 
>> I have this in my .config file for 2.6.13-rc3:
>> 
>> 
>> #
>> # Multimedia devices
>> #
>> # CONFIG_VIDEO_DEV is not set
>> 
>> #
>> # Digital Video Broadcasting Devices
>> #
>> # CONFIG_DVB is not set
>> 
>> 
>> And yet these completely empty files are being built:
>> 
>> ...
>kbuild is told to visit these directories - and then it build an empty
>.o file to make linking step possible.
>The only solution is to tell kbuild not to visit these directories
>unless they are in real use.
>Following untested patch should do the trick. But the media people must
>check if before being applied since I have only taken a brief look at
>the Kconfig and Makefile files.
>
>   Sam
>
>diff --git a/drivers/media/Makefile b/drivers/media/Makefile
>--- a/drivers/media/Makefile
>+++ b/drivers/media/Makefile
>@@ -2,4 +2,7 @@
> # Makefile for the kernel multimedia device drivers.
> #
> 
>-obj-y:= video/ radio/ dvb/ common/
>+obj-y   := common/
>+obj-$(CONFIG_VIDEO_DEV) := video/
>+obj-$(CONFIG_VIDEO_DEV) := radio/
>+obj-$(CONFIG_DVB)   := dvb/

That should be +=, not :=.  Each := overwrites the previous value, so
with both options =y only dvb/ would be left in obj-y.

+obj-$(CONFIG_VIDEO_DEV) += video/
+obj-$(CONFIG_VIDEO_DEV) += radio/
+obj-$(CONFIG_DVB)   += dvb/
