Re: [BUG] __copy_to_user_inatomic broken on non Pentium machines

2007-03-25 Thread Thomas Gleixner
On Sun, 2007-03-25 at 11:14 -0700, Linus Torvalds wrote:
> > Environment: Pre Pentium systems, (boot_cpu_data.wp_works_ok == 0)
> 
> This shouldn't be "pre-pentium", afaik. WP-works-ok on i486 too. I think 
> only the original i386 had this bug ("feature").
>
> But I agree, it does seem to be broken on such machines (I assume you 
> don't actually have one, but just tested by forcing it by hand ;)

Yes, it's a genuine i386 embedded system and AFAIK the same feature is
available on 486 clones. i386 and Co are still in use in the embedded
space.

tglx




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] __copy_to_user_inatomic broken on non Pentium machines

2007-03-25 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Sun, 25 Mar 2007, Thomas Gleixner wrote:
> >
> > Environment: Pre Pentium systems, (boot_cpu_data.wp_works_ok == 0)
> 
> This shouldn't be "pre-pentium", afaik. WP-works-ok on i486 too. I 
> think only the original i386 had this bug ("feature").
> 
> But I agree, it does seem to be broken on such machines (I assume you 
> don't actually have one, but just tested by forcing it by hand ;)

actually, AFAIK this is a genuine i386 box Thomas has (an embedded 
board). Our hardware legacies and the resulting dependencies _really_ 
stick around for quite a long time :-/

Ingo


Re: [BUG] __copy_to_user_inatomic broken on non Pentium machines

2007-03-25 Thread Linus Torvalds


On Sun, 25 Mar 2007, Thomas Gleixner wrote:
>
> Environment: Pre Pentium systems, (boot_cpu_data.wp_works_ok == 0)

This shouldn't be "pre-pentium", afaik. WP-works-ok on i486 too. I think 
only the original i386 had this bug ("feature").

But I agree, it does seem to be broken on such machines (I assume you 
don't actually have one, but just tested by forcing it by hand ;)

> Now __copy_to_user_ll() takes the (boot_cpu_data.wp_works_ok == 0) path,
> which in turn calls 
> 
> down_read(current->mm->mmap_sem) - which might sleep
> 
> and
> 
> get_user_pages() - which has a cond_resched() inside.
> 
> Not sure how to fix that.

I agree. Nasty. But the thing is, it's actually much worse. We use 
"__put_user()" earlier to try to fault it in writably, and that one is 
totally broken on a CPU where wp_works_ok isn't set.

The whole notion that we should do this at access time is broken.

We should go back to doing it at "access_ok()", or we should just state 
that we don't support original-i386 CPUs any more. As it is, we don't do 
it right *anyway*, since we only do the tests properly in 
__copy_to_user(), and totally miss them in __put_user() and friends.

So it's buggy on i386 however you try to fix it. The only way to fix it 
properly is to move the i386 fixup early, into "access_ok()", the way it 
used to be. 

Linus
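
A minimal userspace sketch of the argument above, not kernel code. It assumes a toy model of user memory in which raw stores ignore write protection, as ring-0 stores do when wp_works_ok is 0. Every sim_* name is hypothetical and exists only for this illustration. Scheme A checks only inside the copy routine, so a put_user-style store slips through; scheme B does one software check up front, in the stand-in for access_ok(), which covers every raw accessor used afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 16
#define SIM_NPAGES    4

/* Hypothetical model of a user address space; not a kernel structure. */
struct sim_mm {
	unsigned char mem[SIM_NPAGES * SIM_PAGE_SIZE];
	bool writable[SIM_NPAGES];
};

/*
 * Software write-protect check over [addr, addr + len). It stands in
 * for the i386 fixup; a real fixup would also fault pages in, which
 * this model only reports as success or failure.
 */
static bool sim_verify_write(const struct sim_mm *mm, size_t addr, size_t len)
{
	if (len == 0)
		return true;
	if (addr >= sizeof(mm->mem) || len > sizeof(mm->mem) - addr)
		return false;
	for (size_t pg = addr / SIM_PAGE_SIZE;
	     pg <= (addr + len - 1) / SIM_PAGE_SIZE; pg++)
		if (!mm->writable[pg])
			return false;
	return true;
}

/* Raw accessors: like stores on a CPU that ignores WP, they never check. */
static void sim_raw_put_user(struct sim_mm *mm, size_t addr, unsigned char v)
{
	assert(addr < sizeof(mm->mem));
	mm->mem[addr] = v;
}

static void sim_raw_copy_to_user(struct sim_mm *mm, size_t addr,
				 const void *src, size_t len)
{
	assert(addr <= sizeof(mm->mem) && len <= sizeof(mm->mem) - addr);
	memcpy(&mm->mem[addr], src, len);
}

/* Scheme A: the check lives only in the copy routine (access time). */
static bool sim_copy_checked_at_access(struct sim_mm *mm, size_t addr,
				       const void *src, size_t len)
{
	if (!sim_verify_write(mm, addr, len))
		return false;
	sim_raw_copy_to_user(mm, addr, src, len);
	return true;
}

/* Scheme B: one check up front; raw accessors are used only if it passes. */
static bool sim_access_ok_write(const struct sim_mm *mm, size_t addr, size_t len)
{
	return sim_verify_write(mm, addr, len);
}
```

With page 1 marked read-only, sim_raw_put_user() on its own still modifies that page under scheme A, whereas a caller that first consults sim_access_ok_write() never reaches the raw store.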


[BUG] __copy_to_user_inatomic broken on non Pentium machines

2007-03-25 Thread Thomas Gleixner
Environment: Pre Pentium systems, (boot_cpu_data.wp_works_ok == 0)
Last known working kernel: 2.6.18 (did not try 2.6.19 yet)

Enabling CONFIG_PREEMPT on latest mainline as well as on 2.6.20 triggers

[   14.15] BUG: sleeping function called from invalid context at /home/tglx/work/kernel/vanilla/linux-2.6.20/kernel/rwsem.c:20
[   14.16] in_atomic():1, irqs_disabled():0
[   14.16] no locks held by init/1.
[   14.17]  [<c0103346>] show_trace_log_lvl+0x1a/0x2f
[   14.18]  [<c0103441>] show_trace+0x12/0x14
[   14.19]  [<c0103cf5>] dump_stack+0x16/0x18
[   14.19]  [<c010aa62>] __might_sleep+0xc7/0xcd
[   14.20]  [<c01213a1>] down_read+0x18/0x47
[   14.21]  [<c01a01e4>] __copy_to_user_ll+0x5e/0x1b6
[   14.22]  [<c012cf85>] file_read_actor+0x10b/0x149
[   14.23]  [<c012d7b2>] do_generic_mapping_read+0x187/0x433
[   14.24]  [<c012f64b>] generic_file_aio_read+0x191/0x1ca
[   14.24]  [<c0141657>] do_sync_read+0xc2/0xff
[   14.25]  [<c0141eb6>] vfs_read+0x90/0x145
[   14.26]  [<c014227e>] sys_read+0x3f/0x63
[   14.27]  [<c0102fb0>] syscall_call+0x7/0xb
[   14.27]  ===

and 

[   22.66] BUG: scheduling while atomic: e2fsck/0x1001/272
[   22.67] 1 lock held by e2fsck/272:
[   22.68]  #0:  (&mm->mmap_sem){}, at: [<c01a01e4>] __copy_to_user_ll+0x5e/0x1b6
[   22.69]  [<c0103346>] show_trace_log_lvl+0x1a/0x2f
[   22.70]  [<c0103441>] show_trace+0x12/0x14
[   22.71]  [<c0103cf5>] dump_stack+0x16/0x18
[   22.72]  [<c024a189>] __sched_text_start+0x71/0x57f
[   22.72]  [<c010b49f>] __cond_resched+0x21/0x3b
[   22.73]  [<c024aca7>] cond_resched+0x26/0x31
[   22.74]  [<c0137ae5>] get_user_pages+0x1e1/0x23c
[   22.75]  [<c01a021e>] __copy_to_user_ll+0x98/0x1b6
[   22.76]  [<c012cf85>] file_read_actor+0x10b/0x149
[   22.77]  [<c012d7b2>] do_generic_mapping_read+0x187/0x433
[   22.78]  [<c012f64b>] generic_file_aio_read+0x191/0x1ca
[   22.79]  [<c0141657>] do_sync_read+0xc2/0xff
[   22.79]  [<c0141eb6>] vfs_read+0x90/0x145
[   22.80]  [<c014227e>] sys_read+0x3f/0x63
[   22.81]  [<c0102fb0>] syscall_call+0x7/0xb
[   22.82]  ===

which is not surprising. 

int file_read_actor(read_descriptor_t *desc, struct page *page,
			unsigned long offset, unsigned long size)
{

	/*
	 * Faults on the destination of a read are common, so do it before
	 * taking the kmap.
	 */
	if (!fault_in_pages_writeable(desc->arg.buf, size)) {
		kaddr = kmap_atomic(page, KM_USER0);
>		left = __copy_to_user_inatomic(desc->arg.buf,
						kaddr + offset, size);

is called with preempt_count == 1, due to the kmap_atomic() above.

Now __copy_to_user_ll() takes the (boot_cpu_data.wp_works_ok == 0) path,
which in turn calls 

down_read(current->mm->mmap_sem) - which might sleep

and

get_user_pages() - which has a cond_resched() inside.

Not sure how to fix that.

tglx
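
A minimal userspace sketch of the failure mode reported above, not kernel code. It assumes a single integer standing in for preempt_count and a counter standing in for the __might_sleep() report. Every sim_* name is hypothetical. The model raises the counter the way kmap_atomic() does, then runs the copy path, which on wp_works_ok == 0 reaches two calls that may sleep.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of this is the kernel's implementation. */
static int sim_preempt_count;	/* models preempt_count */
static int sim_violations;	/* sleeping calls made in atomic context */

/* Models the __might_sleep() check: complain if we are atomic. */
static void sim_might_sleep(const char *what)
{
	if (sim_preempt_count > 0) {
		sim_violations++;
		printf("BUG: %s may sleep, preempt_count == %d\n",
		       what, sim_preempt_count);
	}
}

/* kmap_atomic() leaves the caller in atomic context until the unmap. */
static void sim_kmap_atomic(void)
{
	sim_preempt_count++;
}

static void sim_kunmap_atomic(void)
{
	assert(sim_preempt_count > 0);
	sim_preempt_count--;
}

/*
 * Models __copy_to_user_ll(): only the wp_works_ok == 0 path takes
 * mmap_sem and calls get_user_pages(). The copy itself is omitted.
 */
static void sim_copy_to_user_ll(int wp_works_ok)
{
	if (!wp_works_ok) {
		sim_might_sleep("down_read(mmap_sem)");
		sim_might_sleep("get_user_pages()");
	}
}

/* Models the kmap_atomic() section of file_read_actor(). */
static int sim_file_read_actor(int wp_works_ok)
{
	sim_violations = 0;
	sim_kmap_atomic();
	sim_copy_to_user_ll(wp_works_ok);
	sim_kunmap_atomic();
	return sim_violations;
}
```

In this model sim_file_read_actor(1) returns 0, while sim_file_read_actor(0) returns 2, one violation for each of the two calls named in the report.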



