Re: Alpha SMP problem

2000-12-01 Thread Andrea Arcangeli

On Fri, Dec 01, 2000 at 08:28:57AM -0800, Reto Baettig wrote:
> Is there any chance that we will see this patch as well as your other
> Alpha patches included in future 2.2.X and 2.4.X releases?

Yes. For 2.2.x I'm waiting for 2.2.19pre; for 2.4.x, as DaveM suggested, we
first need to clean up the interface of the context[] information to optimize
the memory usage on x86*/sparc64 etc.

It would be nice if in the meantime somebody with an old ev4 based machine
(possibly SMP if it exists) could verify that the current 2.2.x and 2.4.x
patches work well there too.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Alpha SMP problem

2000-12-01 Thread Reto Baettig

Hi!

It's great that you could fix that! 

Is there any chance that we will see this patch as well as your other
Alpha patches included in future 2.2.X and 2.4.X releases?

Thanks,

Reto

Andrea Arcangeli wrote:
> 
> There were a few SMP races that could trigger only using threads:
> 
> 1) flush_tlb_other could happen after we read the mm->context and we could
>    miss a tlb flush
> 2) flush_tlb_current could bump up the asn of the current cpu and in turn
>    change the asn version after we acquired a new context leading to
>    an alias between our asn and a later one
> 3) a PAL_swpctx can't be done in the middle of alpha_switch_to
> 
> ppc/sparc64 may have similar issues and I didn't check them (from a fast read
> it looks like sparc64 is just safe but I don't know the sparc hardware
> well enough to be sure).
> 
> I also noticed the horrible implementation of ASN in SMP so while I was
> there I rewrote it.
> 
> The rewrite is based on the fact that mm->context makes no sense. It must be an
> array of mm->context[NR_CPUS]. Almost certainly mips wants an array of NR_CPUS
> too. Anyways for mips it's not a big deal since SMP isn't supported in 2.2.x ;).
>



Re: Alpha SMP problem

2000-11-23 Thread Andrea Arcangeli

On Tue, Nov 07, 2000 at 10:57:49PM -0800, Richard Henderson wrote:
> On Tue, Nov 07, 2000 at 10:09:34AM -0800, Reto Baettig wrote:
> > I have a problem with Alpha SMPs which seems to be kernel-related. I
> > discussed this on the bug-glibc list but everybody seems to agree that
> > it cannot be a libc problem.
> 
> Indeed it does seem to be some sort of tlb flushing problem,

Yes it was.

> but I've been unable to figure out exactly what.

There were a few SMP races that could trigger only using threads:

1) flush_tlb_other could happen after we read the mm->context and we could
   miss a tlb flush
2) flush_tlb_current could bump up the asn of the current cpu and in turn
   change the asn version after we acquired a new context leading to
   an alias between our asn and a later one
3) a PAL_swpctx can't be done in the middle of alpha_switch_to

ppc/sparc64 may have similar issues and I didn't check them (from a quick read
it looks like sparc64 is already safe, but I don't know the sparc hardware
well enough to be sure).

I also noticed the horrible implementation of ASN in SMP so while I was
there I rewrote it.

The rewrite is based on the fact that a single mm->context makes no sense on
SMP. It must be an array, mm->context[NR_CPUS]. Almost certainly mips wants an
array of NR_CPUS too. Anyway, for mips it's not a big deal since SMP isn't
supported in 2.2.x ;).

In 2.2.x I added:

#ifdef __alpha__

in the mm.h code, so that people can apply this patch without breaking the
compile on all other architectures.
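A minimal sketch of what such a guard could look like. The struct name
`mm_struct_sketch`, the helper `context_slots` and the `NR_CPUS` value are
illustrative assumptions, not the actual mm.h change:

```c
#include <stddef.h>

#define NR_CPUS 32	/* hypothetical value, for illustration only */

/* Stand-in for struct mm_struct; every other field is left out. */
struct mm_struct_sketch {
#ifdef __alpha__
	unsigned long context[NR_CPUS];	/* one ASN/version word per CPU */
#else
	unsigned long context;		/* other architectures keep one word */
#endif
};

/* Number of context words the struct carries on the build architecture. */
size_t context_slots(void)
{
	struct mm_struct_sketch mm;

	return sizeof(mm.context) / sizeof(unsigned long);
}
```

On a non-alpha build `context_slots()` is 1, so the layout seen by the other
architectures does not change.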

For 2.4.x I'd like to know what sparc64 and ppc want as mm->context; alpha
definitely wants a per-CPU array (probably mips too).

With a single mm->context shared by threads, both CPUs were going to overwrite
the same field at the same time. This made mm->context useless and led to
ASN overflow even if there was only 1 MM running in the system.

And the old implementation wasn't only bad for threads, it was also bad
for regular processes. Every time a task changed CPU an ASN was wasted.
After 512 CPU changes of the same task the tlb was flushed on both CPUs,
even if there were only one or two programs running.

With this new design up to 256 different MMs (each could belong to 10 threads
or to a single task) can run in an SMP system without generating any
tlb flush (aka ASN overflow) on any CPU, regardless of MM migration between
CPUs or of context switches between tasks and threads.
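The difference between the two schemes can be illustrated with a toy model in
user space. This is only a sketch under stated assumptions: 2 CPUs, 256
hardware ASNs per CPU, and the CPU id placed in bits 32 and up so that the old
per-CPU version spaces never collide (a simplification of the old
`(cpuid << WIDTH_HARDWARE_ASN) + ASN_FIRST_VERSION` start value). The names
`cpu_state`, `simulate_shared` and `simulate_percpu` are made up for the
example and are not kernel code:

```c
#include <stdint.h>

#define NR_CPUS            2
#define WIDTH_HARDWARE_ASN 8	/* assumption: 256 hardware ASNs per CPU */
#define HARDWARE_ASN_MASK  ((1ULL << WIDTH_HARDWARE_ASN) - 1)
#define ASN_FIRST_VERSION  (1ULL << WIDTH_HARDWARE_ASN)

struct cpu_state {
	uint64_t last_asn;	/* version in the high bits, ASN in the low bits */
	unsigned flushes;	/* TLB flushes caused by ASN overflow */
};

/* Hand out the next ASN; wrapping the ASN field models the overflow
 * that forces a full TLB flush and starts a new version. */
static uint64_t get_new_context(struct cpu_state *c)
{
	uint64_t next = c->last_asn + 1;

	if ((next & HARDWARE_ASN_MASK) == 0)
		c->flushes++;
	c->last_asn = next;
	return next;
}

/* A context is usable on a CPU only if its version matches that CPU's. */
static int context_valid(uint64_t ctx, const struct cpu_state *c)
{
	return ((ctx ^ c->last_asn) & ~HARDWARE_ASN_MASK) == 0;
}

static unsigned total_flushes(const struct cpu_state *cpus)
{
	unsigned cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += cpus[cpu].flushes;
	return sum;
}

/* Old scheme: one context word per MM, a different version space per CPU.
 * The task moves to the other CPU on every step. */
unsigned simulate_shared(unsigned migrations)
{
	struct cpu_state cpus[NR_CPUS];
	uint64_t context = 0;
	unsigned i, cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].last_asn = ((uint64_t)cpu << 32) + ASN_FIRST_VERSION;
		cpus[cpu].flushes = 0;
	}
	for (i = 0; i < migrations; i++) {
		cpu = i % NR_CPUS;
		if (!context_valid(context, &cpus[cpu]))
			context = get_new_context(&cpus[cpu]);
	}
	return total_flushes(cpus);
}

/* New scheme: one context word per MM and per CPU. */
unsigned simulate_percpu(unsigned migrations)
{
	struct cpu_state cpus[NR_CPUS];
	uint64_t context[NR_CPUS] = { 0 };
	unsigned i, cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		cpus[cpu].last_asn = ASN_FIRST_VERSION;
		cpus[cpu].flushes = 0;
	}
	for (i = 0; i < migrations; i++) {
		cpu = i % NR_CPUS;
		if (!context_valid(context[cpu], &cpus[cpu]))
			context[cpu] = get_new_context(&cpus[cpu]);
	}
	return total_flushes(cpus);
}
```

In this model a single task bouncing between the two CPUs burns one ASN per
migration with the shared field, so `simulate_shared(512)` counts one flush on
each CPU, while `simulate_percpu` allocates one ASN per CPU and never flushes,
however often the task migrates.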

--- 2.2.18pre21aa2/arch/alpha/kernel/smp.c.~1~  Wed Nov 22 02:32:53 2000
+++ 2.2.18pre21aa2/arch/alpha/kernel/smp.c  Thu Nov 23 04:48:24 2000
@@ -95,8 +95,7 @@
 smp_store_cpu_info(int cpuid)
 {
 	cpu_data[cpuid].loops_per_jiffy = loops_per_jiffy;
-	cpu_data[cpuid].last_asn
-	  = (cpuid << WIDTH_HARDWARE_ASN) + ASN_FIRST_VERSION;
+	cpu_data[cpuid].last_asn = ASN_FIRST_VERSION;
 
 	cpu_data[cpuid].irq_count = 0;
 	cpu_data[cpuid].bh_count = 0;
@@ -905,6 +904,8 @@
 	struct mm_struct *mm = (struct mm_struct *) x;
 	if (mm == current->mm)
 		flush_tlb_current(mm);
+	else
+		flush_tlb_other(mm);
 }
 
 void
@@ -912,10 +913,17 @@
 {
 	if (mm == current->mm) {
 		flush_tlb_current(mm);
-		if (atomic_read(&mm->count) == 1)
+		if (atomic_read(&mm->count) == 1) {
+			int i, cpu, this_cpu = smp_processor_id();
+			for (i = 0; i < smp_num_cpus; i++) {
+				cpu = cpu_logical_map(i);
+				if (cpu == this_cpu)
+					continue;
+				mm->context[cpu] = 0;
+			}
 			return;
-	} else
-		flush_tlb_other(mm);
+		}
+	}
 
 	if (smp_call_function(ipi_flush_tlb_mm, mm, 1, 1)) {
 		printk(KERN_CRIT "flush_tlb_mm: timed out\n");
@@ -932,8 +940,12 @@
 ipi_flush_tlb_page(void *x)
 {
 	struct flush_tlb_page_struct *data = (struct flush_tlb_page_struct *)x;
-	if (data->mm == current->mm)
-		flush_tlb_current_page(data->mm, data->vma, data->addr);
+	struct mm_struct * mm = data->mm;
+
+	if (mm == current->mm)
+		flush_tlb_current_page(mm, data->vma, data->addr);
+	else
+		flush_tlb_other(mm);
 }
 
 void
@@ -944,10 +956,17 @@
 
 	if (mm == current->mm) {
 		flush_tlb_current_page(mm, vma, addr);
-		if (atomic_read(&current->mm->count) == 1)
+		if (atomic_read(&current->mm->count) == 1) {
+			int i, cpu, this_cpu = smp_processor_id();
+			for (i = 0; i < smp_num_cpus; i++) {
+				cpu = cpu_logical_map(i);
+				if (cpu == this_cpu)
+					continue;
+				mm->context[cpu] = 0;
+			}
 			return;

Re: Alpha SMP problem

2000-11-07 Thread Richard Henderson

On Tue, Nov 07, 2000 at 10:09:34AM -0800, Reto Baettig wrote:
> I have a problem with Alpha SMPs which seems to be kernel-related. I
> discussed this on the bug-glibc list but everybody seems to agree that
> it cannot be a libc problem.

Indeed it does seem to be some sort of tlb flushing problem,
but I've been unable to figure out exactly what.


r~



Re: Alpha SMP problem

2000-11-07 Thread Reto Baettig

Update: I just tested it on Alpha UP and everything's fine. It really
seems to be a SMP problem...

Reto Baettig wrote:
> 
> Hi
> 
> I have a problem with Alpha SMPs which seems to be kernel-related. I
> discussed this on the bug-glibc list but everybody seems to agree that
> it cannot be a libc problem.
> 
> I attached a little test program which reproduces the bug in < 1 minute.
> BUT: IT MUST BE STARTED AT LEAST TWICE!
> 
> The strange thing is that a single instance of the program runs just
> fine. When I start the program a second time, I get segfaults and/or
> stuck threads.
> 
> We could reproduce this behaviour on different Machines, both with linux
> 2.2.14 and 2.4.0-test10, but
> ONLY ON ALPHA SMP MACHINES.
> 
> Here's my configuration:
> 
> Linux reto1 2.4.0-test10 #2 SMP Tue Oct 31 19:39:51 PST 2000 alpha unknown
>                             ^^^                              ^^^^^
> Kernel modules         2.3.19
> Gnu C                  egcs-2.91.66
> Gnu Make               3.78.1
> Binutils               2.9.5.0.22
> Linux C Library        2.1.3
> Dynamic linker         ldd (GNU libc) 2.1.3
> Procps                 2.0.6
> Mount                  2.10f
> Net-tools              1.54
> Console-tools          0.3.3
> Sh-utils               2.0
> Modules Loaded         nfs lockd sunrpc
> 
> Any ideas?
> 
> Please tell me when you need more information, or give me some pointers
> where I could start to dig...
> 
> TIA
> 
> Reto
> 
> Attachment:
>   Name: malloctest.tgz
>   Type: unspecified type (application/octet-stream)
>   Encoding: base64