Re: Alpha SMP problem
On Fri, Dec 01, 2000 at 08:28:57AM -0800, Reto Baettig wrote:
> Is there any chance that we will see this patch as well as your other
> Alpha patches included in future 2.2.X and 2.4.X releases?

Yes. For 2.2.x I'm waiting for 2.2.19pre; for 2.4.x, as DaveM suggested, we
first need to clean up the interface of the context[] information to
optimize the memory usage on x86*/sparc64 etc...

It would be nice if in the meantime somebody with an old ev4 based machine
(possibly SMP if it exists) could verify that the current 2.2.x and 2.4.x
patches work well there too.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: Alpha SMP problem
Hi!

It's great that you could fix that!

Is there any chance that we will see this patch as well as your other
Alpha patches included in future 2.2.X and 2.4.X releases?

Thanks,

Reto

Andrea Arcangeli wrote:
>
> There were a few SMP races that could trigger only using threads:
>
> 1) flush_tlb_other could happen after we read the mm->context and we could
>    miss a tlb flush
> 2) flush_tlb_current could bump up the asn of the current cpu and in turn
>    change the asn version after we acquired a new context leading to
>    an alias between our asn and a later one
> 3) a PAL_swpctx can't be done in the middle of alpha_switch_to
>
> ppc/sparc64 may have similar issues and I didn't check them (from a fast read
> it looks like sparc64 is just safe but I don't know the sparc hardware
> well enough to be sure).
>
> I also noticed the horrible implementation of ASN in SMP so while I was
> there I rewrote it.
>
> The rewrite is based on the fact that mm->context makes no sense. It must be an
> array of mm->context[NR_CPUS]. Almost certainly mips wants an array of NR_CPUS
> too. Anyways for mips it's not a big deal since SMP isn't supported in 2.2.x ;).
>
Re: Alpha SMP problem
On Tue, Nov 07, 2000 at 10:57:49PM -0800, Richard Henderson wrote:
> On Tue, Nov 07, 2000 at 10:09:34AM -0800, Reto Baettig wrote:
> > I have a problem with Alpha SMP's which seems to be kernel-related. I
> > discussed this on the bug-glibc list but everybody seems to agree that
> > it cannot be a libc problem.
>
> Indeed it does seem to be some sort of tlb flushing problem,

Yes it was.

> but I've been unable to figure out exactly what.

There were a few SMP races that could trigger only using threads:

1) flush_tlb_other could happen after we read the mm->context and we could
   miss a tlb flush
2) flush_tlb_current could bump up the asn of the current cpu and in turn
   change the asn version after we acquired a new context leading to
   an alias between our asn and a later one
3) a PAL_swpctx can't be done in the middle of alpha_switch_to

ppc/sparc64 may have similar issues and I didn't check them (from a fast read
it looks like sparc64 is just safe but I don't know the sparc hardware
well enough to be sure).

I also noticed the horrible implementation of ASN in SMP so while I was
there I rewrote it.

The rewrite is based on the fact that mm->context makes no sense. It must be an
array of mm->context[NR_CPUS]. Almost certainly mips wants an array of NR_CPUS
too. Anyways for mips it's not a big deal since SMP isn't supported in 2.2.x ;).

In 2.2.x I added:

	#ifdef __alpha__

in the mm.h code, so that people can apply this patch without breaking
compiles of all other architectures.

For 2.4.x I'd like to know what sparc64 and ppc want as mm->context; alpha
definitely wants a per-CPU array (probably mips too).

With a single mm->context and threads, both CPUs were going to overwrite the
same field at the same time. This just made mm->context useless and it leads
to overflow of the asn even if there's only 1 MM running in the system.

And the old implementation wasn't only bad for threads, it was also bad for
regular processes. Every time a task was changing CPU an ASN was wasted. After
512 changes of CPU of the same task the tlb was flushed on both cpus even if
there were only 1 or 2 programs running.

With this new design up to 256 different MMs (they could belong to 10 threads
each or to a single task each) can run in an SMP system without generating any
tlb flush (aka ASN overflow) on any CPU, regardless of the MM migration between
cpus or of the context switches between tasks and threads.

--- 2.2.18pre21aa2/arch/alpha/kernel/smp.c.~1~	Wed Nov 22 02:32:53 2000
+++ 2.2.18pre21aa2/arch/alpha/kernel/smp.c	Thu Nov 23 04:48:24 2000
@@ -95,8 +95,7 @@
 smp_store_cpu_info(int cpuid)
 {
 	cpu_data[cpuid].loops_per_jiffy = loops_per_jiffy;
-	cpu_data[cpuid].last_asn
-	  = (cpuid << WIDTH_HARDWARE_ASN) + ASN_FIRST_VERSION;
+	cpu_data[cpuid].last_asn = ASN_FIRST_VERSION;
 	cpu_data[cpuid].irq_count = 0;
 	cpu_data[cpuid].bh_count = 0;
@@ -905,6 +904,8 @@
 	struct mm_struct *mm = (struct mm_struct *) x;
 	if (mm == current->mm)
 		flush_tlb_current(mm);
+	else
+		flush_tlb_other(mm);
 }
 void
@@ -912,10 +913,17 @@
 {
 	if (mm == current->mm) {
 		flush_tlb_current(mm);
-		if (atomic_read(&mm->count) == 1)
+		if (atomic_read(&mm->count) == 1) {
+			int i, cpu, this_cpu = smp_processor_id();
+			for (i = 0; i < smp_num_cpus; i++) {
+				cpu = cpu_logical_map(i);
+				if (cpu == this_cpu)
+					continue;
+				mm->context[cpu] = 0;
+			}
 			return;
-	} else
-		flush_tlb_other(mm);
+		}
+	}
 	if (smp_call_function(ipi_flush_tlb_mm, mm, 1, 1)) {
 		printk(KERN_CRIT "flush_tlb_mm: timed out\n");
@@ -932,8 +940,12 @@
 ipi_flush_tlb_page(void *x)
 {
 	struct flush_tlb_page_struct *data = (struct flush_tlb_page_struct *)x;
-	if (data->mm == current->mm)
-		flush_tlb_current_page(data->mm, data->vma, data->addr);
+	struct mm_struct * mm = data->mm;
+
+	if (mm == current->mm)
+		flush_tlb_current_page(mm, data->vma, data->addr);
+	else
+		flush_tlb_other(mm);
 }
 void
@@ -944,10 +956,17 @@
 	if (mm == current->mm) {
 		flush_tlb_current_page(mm, vma, addr);
-		if (atomic_read(&current->mm->count) == 1)
+		if (atomic_read(&current->mm->count) == 1) {
+			int i, cpu, this_cpu = smp_processor_id();
+			for (i = 0; i < smp_num_cpus; i++) {
+				cpu = cpu_logical_map(i);
+				if (cpu == this_cpu)
+					continue;
+				mm->context[cpu] = 0;
+			}
 			return;
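
The ASN accounting argued for in this message (a single shared mm->context wastes an ASN on every CPU change, a per-CPU array does not) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: cpu_state, mm_old, mm_new, new_context, stale, bounce_old and bounce_new are hypothetical names, the constants (two CPUs, an 8-bit hardware ASN) are chosen for illustration, and the old scheme's CPU tag is modelled as an explicit field instead of bits folded into the context value.

```c
#include <assert.h>

#define NCPUS         2                         /* assumption: two-CPU box */
#define HW_ASN_BITS   8                         /* assumption: 8-bit hardware ASN */
#define HW_ASN_MASK   ((1UL << HW_ASN_BITS) - 1)
#define FIRST_VERSION (1UL << HW_ASN_BITS)

/* Per-CPU allocator state: last context handed out and flush counter. */
struct cpu_state {
	unsigned long last_asn;
	unsigned long tlb_flushes;
};

/* Old layout (simplified): one context slot shared by all CPUs. */
struct mm_old {
	unsigned long context;
	int cpu;                /* CPU the slot was last allocated on */
};

/* New layout: one context slot per CPU. */
struct mm_new {
	unsigned long context[NCPUS];
};

static void init_cpus(struct cpu_state cpus[NCPUS])
{
	for (int i = 0; i < NCPUS; i++) {
		cpus[i].last_asn = FIRST_VERSION;
		cpus[i].tlb_flushes = 0;
	}
}

/* Hand out the next context; wrapping the hardware bits costs a flush. */
static unsigned long new_context(struct cpu_state *c)
{
	unsigned long next = c->last_asn + 1;

	if ((next & HW_ASN_MASK) == 0)
		c->tlb_flushes++;   /* ASN overflow: new version, TLB flush */
	c->last_asn = next;
	return next;
}

/* A context is stale when its version bits differ from the CPU's. */
static int stale(unsigned long ctx, const struct cpu_state *c)
{
	return ((ctx ^ c->last_asn) & ~HW_ASN_MASK) != 0;
}

/* Old scheme: arriving from another CPU always burns a fresh ASN. */
static void switch_old(struct cpu_state cpus[NCPUS], struct mm_old *mm, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (mm->cpu != cpu || stale(mm->context, &cpus[cpu])) {
		mm->context = new_context(&cpus[cpu]);
		mm->cpu = cpu;
	}
}

/* New scheme: each CPU keeps its own slot, so migration is free. */
static void switch_new(struct cpu_state cpus[NCPUS], struct mm_new *mm, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (stale(mm->context[cpu], &cpus[cpu]))
		mm->context[cpu] = new_context(&cpus[cpu]);
}

/* Bounce one mm between the CPUs; return the total number of flushes. */
unsigned long bounce_old(int migrations)
{
	struct cpu_state cpus[NCPUS];
	struct mm_old mm = { 0, -1 };
	unsigned long total = 0;

	init_cpus(cpus);
	for (int i = 0; i < migrations; i++)
		switch_old(cpus, &mm, i % NCPUS);
	for (int i = 0; i < NCPUS; i++)
		total += cpus[i].tlb_flushes;
	return total;
}

unsigned long bounce_new(int migrations)
{
	struct cpu_state cpus[NCPUS];
	struct mm_new mm = { { 0 } };
	unsigned long total = 0;

	init_cpus(cpus);
	for (int i = 0; i < migrations; i++)
		switch_new(cpus, &mm, i % NCPUS);
	for (int i = 0; i < NCPUS; i++)
		total += cpus[i].tlb_flushes;
	return total;
}
```

In this model a single mm bounced 512 times between the two CPUs costs one flush on each CPU with the shared slot (bounce_old(512) returns 2), which lines up with the 512 figure quoted above, while the per-CPU layout allocates one ASN per CPU and never overflows (bounce_new(512) returns 0). The exact thresholds are properties of the model, not measurements of the real kernel.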
Re: Alpha SMP problem
On Tue, Nov 07, 2000 at 10:09:34AM -0800, Reto Baettig wrote:
> I have a problem with Alpha SMP's which seems to be kernel-related. I
> discussed this on the bug-glibc list but everybody seems to agree that
> it cannot be a libc problem.

Indeed it does seem to be some sort of tlb flushing problem,
but I've been unable to figure out exactly what.

r~
Re: Alpha SMP problem
Update:

I just tested it on Alpha UP and everything's fine. It really seems to
be an SMP problem...

Reto Baettig wrote:
>
> Hi
>
> I have a problem with Alpha SMP's which seems to be kernel-related. I
> discussed this on the bug-glibc list but everybody seems to agree that
> it cannot be a libc problem.
>
> I attached a little test program which reproduces the bug in less than
> 1 minute.
> BUT: IT MUST BE STARTED AT LEAST TWICE!
>
> The strange thing is that a single instance of the program runs just
> fine. When I start the program a second time, I get segfaults and/or
> stuck threads.
>
> We could reproduce this behaviour on different machines, both with linux
> 2.2.14 and 2.4.0-test10, but
> ONLY ON ALPHA SMP MACHINES.
>
> Here's my configuration:
>
> Linux reto1 2.4.0-test10 #2 SMP Tue Oct 31 19:39:51 PST 2000 alpha
> unknown
>
> Kernel modules         2.3.19
> Gnu C                  egcs-2.91.66
> Gnu Make               3.78.1
> Binutils               2.9.5.0.22
> Linux C Library        2.1.3
> Dynamic linker         ldd (GNU libc) 2.1.3
> Procps                 2.0.6
> Mount                  2.10f
> Net-tools              1.54
> Console-tools          0.3.3
> Sh-utils               2.0
> Modules Loaded         nfs lockd sunrpc
>
> Any ideas?
>
> Please tell me when you need more information, or give me some pointers
> where I could start to dig...
>
> TIA
>
> Reto
>
>
>     Name: malloctest.tgz
>     Type: unspecified type (application/octet-stream)
> Encoding: base64