Re: [patch]2.4.0-test6 "spinlock" preemption patch
On Tue, Sep 12, 2000 at 11:37:46AM +0100, Alan Cox wrote:

> That code example can in theory deadlock without any patches if the CPUs
> end up locked in sync with each other and the same one always wins the
> test.  It isn't likely on current x86 but other processors are a
> different story.

I've seen systems (not processors!) that can detect such a case and let
one processor randomly win over the others.

  Ralf

--
"Embrace, Enhance, Eliminate" - it worked for the pope, it'll work for Bill.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch]2.4.0-test6 "spinlock" preemption patch
On Tue, 12 Sep 2000, Alan Cox wrote:

> That code example can in theory deadlock without any patches if the CPU's

Woops, I really meant:

	while (test_and_set_bit(0, &lock));
	/* critical section */
	mb();
	clear_bit(0, &lock);

Andrea
Re: [patch]2.4.0-test6 "spinlock" preemption patch
> > while (test_and_set_bit(0, &something)) {
> > 	/* critical section */
> > 	mb();
> > 	clear_bit(0, &something);
> > }
>
> > The above construct is discouraged of course when you can do
> > the same thing with a spinlock but some place is doing that.
>
> Hmmm, maybe the Montavista people can volunteer to clean
> up all those places in the kernel code? ;)

That code example can in theory deadlock without any patches if the CPUs
end up locked in sync with each other and the same one always wins the
test.  It isn't likely on current x86 but other processors are a different
story.
Re: [patch]2.4.0-test6 "spinlock" preemption patch
Rik van Riel wrote:
>
> On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> > On Wed, 6 Sep 2000, George Anzinger wrote:
> >
> > > The times a kernel is not preemptable under this patch are:
> > >
> > > While handling interrupts.
> > > While doing "bottom half" processing.
> > > While holding a spinlock, writelock or readlock.
> > >
> > > At all other times the algorithm allows preemption.
> >
> > So it can deadlock if somebody is doing:
> >
> > 	while (test_and_set_bit(0, &something)) {
> > 		/* critical section */
> > 		mb();
> > 		clear_bit(0, &something);
> > 	}
> >
> > The above construct is discouraged of course when you can do
> > the same thing with a spinlock but some place is doing that.
>
> Hmmm, maybe the Montavista people can volunteer to clean
> up all those places in the kernel code? ;)
>
> cheers,

Well, I think that is what we are saying.  We are trying to understand
the lay of the land and which way the wind is blowing so that our work is
accepted into the kernel.  Thus, for example, even now both preemption
and rtsched are configuration options that, when not chosen, give you
back the same old kernel (with possibly a more readable debug option in
spinlock.h and a more reliable exit from entry.S :)

Along these lines, we do want to thank Andrea for pointing out this code
to us.  It is always better to have someone point out the tar pits prior
to our trying to walk across them (and verily, he was never seen again :)

George
Re: [patch]2.4.0-test6 "spinlock" preemption patch
On Mon, 11 Sep 2000, Rik van Riel wrote:

> Hmmm, maybe the Montavista people can volunteer to clean
> up all those places in the kernel code? ;)

That would be nice and welcome independently of the preemptible kernel
indeed.  The right construct to convert that stuff to is
spin_is_locked/spin_trylock (so spin_trylock will take care of forbidding
kernel reschedules within the critical section).

One example that comes to mind to better show what this cleanup consists
of (not relevant to this case because it's code that doesn't get compiled
in UP) is the global_irq_lock variable.  The i386 one is an example of
the old style and the alpha one is the new style spin_is_locked/trylock
one.

The new rule should be that places that use test_and_set_bit should never
spin.  They can of course be schedule-locks like lock_page() (in fact,
being a schedule-aware lock still means not spinning on the lock :).

Those cleanups can start in the 2.4.x timeframe but I'd rather not depend
on them during 2.4.x to have a stable kernel.  (2.5.x looks like a better
time to change such an ancient API.)  This is just my humble opinion of
course.

Andrea
Re: [patch]2.4.0-test6 "spinlock" preemption patch
On Tue, 12 Sep 2000, Andrea Arcangeli wrote:

> On Wed, 6 Sep 2000, George Anzinger wrote:
>
> > The times a kernel is not preemptable under this patch are:
> >
> > While handling interrupts.
> > While doing "bottom half" processing.
> > While holding a spinlock, writelock or readlock.
> >
> > At all other times the algorithm allows preemption.
>
> So it can deadlock if somebody is doing:
>
> 	while (test_and_set_bit(0, &something)) {
> 		/* critical section */
> 		mb();
> 		clear_bit(0, &something);
> 	}
>
> The above construct is discouraged of course when you can do
> the same thing with a spinlock but some place is doing that.

Hmmm, maybe the Montavista people can volunteer to clean
up all those places in the kernel code? ;)

cheers,

Rik
--
"What you're running that piece of shit Gnome?!?!"
  -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/	http://www.surriel.com/
Re: [patch]2.4.0-test6 "spinlock" preemption patch
On Wed, 6 Sep 2000, George Anzinger wrote:

> The times a kernel is not preemptable under this patch are:
>
> While handling interrupts.
> While doing "bottom half" processing.
> While holding a spinlock, writelock or readlock.
>
> At all other times the algorithm allows preemption.

So it can deadlock if somebody is doing:

	while (test_and_set_bit(0, &something)) {
		/* critical section */
		mb();
		clear_bit(0, &something);
	}

The above is 100% correct code in all Linux kernels out there.  It won't
be correct anymore when you make the kernel preemptable with your patch,
or also with the patch from [EMAIL PROTECTED] of last month:

	http://www.uwsg.iu.edu/hypermail/linux/kernel/0008.1/0842.html

The above construct is discouraged of course when you can do the same
thing with a spinlock, but some places are doing that.  I did a very fast
grep and found several places that will deadlock with your patch applied.
(Just grep for test_and_set_bit all over the kernel and search for `while'
in the output of the grep; this will give you the obvious places.  Then we
should as well check all the other atomic operations, also the ones that
don't spin, because the spin could happen w/o an atomic operation... in
fact all spinning should be done w/o atomic operations to avoid cacheline
ping-pong.)

About the title "hard real-time fully preemptable Linux kernel prototype",
I'd say it's a little misleading given that a preemptable kernel has
nothing to do with hard real time.

Andrea
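[Editor's note: the audit Andrea describes can be sketched as a shell
pipeline.  The paths below are a throwaway stand-in tree, not real kernel
files; against a real tree you would run the grep from its top directory.]

```shell
# Stand-in "kernel tree" with one offending construct, so the pipeline
# below has something to find (illustrative only):
mkdir -p /tmp/fake-tree
cat > /tmp/fake-tree/demo.c <<'EOF'
	while (test_and_set_bit(0, &something)) {
		/* critical section */
	}
EOF

# The audit itself: every line that both calls test_and_set_bit and
# spins in a `while' loop is an obvious candidate for conversion to
# spin_lock/spin_trylock.
grep -rn "test_and_set_bit" /tmp/fake-tree --include='*.c' | grep "while"
```

[As the message notes, this only catches the obvious cases; spins built
out of a non-atomic read loop around an atomic op need a manual review.]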
Re: [patch]2.4.0-test6 "spinlock" preemption patch
George Anzinger wrote:
>
> This patch, for 2.4.0-test6, allows the kernel to be built with full
> preemption.

Neat.  Congratulations.

> ...
> The measured context switch latencies with this patch
> have been as high as 12 ms, however, we are actively working to
> isolate and fix the areas of the system where this occurs.

That has already been done!  From memory, there are three long-lived
spinlocks which affect latency.  Try applying

	http://www.uow.edu.au/~andrewm/linux/low-latency.patch

Also, review Ingo's ll-patch.  He may have picked up on some extra ones.

	http://www.redhat.com/~mingo/lowlatency-patches/
[patch]2.4.0-test6 "spinlock" preemption patch
This patch, for 2.4.0-test6, allows the kernel to be built with full
preemption.  The patch relies on the spinlock mechanism to protect areas
of code that should not be preempted.  The patch, by way of a small
change to schedule(), allows preemption during times when current->state
is not TASK_RUNNING.

The measured context switch latencies with this patch have been as high
as 12 ms; however, we are actively working to isolate and fix the areas
of the system where this occurs.

Currently the patch works only for ix86 UP systems.  We are also actively
working to expand the platform base for this work and to allow SMP
systems to take advantage of the same improvements.

The times a kernel is not preemptable under this patch are:

While handling interrupts.
While doing "bottom half" processing.
While holding a spinlock, writelock or readlock.

At all other times the algorithm allows preemption.

The file preempt.txt (available at ftp.mvista.com/pub/Real-Time/2.4.0-test6)
discusses the patch in more detail.  This patch as well as a Real Time
scheduler patch are also available at that site.

George
MontaVista Software

diff -urP -X patch.exclude linux-2.4.0-test6-org/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-2.4.0-test6-org/Documentation/Configure.help	Wed Aug  9 13:49:28 2000
+++ linux/Documentation/Configure.help	Mon Sep  4 14:51:58 2000
@@ -130,6 +130,15 @@
   If you have system with several CPU's, you do not need to say Y here:
   APIC will be used automatically.
 
+Preemptable Kernel
+CONFIG_PREEMPT
+  This option changes the kernel to be "usually" preemptable.  Currently
+  this option is incompatible with SMP.  The expected result of a
+  preemptable kernel is much lower latency (or time) between a scheduling
+  event (I/O completion interrupt, timer expiration, lock release) and the
+  actual rescheduling of the task waiting on the event.  The cost is a
+  small overhead in maintaining the required locks and data structures.
+
 Kernel math emulation
 CONFIG_MATH_EMULATION
   Linux can emulate a math coprocessor (used for floating point
diff -urP -X patch.exclude linux-2.4.0-test6-org/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.4.0-test6-org/arch/i386/config.in	Mon Jul 31 19:36:10 2000
+++ linux/arch/i386/config.in	Mon Sep  4 14:51:58 2000
@@ -148,6 +148,7 @@
 	define_bool CONFIG_X86_IO_APIC y
 	define_bool CONFIG_X86_LOCAL_APIC y
    fi
+   bool 'Preemptable Kernel' CONFIG_PREEMPT
 fi
 if [ "$CONFIG_SMP" = "y" -a "$CONFIG_X86_CMPXCHG" = "y" ]; then
    define_bool CONFIG_HAVE_DEC_LOCK y
diff -urP -X patch.exclude linux-2.4.0-test6-org/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.4.0-test6-org/arch/i386/kernel/entry.S	Sun Aug  6 22:21:23 2000
+++ linux/arch/i386/kernel/entry.S	Mon Sep  4 14:51:58 2000
@@ -47,6 +47,12 @@
 #define ASSEMBLY
 #include
 
+#ifdef CONFIG_PREEMPT
+#define scheduleY preempt_schedule
+#else
+#define scheduleY schedule
+#endif
+
 EBX		= 0x00
 ECX		= 0x04
 EDX		= 0x08
@@ -72,7 +78,7 @@
  * these are offsets into the task-struct.
  */
 state		= 0
-flags		= 4
+preempt_count	= 4
 sigpending	= 8
 addr_limit	= 12
 exec_domain	= 16
@@ -80,8 +86,31 @@
 tsk_ptrace	= 24
 processor	= 52
 
+/* These are offsets into the irq_stat structure
+ * There is one per cpu and it is aligned to 32
+ * byte boundary (we put that here as a shift count)
+ */
+irq_array_shift          = 5
+
+irq_stat_softirq_active  = 0
+irq_stat_softirq_mask    = 4
+irq_stat_local_irq_count = 8
+irq_stat_local_bh_count  = 12
+
 ENOSYS = 38
 
+#ifdef CONFIG_SMP
+#define GET_CPU_INDX	movl processor(%ebx),%eax; \
+			shll $irq_array_shift,%eax
+#define GET_CURRENT_CPU_INDX GET_CURRENT(%ebx); \
+			GET_CPU_INDX
+#define CPU_INDX (,%eax)
+#else
+#define GET_CPU_INDX
+#define GET_CURRENT_CPU_INDX GET_CURRENT(%ebx)
+#define CPU_INDX
+#endif
+
 
 #define SAVE_ALL \
 	cld; \
@@ -202,35 +231,45 @@
 	jne tracesys
 	call *SYMBOL_NAME(sys_call_table)(,%eax,4)
 	movl %eax,EAX(%esp)		# save the return value
+
 ENTRY(ret_from_sys_call)
-#ifdef CONFIG_SMP
-	movl processor(%ebx),%eax
-	shll $5,%eax
-	movl SYMBOL_NAME(irq_stat)(,%eax),%ecx		# softirq_active
-	testl SYMBOL_NAME(irq_stat)+4(,%eax),%ecx	# softirq_mask
-#else
-	movl SYMBOL_NAME(irq_stat),%ecx		# softirq_active
-	testl SYMBOL_NAME(irq_stat)+4,%ecx	# softirq_mask
+	GET_CPU_INDX
+#ifdef CONFIG_PREEMPT
+	cli
 #endif
-	jne handle_softirq
-
-ret_with_reschedule:
+	movl SYMBOL_NAME(irq_stat)+irq_stat_softirq_active CPU_INDX,%ecx
+	testl SYMBOL_NAME(irq_stat)+irq_stat_softirq_mask CPU_INDX,%ecx
+	jne handle_softirq_user
+
+softirq_user_rtn: