Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-13 Thread Ralf Baechle

On Tue, Sep 12, 2000 at 11:37:46AM +0100, Alan Cox wrote:

> That code example can in theory deadlock without any patches if the CPUs
> end up locked in sync with each other and the same one always wins the test.
> It isn't likely on current x86, but other processors are a different story.

I've seen systems (not processors!) that can detect such a case and let
one processor randomly win over the others.

  Ralf

--
"Embrace, Enhance, Eliminate" - it worked for the pope, it'll work for Bill.



Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Alan Cox wrote:

>That code example can in theory deadlock without any patches if the CPUs

Whoops, I really meant:

while (test_and_set_bit(0, &lock))
        ;
/* critical section */
mb();
clear_bit(0, &lock);

Andrea




Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-12 Thread Alan Cox

> > while (test_and_set_bit(0, &something)) {
> > /* critical section */
> > mb();
> > clear_bit(0, &something);
> > }
> 
> > The above construct is of course discouraged when you can do
> > the same thing with a spinlock, but some places are doing it.
> 
> Hmmm, maybe the Montavista people can volunteer to clean
> up all those places in the kernel code? ;)

That code example can in theory deadlock without any patches if the CPUs
end up locked in sync with each other and the same one always wins the test.
It isn't likely on current x86, but other processors are a different story.
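One way to picture this: a hypothetical two-CPU interleaving of the
spin-only variant of the construct (the cycle-by-cycle timing below is
invented purely for illustration):

/*
 * Both CPUs run:  while (test_and_set_bit(0, &lock)) ;
 *
 * cycle   CPU0                          CPU1
 * -----   ---------------------------   ---------------------------
 *   1     test_and_set_bit -> wins      test_and_set_bit -> loses
 *   2     critical section              spins
 *   3     clear_bit(0, &lock)           test_and_set_bit -> loses
 *   4     test_and_set_bit -> wins      spins
 *   ...
 * If the timing stays in lockstep, CPU0 wins every round and CPU1
 * spins forever -- starvation that is indistinguishable from deadlock.
 */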




Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-12 Thread George Anzinger

Rik van Riel wrote:
> 
> On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> > On Wed, 6 Sep 2000, George Anzinger wrote:
> >
> > >The times a kernel is not preemptable under this patch are:
> > >
> > >While handling interrupts.
> > >While doing "bottom half" processing.
> > >While holding a spinlock, writelock or readlock.
> > >
> > >At all other times the algorithm allows preemption.
> >
> > So it can deadlock if somebody is doing:
> >
> >   while (test_and_set_bit(0, &something)) {
> >   /* critical section */
> >   mb();
> >   clear_bit(0, &something);
> >   }
> 
> > The above construct is of course discouraged when you can do
> > the same thing with a spinlock, but some places are doing it.
> 
> Hmmm, maybe the Montavista people can volunteer to clean
> up all those places in the kernel code? ;)
> 
> cheers,

Well, I think that is what we are saying.  We are trying to understand
the lay of the land and which way the wind is blowing so that our work
will be accepted into the kernel.  Thus, for example, even now both
preemption and rtsched are configuration options that, when not chosen,
give you back the same old kernel (with possibly a more readable debug
option in spinlock.h and a more reliable exit from entry.S :).

Along these lines, we do want to thank Andrea for pointing out this code
to us.  It is always better to have someone point out the tar pits prior
to our trying to walk across them (and verily, he was never seen again
:)

George



Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-11 Thread Andrea Arcangeli

On Mon, 11 Sep 2000, Rik van Riel wrote:

>Hmmm, maybe the Montavista people can volunteer to clean
>up all those places in the kernel code? ;)

That would be nice and welcome independently of the preemptible kernel,
indeed. The right construct to convert that stuff to is
spin_is_locked/spin_trylock (spin_trylock will take care of forbidding
kernel reschedules within the critical section).
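
A minimal sketch of such a conversion (the lock name and the function
are invented for illustration; with the preemption patch it is the
spinlock itself that keeps the critical section from being preempted):

#include <linux/kernel.h>
#include <linux/spinlock.h>

static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; /* hypothetical lock */

static void do_critical_work(void)
{
        /*
         * Instead of spinning on test_and_set_bit, try to take a
         * real spinlock; while it is contended, watch it with a
         * plain read (spin_is_locked) rather than an atomic op.
         */
        while (!spin_trylock(&my_lock)) {
                while (spin_is_locked(&my_lock))
                        barrier();      /* spin without atomics */
        }

        /* critical section */

        spin_unlock(&my_lock);
}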

One example that comes to mind to better show what this cleanup consists
of (not relevant to this case, because it's code that doesn't get compiled
in on UP) is the global_irq_lock variable. The i386 version is an example
of the old style, and the alpha version is the new
spin_is_locked/spin_trylock style.

The new rule should be that places that use test_and_set_bit should never
spin. They can of course be schedule-locks like lock_page() (in fact, being
a schedule-aware lock still means not spinning on the lock :).

Those cleanups can start in the 2.4.x timeframe, but I'd rather not depend
on them during 2.4.x to have a stable kernel (2.5.x looks like a better
time to change such an ancient API). This is just my humble opinion, of
course.

Andrea




Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-11 Thread Rik van Riel

On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> On Wed, 6 Sep 2000, George Anzinger wrote:
> 
> >The times a kernel is not preemptable under this patch are:
> >
> >While handling interrupts.
> >While doing "bottom half" processing.
> >While holding a spinlock, writelock or readlock.
> >
> >At all other times the algorithm allows preemption.
> 
> So it can deadlock if somebody is doing:
> 
>   while (test_and_set_bit(0, &something)) {
>   /* critical section */
>   mb();
>   clear_bit(0, &something);
>   }

> The above construct is of course discouraged when you can do
> the same thing with a spinlock, but some places are doing it.

Hmmm, maybe the Montavista people can volunteer to clean
up all those places in the kernel code? ;)

cheers,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-11 Thread Andrea Arcangeli

On Wed, 6 Sep 2000, George Anzinger wrote:

>The times a kernel is not preemptable under this patch are:
>
>While handling interrupts.
>While doing "bottom half" processing.
>While holding a spinlock, writelock or readlock.
>
>At all other times the algorithm allows preemption.

So it can deadlock if somebody is doing:

while (test_and_set_bit(0, &something)) {
        /* critical section */
        mb();
        clear_bit(0, &something);
}

The above is 100% correct code in all Linux kernels out there. It
won't be correct anymore when you make the kernel preemptable with your
patch, or with the patch from [EMAIL PROTECTED] of last month.

http://www.uwsg.iu.edu/hypermail/linux/kernel/0008.1/0842.html

The above construct is of course discouraged when you can do the same
thing with a spinlock, but some places are doing it.
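
To make the failure mode concrete, here is a hypothetical uniprocessor
interleaving under a preemptible kernel (the task names and the priority
relationship are invented for illustration):

#include <asm/bitops.h>
#include <asm/system.h>

static unsigned long something;         /* bit 0 acts as the lock */

void task_a(void)                       /* lower priority */
{
        while (test_and_set_bit(0, &something))
                ;                       /* wins: bit is now set */

        /*
         * <-- preempted HERE by the higher-priority task_b; the bit
         * "lock" is invisible to the preemption machinery, so nothing
         * forbids the reschedule.
         */

        /* critical section: never reached */
        mb();
        clear_bit(0, &something);
}

void task_b(void)                       /* higher priority */
{
        while (test_and_set_bit(0, &something))
                ;                       /* spins forever on the only CPU:
                                           task_a holds the bit but can
                                           never run again to release it */
}

A real spinlock survives this under the patch precisely because holding
it disables preemption.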

I did a very fast grep and found several places that will deadlock with
your patch applied. (Just grep for test_and_set_bit all over the kernel
and search for `while' in the output of the grep; this will give you the
obvious places. Then we should also check all the other atomic operations,
including the ones that don't spin, because the spinning could happen
without an atomic operation... in fact, all spinning should be done
without atomic operations to avoid cacheline ping-pong.)
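
The ping-pong remark is the classic "test and test-and-set" idea; a
generic sketch follows (not any particular kernel site):

#include <linux/kernel.h>
#include <asm/bitops.h>

static unsigned long lock_word;         /* bit 0 is the lock */

static void acquire(void)
{
        /*
         * Bad: every iteration of
         *
         *      while (test_and_set_bit(0, &lock_word))
         *              ;
         *
         * is a locked read-modify-write, so the cacheline holding
         * lock_word bounces between all the spinning CPUs.
         *
         * Better: spin on an ordinary read, which is satisfied from
         * the local cache, and only retry the atomic operation once
         * the bit appears clear.
         */
        do {
                while (test_bit(0, &lock_word))
                        barrier();
        } while (test_and_set_bit(0, &lock_word));
}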

About the title "hard real-time fully preemptable Linux kernel prototype",
I'd say it's a little misleading, given that a preemptable kernel has
nothing to do with hard real time.

Andrea




Re: [patch]2.4.0-test6 "spinlock" preemption patch

2000-09-06 Thread Andrew Morton

George Anzinger wrote:
> 
> This patch, for 2.4.0-test6, allows the kernel to be built with full
> preemption.

Neat.  Congratulations.

> ...
>  The measured context switch latencies with this patch
> have been as high as 12 ms; however, we are actively working to
> isolate and fix the areas of the system where this occurs.

That has already been done!  From memory, there are three long-lived
spinlocks which affect latency.  Try applying
http://www.uow.edu.au/~andrewm/linux/low-latency.patch

Also, review Ingo's ll-patch.   He may have picked up on some extra
ones.

http://www.redhat.com/~mingo/lowlatency-patches/



[patch]2.4.0-test6 "spinlock" preemption patch

2000-09-06 Thread George Anzinger


This patch, for 2.4.0-test6, allows the kernel to be built with full
preemption.  The patch relies on the spinlock mechanism to protect areas
of code that should not be preempted.  The patch, by way of a small
change to schedule(), allows preemption during times when current->state
is not TASK_RUNNING.  The measured context switch latencies with this
patch have been as high as 12 ms; however, we are actively working to
isolate and fix the areas of the system where this occurs.  Currently
the patch works only for ix86 UP systems.  We are also actively working
to expand the platform base for this work and to allow SMP systems to
take advantage of the same improvements.

The times a kernel is not preemptable under this patch are:

While handling interrupts.
While doing "bottom half" processing.
While holding a spinlock, writelock or readlock.

At all other times the algorithm allows preemption.
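
A rough sketch of the central idea (the macro names below are
illustrative, not the patch's actual code; the entry.S hunk further down
does show a per-task preempt_count replacing the old flags offset):

/*
 * Illustrative only: every spinlock acquisition bumps a per-task
 * preempt_count, and the interrupt-return path reschedules only when
 * the count is zero, so code holding a spinlock is never preempted.
 */
#define my_preempt_disable()    do { current->preempt_count++; } while (0)
#define my_preempt_enable()     do { current->preempt_count--; } while (0)

#define my_spin_lock(lp)                        \
        do {                                    \
                my_preempt_disable();           \
                spin_lock(lp);                  \
        } while (0)

#define my_spin_unlock(lp)                      \
        do {                                    \
                spin_unlock(lp);                \
                my_preempt_enable();            \
        } while (0)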


The file preempt.txt (available at
ftp.mvista.com/pub/Real-Time/2.4.0-test6)
discusses the patch in more detail.  This patch as well as a
Real Time scheduler patch are also available at that site.

George
MontaVista Software

diff -urP -X patch.exclude linux-2.4.0-test6-org/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-2.4.0-test6-org/Documentation/Configure.help  Wed Aug  9 13:49:28 2000
+++ linux/Documentation/Configure.help  Mon Sep  4 14:51:58 2000
@@ -130,6 +130,15 @@
   If you have system with several CPU's, you do not need to say Y
   here: APIC will be used automatically.
 
+Preemptable Kernel
+CONFIG_PREEMPT
+  This option changes the kernel to be "usually" preemptable.  Currently
+  this option is incompatible with SMP.  The expected result of a 
+  preemptable kernel is much lower latency (or time) between a scheduling 
+  event (I/O completion interrupt, timer expiration, lock release) and the
+  actual rescheduling of the task waiting on the event.  The cost is a 
+  small overhead in maintaining the required locks and data structures.
+
 Kernel math emulation
 CONFIG_MATH_EMULATION
   Linux can emulate a math coprocessor (used for floating point
diff -urP -X patch.exclude linux-2.4.0-test6-org/arch/i386/config.in linux/arch/i386/config.in
--- linux-2.4.0-test6-org/arch/i386/config.in   Mon Jul 31 19:36:10 2000
+++ linux/arch/i386/config.in   Mon Sep  4 14:51:58 2000
@@ -148,6 +148,7 @@
define_bool CONFIG_X86_IO_APIC y
define_bool CONFIG_X86_LOCAL_APIC y
 fi
+bool 'Preemptable Kernel' CONFIG_PREEMPT
 fi
 if [ "$CONFIG_SMP" = "y" -a "$CONFIG_X86_CMPXCHG" = "y" ]; then
 define_bool CONFIG_HAVE_DEC_LOCK y
diff -urP -X patch.exclude linux-2.4.0-test6-org/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.4.0-test6-org/arch/i386/kernel/entry.S  Sun Aug  6 22:21:23 2000
+++ linux/arch/i386/kernel/entry.S  Mon Sep  4 14:51:58 2000
@@ -47,6 +47,12 @@
 #define ASSEMBLY
 #include 
 
+#ifdef CONFIG_PREEMPT
+#define scheduleY preempt_schedule
+#else
+#define scheduleY schedule
+#endif
+
 EBX= 0x00
 ECX= 0x04
 EDX= 0x08
@@ -72,7 +78,7 @@
  * these are offsets into the task-struct.
  */
 state  =  0
-flags  =  4
+preempt_count  =  4
 sigpending =  8
 addr_limit = 12
 exec_domain= 16
@@ -80,8 +86,31 @@
 tsk_ptrace = 24
 processor  = 52
 
+/* These are offsets into the irq_stat structure
+ * There is one per cpu and it is aligned to 32
+ * byte boundary (we put that here as a shift count)
+ */
+irq_array_shift = 5
+
+irq_stat_softirq_active = 0
+irq_stat_softirq_mask   = 4
+irq_stat_local_irq_count= 8
+irq_stat_local_bh_count = 12
+
 ENOSYS = 38
 
+#ifdef CONFIG_SMP
+#define GET_CPU_INDX   movl processor(%ebx),%eax;  \
+shll $irq_array_shift,%eax
+#define GET_CURRENT_CPU_INDX GET_CURRENT(%ebx); \
+ GET_CPU_INDX
+#define CPU_INDX (,%eax)
+#else
+#define GET_CPU_INDX
+#define GET_CURRENT_CPU_INDX GET_CURRENT(%ebx)
+#define CPU_INDX
+#endif
+
 
 #define SAVE_ALL \
cld; \
@@ -202,35 +231,45 @@
jne tracesys
call *SYMBOL_NAME(sys_call_table)(,%eax,4)
movl %eax,EAX(%esp) # save the return value
+
 ENTRY(ret_from_sys_call)
-#ifdef CONFIG_SMP
-   movl processor(%ebx),%eax
-   shll $5,%eax
-   movl SYMBOL_NAME(irq_stat)(,%eax),%ecx  # softirq_active
-   testl SYMBOL_NAME(irq_stat)+4(,%eax),%ecx   # softirq_mask
-#else
-   movl SYMBOL_NAME(irq_stat),%ecx # softirq_active
-   testl SYMBOL_NAME(irq_stat)+4,%ecx  # softirq_mask
+GET_CPU_INDX
+#ifdef CONFIG_PREEMPT
+cli
 #endif
-   jne   handle_softirq
-   
-ret_with_reschedule:
+   movl SYMBOL_NAME(irq_stat)+irq_stat_softirq_active CPU_INDX,%ecx
+   testl SYMBOL_NAME(irq_stat)+irq_stat_softirq_mask CPU_INDX,%ecx
+   jne   handle_softirq_user
+
+softirq_user_rtn: