Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote:
> On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> > * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> > > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> > >
> > > could you send me the full trace?
> >
> > just in case the system in question is still running - could you also do
> > a 'verbose' trace via:
> >
> >   echo 1 > /proc/sys/kernel/trace_verbose
>
> OK, here is a 2912us verbose latency trace with "data=ordered", gzipped.
> dbench 32 or 64 is the easiest way to trigger these.
>
> I have not tried "data=journal". As previously stated "data=writeback"
> works perfectly - I ran JACK overnight while stressing the fs and did
> not get one xrun.

Any update on this? The problem is still apparent in 2.6.11. It seems to
be a regression from 2.6.10. And now I've heard 2.6.12-rc1 mentioned
with no motion on this.

Here's the trace again in case you missed it:

http://www.alsa-project.org/~rlrevell/2912us

The "latency regressions" thread was all sub-millisecond stuff which can
be ignored IMHO. Still interesting because they are regressions after
all, but not a real world problem.

However this one can be several milliseconds. It's a real problem. I'd
hate to have to ship 2.6.12 with a disclaimer that ext3 with
"data=ordered" is not suitable for the desktop (as it clearly violates
the stated desktop responsiveness goal of 1ms).

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Wed, 2005-03-16 at 02:50 -0500, Steven Rostedt wrote:
>
> On Tue, 15 Mar 2005, Lee Revell wrote:
>
> > On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > > solution. I forgot about wait_on_bit_lock. I've converted all the locks
> > > to use this instead. We probably need to get priority inheritance working
> > > on this too someday, but for now it's better than wasting memory or
> > > getting into deadlocks.
> > >
> >
> > I am still not clear on why this did not hit with earlier kernels +
> > PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
> > lock-break patch dropped?
> >
>
> When did you start seeing this? This code has been there as far back as
> 2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
> back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
> start picking this up till later, or you were just lucky that no
> contention was happening on that lock.

Sometime after the RT preempt patches were rebased to mainline.

I don't see how there could be contention as I am on a UP.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > Damn! The answer was right there in front of my eyes! Here's the
> > cleanest solution. I forgot about wait_on_bit_lock. I've converted
> > all the locks to use this instead. [...]
>
> ah, indeed, this looks really nifty. Andrew?
>

There's a little lock ranking diagram in jbd.h which tells us that these
locks nest inside j_list_lock and j_state_lock. So I guess you'll need to
turn those into semaphores.
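
The nesting concern above can be modelled in a few lines of userspace C.
This is only a bookkeeping sketch, not kernel code: nothing here actually
locks, and every demo_* name is hypothetical. It assumes the rule that a
lock which may sleep must not be taken while a spinning lock is held, which
is the kind of mistake might_sleep() is meant to flag.

```c
#include <assert.h>

/* Hypothetical bookkeeping only; no real locking happens here. */
static int demo_spin_depth;	/* spinning locks currently held */
static int demo_violations;	/* sleeping locks taken in atomic context */

static void demo_spin_lock(void)
{
	demo_spin_depth++;
}

static void demo_spin_unlock(void)
{
	assert(demo_spin_depth > 0);
	demo_spin_depth--;
}

/* A sleeping lock may block, so taking it under a spinning lock is a bug. */
static void demo_sleeping_lock(void)
{
	if (demo_spin_depth > 0)
		demo_violations++;
}

static void demo_sleeping_unlock(void)
{
}

/* Outer lock spins, inner lock sleeps: the nesting described above. */
static int demo_nesting_spin_outer(void)
{
	demo_violations = 0;
	demo_spin_lock();	/* stands in for j_list_lock as a spinlock */
	demo_sleeping_lock();	/* stands in for a wait_on_bit_lock based bh lock */
	demo_sleeping_unlock();
	demo_spin_unlock();
	return demo_violations;
}

/* Outer lock is itself a sleeping lock (a semaphore): nesting is fine. */
static int demo_nesting_sleeping_outer(void)
{
	demo_violations = 0;
	demo_sleeping_lock();	/* outer lock turned into a semaphore */
	demo_sleeping_lock();	/* inner bit lock */
	demo_sleeping_unlock();
	demo_sleeping_unlock();
	return demo_violations;
}
```

Calling demo_nesting_spin_outer() reports one violation and
demo_nesting_sleeping_outer() reports none, which is why the outer locks
would have to become semaphores once the inner ones can sleep.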
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> Damn! The answer was right there in front of my eyes! Here's the
> cleanest solution. I forgot about wait_on_bit_lock. I've converted
> all the locks to use this instead. [...]

ah, indeed, this looks really nifty. Andrew?

Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Lee Revell wrote:

> On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> > Damn! The answer was right there in front of my eyes! Here's the cleanest
> > solution. I forgot about wait_on_bit_lock. I've converted all the locks
> > to use this instead. We probably need to get priority inheritance working
> > on this too someday, but for now it's better than wasting memory or
> > getting into deadlocks.
> >
>
> I am still not clear on why this did not hit with earlier kernels +
> PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
> lock-break patch dropped?
>

When did you start seeing this? This code has been there as far back as
2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
start picking this up till later, or you were just lucky that no
contention was happening on that lock.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Steven Rostedt wrote:

>
>
> On Tue, 15 Mar 2005, Ingo Molnar wrote:
> >
> > i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> > would simplify things, besides making PREEMPT_RT simpler as well. The
> > memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> > on x86)
> >
>
> Hi Ingo,
>
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock. I've converted all the locks
> to use this instead. We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

One bit of caution on these. If we don't have PREEMPT_RT, then don't the
spinlocks on SMP act the same as normal spinlocks, so that we should not
schedule while holding one? I believe that some of these locks are taken
while holding spin_locks. So this isn't the right solution for anything
other than PREEMPT_RT.

I also forgot to add might_sleep in the locking calls. Here's the patch
with the might_sleep added.

What should we do for non PREEMPT_RT? Maybe put the bit_spinlocks back in
for that case?

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 11:58:14.0 -0500
@@ -82,6 +82,17 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 02:25:31.881251828 -0500
@@ -324,34 +324,65 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 12:19:11.0 -0500
@@ -774,67 +774,6 @@
 }))
 
 
-/*
- * bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
-
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
> Damn! The answer was right there in front of my eyes! Here's the cleanest
> solution. I forgot about wait_on_bit_lock. I've converted all the locks
> to use this instead. We probably need to get priority inheritance working
> on this too someday, but for now it's better than wasting memory or
> getting into deadlocks.
>

I am still not clear on why this did not hit with earlier kernels +
PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
lock-break patch dropped?

Lee
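
For readers unfamiliar with the idea behind the wait_on_bit_lock
conversion quoted above, here is a rough userspace analogue of a bit lock
that sleeps instead of spinning. It is a sketch under stated assumptions,
not the kernel implementation: it uses C11 atomics and one global pthread
condition variable as the wait queue (an assumption of this sketch, not how
the kernel organizes waiters), and all demo_* names and constants are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define DEMO_BIT	0	/* hypothetical lock bit */
#define DEMO_THREADS	4
#define DEMO_ITERS	20000

/* A single global wait queue, for simplicity only. */
static pthread_mutex_t demo_wq_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_wq_cond = PTHREAD_COND_INITIALIZER;

static atomic_ulong demo_word;	/* plays the role of bh->b_state */
static long demo_shared;	/* protected by DEMO_BIT in demo_word */

static void demo_sleeping_bit_lock(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;

	/* Fast path: the bit was clear, so we now own the lock. */
	if (!(atomic_fetch_or(word, mask) & mask))
		return;

	/* Slow path: sleep until an unlocker wakes us, then retry. */
	pthread_mutex_lock(&demo_wq_mutex);
	while (atomic_fetch_or(word, mask) & mask)
		pthread_cond_wait(&demo_wq_cond, &demo_wq_mutex);
	pthread_mutex_unlock(&demo_wq_mutex);
}

static void demo_sleeping_bit_unlock(atomic_ulong *word, int bit)
{
	/* Clear the bit first (compare clear_bit), then wake any sleepers. */
	atomic_fetch_and(word, ~(1UL << bit));
	pthread_mutex_lock(&demo_wq_mutex);
	pthread_cond_broadcast(&demo_wq_cond);
	pthread_mutex_unlock(&demo_wq_mutex);
}

static void *demo_sleeping_worker(void *unused)
{
	(void)unused;
	for (int i = 0; i < DEMO_ITERS; i++) {
		demo_sleeping_bit_lock(&demo_word, DEMO_BIT);
		demo_shared++;
		demo_sleeping_bit_unlock(&demo_word, DEMO_BIT);
	}
	return NULL;
}

/* Runs the workers and returns the final value of the shared counter. */
static long demo_sleeping_run(void)
{
	pthread_t tid[DEMO_THREADS];
	int rc;

	atomic_store(&demo_word, 0);
	demo_shared = 0;
	for (int i = 0; i < DEMO_THREADS; i++) {
		rc = pthread_create(&tid[i], NULL, demo_sleeping_worker, NULL);
		assert(rc == 0);
	}
	for (int i = 0; i < DEMO_THREADS; i++) {
		rc = pthread_join(tid[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return demo_shared;
}
```

The waiter re-tests the bit while holding the wait queue mutex, and the
unlocker clears the bit before taking that mutex to broadcast, so a wakeup
cannot slip in between the test and the sleep. Like the quoted patch, the
lock still costs no memory beyond the existing flag word, but a contended
waiter blocks rather than busy-waits.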
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> The problem here is that it's not ext3 bh's only. They're still the normal
> buffer head. The problem arises because the ext3 "journal head" is
> allocated within these bit spin locks.

Yes, the locks do want to live inside the buffer_head.

Stephen has pointed out that we might want to remove
jbd_lock_bh_journal_head() altogether some time, just use
jbd_lock_bh_state() for that.

In 2.4 these locks are global (or per-superblock). Making them a global
spinlock would be acceptable for 2-ways and probably larger.
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

Hi Ingo,

Damn! The answer was right there in front of my eyes! Here's the cleanest
solution. I forgot about wait_on_bit_lock. I've converted all the locks
to use this instead. We probably need to get priority inheritance working
on this too someday, but for now it's better than wasting memory or
getting into deadlocks.

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 11:58:14.0 -0500
@@ -82,6 +82,17 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-15 11:58:40.0 -0500
@@ -324,34 +324,63 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,&jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 12:19:11.032217736 -0500
@@ -774,67 +774,6 @@
 }))
 
 
-/*
- * bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
-			cpu_relax();
-#endif
-
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > What should we use instead of #ifdef PREEMPT_RT? Or should we just
> > keep it the same for both. Since this fix is only to fix spinlocks
> > that schedule, I figured that it would be better not to waste the
> > memory of those not using PREEMPT_RT. Should I use the opposite
> > PREEMPT_DESKTOP?
>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
> would simplify things, besides making PREEMPT_RT simpler as well. The
> memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
> on x86)
>

The problem here is that it's not ext3 bh's only. They're still the normal
buffer head. The problem arises because the ext3 "journal head" is
allocated within these bit spin locks. I tried to monkey with putting the
locks in the journal heads and have checks to see when to free them, but
it wasn't that simple. I started having problems with some of the freeing
transactions, I might have assumed too much.

I'll give it one more try to get it into the journal heads, but after
that, (if I fail) I'll let someone who understands the ext3 system better
handle this.

-- Steve
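
As background for the bit spin locks under discussion, the following is a
minimal userspace sketch of a lock that lives in one bit of an existing
flag word, written with C11 atomics. It is an illustration only: the
demo_* names and the bit numbers are hypothetical, and the real BH_* values
and bit_spin_* helpers live in the kernel headers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers, chosen only for this sketch. */
enum { DEMO_BH_DIRTY = 1, DEMO_BH_STATE_LOCK = 5 };

/* Returns 1 if the lock bit was clear and is now ours, 0 if already held. */
static int demo_bit_spin_trylock(int bit, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bit;

	return !(atomic_fetch_or(addr, mask) & mask);
}

/* Spins with an atomic set, then a plain read while the bit stays set. */
static void demo_bit_spin_lock(int bit, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bit;

	while (atomic_fetch_or(addr, mask) & mask)
		while (atomic_load(addr) & mask)
			;	/* busy-wait until the holder clears the bit */
}

static void demo_bit_spin_unlock(int bit, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << bit));
}

static int demo_bit_spin_is_locked(int bit, atomic_ulong *addr)
{
	return (int)((atomic_load(addr) >> bit) & 1UL);
}

/* Single-threaded walk through the lock states; returns the final word. */
static unsigned long demo_bitlock_walkthrough(void)
{
	atomic_ulong state;

	atomic_init(&state, 1UL << DEMO_BH_DIRTY);	/* an unrelated flag is set */

	demo_bit_spin_lock(DEMO_BH_STATE_LOCK, &state);
	assert(demo_bit_spin_is_locked(DEMO_BH_STATE_LOCK, &state));
	assert(!demo_bit_spin_trylock(DEMO_BH_STATE_LOCK, &state));	/* held */
	assert(atomic_load(&state) & (1UL << DEMO_BH_DIRTY));	/* flag untouched */

	demo_bit_spin_unlock(DEMO_BH_STATE_LOCK, &state);
	assert(!demo_bit_spin_is_locked(DEMO_BH_STATE_LOCK, &state));
	assert(demo_bit_spin_trylock(DEMO_BH_STATE_LOCK, &state));
	demo_bit_spin_unlock(DEMO_BH_STATE_LOCK, &state);

	return atomic_load(&state);
}
```

The point of the design is that the lock costs no memory beyond the flag
word every buffer head already has. The cost is that a contended waiter
must busy-wait, which is what becomes a problem once the lock holder can be
preempted or can sleep.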
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> > good progress - but the global lock may be a scalability worry on
> > upstream though. Would it be possible to just mirror much of the current
> > lock logic, but with spinlocks instead of bitlocks? And there should be
> > no #ifdefs on PREEMPT_RT.
>
> The first patch I had just converted the bit spinlocks to spinlocks
> but I thought that adding two spinlocks was too much for every buffer
> head, even if it wasn't in the ext3 file system. The journal head
> spinlock is just used to add and remove the journal heads from the
> buffer heads, so I'm not sure how much contention is on them. I only
> have a dual smp system, so I can't test the system on large number of
> CPUs. What do you think, should we sacrifice memory for speed?

there are two bad effects of global spinlocks: 1) contention 2)
cacheline bouncing. It's #2 that would affect this spinlock. While i'm
not sure this would show up in usual benchmarks, we should rather err on
the side of more scalability. Two spinlocks are just two more machine
words on most architectures, so i don't think it matters all that much,
while it removes a major wart - as long as the two extra locks are for
ext3 buffer-heads only.

> What should we use instead of #ifdef PREEMPT_RT? Or should we just
> keep it the same for both. Since this fix is only to fix spinlocks
> that schedule, I figured that it would be better not to waste the
> memory of those not using PREEMPT_RT. Should I use the opposite
> PREEMPT_DESKTOP?

i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
would simplify things, besides making PREEMPT_RT simpler as well. The
memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
on x86)

Ingo
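
The "8 more bytes" figure can be reproduced with a small sizeof
comparison. This is a sketch under assumptions: the two structs are heavily
trimmed, hypothetical stand-ins for struct buffer_head, and each lock is
assumed to occupy one 4-byte word, which matches the x86 figure quoted
above but is not true of every configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: two bits of b_state serve as the locks. */
struct demo_bh_bitlock {
	unsigned long b_state;
	void *b_private;
};

/* Hypothetical stand-in: the two bit locks become separate lock words. */
struct demo_bh_spinlock {
	unsigned long b_state;
	void *b_private;
	uint32_t b_state_lock;		/* assumed 4-byte lock word */
	uint32_t b_journal_head_lock;	/* assumed 4-byte lock word */
};

/* Extra bytes per buffer head once the locks are real lock words. */
static size_t demo_lock_overhead(void)
{
	return sizeof(struct demo_bh_spinlock) - sizeof(struct demo_bh_bitlock);
}

/* Total extra memory for a given number of buffer heads. */
static size_t demo_total_overhead(size_t nr_buffer_heads)
{
	return nr_buffer_heads * demo_lock_overhead();
}
```

Under these assumptions demo_lock_overhead() is 8, so a million buffer
heads would cost about 8 MB extra. That is the trade being weighed in the
message above: per-object locks avoid bouncing one global lock's cacheline
between CPUs, at the price of a few bytes per buffer head.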
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > I've realized that my previous patch had too many problems with the
> > way the journaling system works. So I went back to my first approach
> > but added the journal_head lock as one global lock to keep the buffer
> > head size smaller. I only added the state lock to the buffer head.
> > I've tested this for some time now, and it works well (for the test at
> > least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> > too.
>
> good progress - but the global lock may be a scalability worry on
> upstream though. Would it be possible to just mirror much of the current
> lock logic, but with spinlocks instead of bitlocks? And there should be
> no #ifdefs on PREEMPT_RT.
>

The first patch I had just converted the bit spinlocks to spinlocks but I
thought that adding two spinlocks was too much for every buffer head, even
if it wasn't in the ext3 file system. The journal head spinlock is just
used to add and remove the journal heads from the buffer heads, so I'm not
sure how much contention is on them. I only have a dual smp system, so I
can't test the system on large number of CPUs. What do you think, should
we sacrifice memory for speed?

What should we use instead of #ifdef PREEMPT_RT? Or should we just keep it
the same for both. Since this fix is only to fix spinlocks that schedule,
I figured that it would be better not to waste the memory of those not
using PREEMPT_RT. Should I use the opposite PREEMPT_DESKTOP?

Thanks,

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> I've realized that my previous patch had too many problems with the
> way the journaling system works. So I went back to my first approach
> but added the journal_head lock as one global lock to keep the buffer
> head size smaller. I only added the state lock to the buffer head.
> I've tested this for some time now, and it works well (for the test at
> least). I'll recompile it with PREEMPT_DESKTOP to see if that works
> too.

good progress - but the global lock may be a scalability worry on
upstream though. Would it be possible to just mirror much of the current
lock logic, but with spinlocks instead of bitlocks? And there should be
no #ifdefs on PREEMPT_RT.

Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
I've realized that my previous patch had too many problems with the way
the journaling system works. So I went back to my first approach but
added the journal_head lock as one global lock to keep the buffer head
size smaller. I only added the state lock to the buffer head. I've tested
this for some time now, and it works well (for the test at least). I'll
recompile it with PREEMPT_DESKTOP to see if that works too.

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c linux-2.6.11-final-V0.7.40-00/fs/buffer.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c	2005-03-02 02:38:10.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/buffer.c	2005-03-15 03:41:15.0 -0500
@@ -3003,6 +3003,9 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-15 03:49:10.0 -0500
@@ -82,6 +82,8 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+spinlock_t journal_head_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h	2005-03-02 02:37:45.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h	2005-03-15 03:42:22.0 -0500
@@ -62,6 +62,13 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
 	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-15 03:45:33.0 -0500
@@ -314,6 +314,13 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+extern spinlock_t journal_head_lock;
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,24 +333,36 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
+}
+#undef PICK_SPIN_LOCK
+
+#ifdef CONFIG_PREEMPT_RT
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&journal_head_lock);
 }
 
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&journal_head_lock);
+}
+#else /* !CONFIG_PREEMPT_RT */
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_JournalHead, &bh->b_state);
@@ -353,6 +372,7 @@
 {
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }
+#endif /* CONFIG_PREEMPT_RT */
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h	2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h	2005-03-15 03:40:31.0 -0500
@@ -774,6 +774,10 @@
 }))
 
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
I've realized that my previous patch had too many problems with the way
the journaling system works. So I went back to my first approach but
added the journal_head lock as one global lock to keep the buffer head
size smaller. I only added the state lock to the buffer head.

I've tested this for some time now, and it works well (for the test at
least). I'll recompile it with PREEMPT_DESKTOP to see if that works too.

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c linux-2.6.11-final-V0.7.40-00/fs/buffer.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/buffer.c 2005-03-02 02:38:10.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/buffer.c 2005-03-15 03:41:15.0 -0500
@@ -3003,6 +3003,9 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-15 03:49:10.0 -0500
@@ -82,6 +82,8 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+spinlock_t journal_head_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/buffer_head.h 2005-03-02 02:37:45.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/buffer_head.h 2005-03-15 03:42:22.0 -0500
@@ -62,6 +62,13 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
 	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-15 03:45:33.0 -0500
@@ -314,6 +314,13 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+extern spinlock_t journal_head_lock;
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state);
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,24 +333,36 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
+}
+#undef PICK_SPIN_LOCK
+
+#ifdef CONFIG_PREEMPT_RT
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_lock(&journal_head_lock);
 }
 
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	spin_unlock(&journal_head_lock);
+}
+#else /* !CONFIG_PREEMPT_RT */
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_JournalHead, &bh->b_state);
@@ -353,6 +372,7 @@
 {
 	bit_spin_unlock(BH_JournalHead, &bh->b_state);
 }
+#endif /* CONFIG_PREEMPT_RT */
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-15 03:40:31.0 -0500
@@ -774,6 +774,10 @@
 }))
 
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while
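The PICK_SPIN_LOCK macro in the jbd.h hunk above relies on token pasting to pick a lock family at compile time. A minimal user-space sketch of that idea follows. Every demo_* name and the DEMO_PREEMPT_RT switch are hypothetical stand-ins, not kernel APIs, and the stand-in locks are plain non-atomic flags meant only to show which function the macro expands to:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: one state word plus a
 * dedicated lock word, mirroring b_state and b_jstate_lock in the patch. */
struct demo_bh {
	unsigned long b_state;
	int b_jstate_lock;
};

enum { DEMO_BH_STATE = 0 };	/* bit number chosen for the demo only */

/* "Real lock" family: works on the dedicated lock word (non-atomic demo). */
static inline bool demo_spin_trylock(int *lock)
{
	if (*lock)
		return false;
	*lock = 1;
	return true;
}

static inline void demo_spin_unlock(int *lock)
{
	*lock = 0;
}

/* "Bit lock" family: works on one bit of the state word (non-atomic demo). */
static inline bool demo_bit_spin_trylock(int bit, unsigned long *addr)
{
	if (*addr & (1UL << bit))
		return false;
	*addr |= 1UL << bit;
	return true;
}

static inline void demo_bit_spin_unlock(int bit, unsigned long *addr)
{
	*addr &= ~(1UL << bit);
}

/* Token pasting builds the function name from the operation type. */
#ifdef DEMO_PREEMPT_RT
#define DEMO_PICK_LOCK(otype, bit, name) demo_spin_##otype(&bh->b_##name##_lock)
#else
#define DEMO_PICK_LOCK(otype, bit, name) demo_bit_spin_##otype(bit, &bh->b_state)
#endif

static inline bool demo_trylock_bh_state(struct demo_bh *bh)
{
	return DEMO_PICK_LOCK(trylock, DEMO_BH_STATE, jstate);
}

static inline void demo_unlock_bh_state(struct demo_bh *bh)
{
	DEMO_PICK_LOCK(unlock, DEMO_BH_STATE, jstate);
}
```

Unlike the non-RT variant of the macro in the patch, the sketch leaves the trailing semicolon out of the macro body, so the macro can sit inside a return expression without producing an extra empty statement.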
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> I've realized that my previous patch had too many problems with the way
> the journaling system works. So I went back to my first approach but
> added the journal_head lock as one global lock to keep the buffer head
> size smaller. I only added the state lock to the buffer head.
>
> I've tested this for some time now, and it works well (for the test at
> least). I'll recompile it with PREEMPT_DESKTOP to see if that works too.

good progress - but the global lock may be a scalability worry upstream,
though. Would it be possible to just mirror much of the current lock
logic, but with spinlocks instead of bitlocks? And there should be no
#ifdefs on PREEMPT_RT.

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > I've realized that my previous patch had too many problems with the
> > way the journaling system works. So I went back to my first approach
> > but added the journal_head lock as one global lock to keep the buffer
> > head size smaller. I only added the state lock to the buffer head.
> >
> > I've tested this for some time now, and it works well (for the test
> > at least). I'll recompile it with PREEMPT_DESKTOP to see if that
> > works too.
>
> good progress - but the global lock may be a scalability worry
> upstream, though. Would it be possible to just mirror much of the
> current lock logic, but with spinlocks instead of bitlocks? And there
> should be no #ifdefs on PREEMPT_RT.

The first patch I had just converted the bit spinlocks to spinlocks, but
I thought that adding two spinlocks was too much for every buffer head,
even one that isn't in an ext3 file system. The journal head spinlock is
only used to add and remove the journal heads from the buffer heads, so
I'm not sure how much contention there is on it. I only have a dual SMP
system, so I can't test this on a large number of CPUs.

What do you think, should we sacrifice memory for speed?

What should we use instead of #ifdef PREEMPT_RT? Or should we just keep
it the same for both? Since this fix is only for spinlocks that
schedule, I figured that it would be better not to waste the memory of
those not using PREEMPT_RT. Should I use the opposite, PREEMPT_DESKTOP?

Thanks,

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> > good progress - but the global lock may be a scalability worry
> > upstream, though. Would it be possible to just mirror much of the
> > current lock logic, but with spinlocks instead of bitlocks? And there
> > should be no #ifdefs on PREEMPT_RT.
>
> The first patch I had just converted the bit spinlocks to spinlocks,
> but I thought that adding two spinlocks was too much for every buffer
> head, even one that isn't in an ext3 file system. The journal head
> spinlock is only used to add and remove the journal heads from the
> buffer heads, so I'm not sure how much contention there is on it. I
> only have a dual SMP system, so I can't test this on a large number of
> CPUs.
>
> What do you think, should we sacrifice memory for speed?

there are two bad effects of global spinlocks: 1) contention 2)
cacheline bouncing. It's #2 that would affect this spinlock. While i'm
not sure this would show up in usual benchmarks, we should rather err on
the side of more scalability. Two spinlocks are just two more machine
words on most architectures, so i don't think it matters all that much,
while it removes a major wart - as long as the two extra locks are for
ext3 buffer-heads only.

> What should we use instead of #ifdef PREEMPT_RT? Or should we just
> keep it the same for both? Since this fix is only for spinlocks that
> schedule, I figured that it would be better not to waste the memory of
> those not using PREEMPT_RT. Should I use the opposite,
> PREEMPT_DESKTOP?

i'd go for removing bit-spinlocks altogether, in the upstream kernel. It
would simplify things, besides making PREEMPT_RT simpler as well. The
memory overhead is not a big issue i believe. (8 more bytes per ext3 bh,
on x86)

	Ingo
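The memory figure quoted above (8 more bytes per buffer head for two locks) can be checked with a small user-space sketch. The struct layouts and demo_* names below are hypothetical and heavily reduced, and each lock is assumed to be one 4-byte word, which is what the 8-byte figure for two locks implies; a real spinlock_t can be larger depending on the kernel configuration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced buffer head: both bit locks live inside b_state,
 * so they cost no extra storage. */
struct demo_bh_bitlock {
	unsigned long b_state;
	void *b_private;
};

/* Same layout plus two dedicated lock words (assumed 4 bytes each). */
struct demo_bh_spinlock {
	unsigned long b_state;
	void *b_private;
	uint32_t b_state_lock;
	uint32_t b_journal_head_lock;
};

/* Extra bytes per buffer head paid for the two dedicated locks. */
static inline size_t demo_lock_overhead(void)
{
	return sizeof(struct demo_bh_spinlock) - sizeof(struct demo_bh_bitlock);
}
```

On common 32-bit and 64-bit ABIs the two 4-byte words pack into a single 8-byte slot, so the difference stays at 8 bytes per structure; multiplied by the number of live buffer heads, that is the cost being weighed against the cacheline bouncing of one global lock.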
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > What should we use instead of #ifdef PREEMPT_RT? Or should we just
> > keep it the same for both? Since this fix is only for spinlocks that
> > schedule, I figured that it would be better not to waste the memory
> > of those not using PREEMPT_RT. Should I use the opposite,
> > PREEMPT_DESKTOP?
>
> i'd go for removing bit-spinlocks altogether, in the upstream kernel.
> It would simplify things, besides making PREEMPT_RT simpler as well.
> The memory overhead is not a big issue i believe. (8 more bytes per
> ext3 bh, on x86)

The problem here is that it's not ext3 bh's only. They're still the
normal buffer head. The problem arises because the ext3 journal head is
allocated within these bit spin locks.

I tried to monkey with putting the locks in the journal heads and have
checks to see when to free them, but it wasn't that simple. I started
having problems with some of the freeing transactions; I might have
assumed too much. I'll give it one more try to get it into the journal
heads, but after that (if I fail) I'll let someone who understands the
ext3 system better handle this.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Ingo Molnar wrote:

> i'd go for removing bit-spinlocks altogether, in the upstream kernel.
> It would simplify things, besides making PREEMPT_RT simpler as well.
> The memory overhead is not a big issue i believe. (8 more bytes per
> ext3 bh, on x86)

Hi Ingo,

Damn! The answer was right there in front of my eyes! Here's the
cleanest solution. I forgot about wait_on_bit_lock. I've converted all
the locks to use this instead. We probably need to get priority
inheritance working on this too someday, but for now it's better than
wasting memory or getting into deadlocks.

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-15 11:58:14.0 -0500
@@ -82,6 +82,17 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-15 11:58:40.0 -0500
@@ -324,34 +324,63 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_State,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-15 12:19:11.032217736 -0500
@@ -774,67 +774,6 @@
 }))
 
 
-/*
- * bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly faster.
- */
-static inline void bit_spin_lock(int bitnum, unsigned long *addr)
-{
-	/*
-	 * Assuming the lock is uncontended, this never enters
-	 * the body of the outer loop. If it is contended, then
-	 * within the inner loop a non-atomic test is used to
-	 * busywait with less bus contention for a good time to
-	 * attempt to acquire the lock bit.
-	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
-
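The trylock, is_locked and unlock halves of the patch above all come down to atomic operations on one bit of a shared word. A rough user-space model using C11 atomics is sketched below. The demo_* names are hypothetical, and the sleeping and wake-up half (wait_on_bit_lock and wake_up_bit in the patch) has no direct user-space counterpart here and is left out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take the lock bit. Returns true only if this caller changed the
 * bit from 0 to 1, which is the role test_and_set_bit() plays in the patch. */
static inline bool demo_bit_trylock(atomic_ulong *word, unsigned bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_fetch_or_explicit(word, mask,
						     memory_order_acquire);
	return (old & mask) == 0;
}

/* Report whether the lock bit is currently set (cf. test_bit()). */
static inline bool demo_bit_is_locked(atomic_ulong *word, unsigned bit)
{
	return (atomic_load_explicit(word, memory_order_relaxed) >> bit) & 1UL;
}

/* Drop the lock bit with release ordering (cf. clear_bit() followed by a
 * barrier). A sleeping implementation would wake any waiters after this. */
static inline void demo_bit_unlock(atomic_ulong *word, unsigned bit)
{
	atomic_fetch_and_explicit(word, ~(1UL << bit), memory_order_release);
}
```

Because the lock is a single bit in a word that also carries other flags, taking or dropping it must leave the neighbouring bits untouched, which is why the sketch uses fetch-or and fetch-and rather than a plain store.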
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Steven Rostedt <[EMAIL PROTECTED]> wrote:

> The problem here is that it's not ext3 bh's only. They're still the
> normal buffer head. The problem arises because the ext3 journal head
> is allocated within these bit spin locks.

Yes, the locks do want to live inside the buffer_head.

Stephen has pointed out that we might want to remove
jbd_lock_bh_journal_head() altogether some time, just use
jbd_lock_bh_state() for that.

In 2.4 these locks are global (or per-superblock). Making them a global
spinlock would be acceptable for 2-ways and probably larger.
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:

> Damn! The answer was right there in front of my eyes! Here's the
> cleanest solution. I forgot about wait_on_bit_lock. I've converted all
> the locks to use this instead. We probably need to get priority
> inheritance working on this too someday, but for now it's better than
> wasting memory or getting into deadlocks.

I am still not clear on why this did not hit with earlier kernels +
PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
lock-break patch dropped?

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Steven Rostedt wrote:

> On Tue, 15 Mar 2005, Ingo Molnar wrote:
>
> > i'd go for removing bit-spinlocks altogether, in the upstream kernel.
> > It would simplify things, besides making PREEMPT_RT simpler as well.
> > The memory overhead is not a big issue i believe. (8 more bytes per
> > ext3 bh, on x86)
>
> Hi Ingo,
>
> Damn! The answer was right there in front of my eyes! Here's the
> cleanest solution. I forgot about wait_on_bit_lock. I've converted all
> the locks to use this instead. We probably need to get priority
> inheritance working on this too someday, but for now it's better than
> wasting memory or getting into deadlocks.

One bit of caution on these. Without PREEMPT_RT, don't the spinlocks on
SMP act the same as normal spinlocks, meaning that we must not schedule
while holding one? I believe that some of these locks are taken while
holding spin_locks, so this isn't the right solution for anything other
than PREEMPT_RT.

I also forgot to add might_sleep in the locking calls. Here's the patch
with the might_sleep added.

What should we do for non-PREEMPT_RT? Maybe put the bit_spinlocks back
in for that case?

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-15 11:58:14.0 -0500
@@ -82,6 +82,17 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+/*
+ * Used in the locking of the bh_state and bh_journalhead bit locks.
+ */
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-16 02:25:31.881251828 -0500
@@ -324,34 +324,65 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused);
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_State,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+#endif
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	return test_bit(BH_State, &bh->b_state);
+#else
+	return 1;
+#endif
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+#endif
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	might_sleep();
+	wait_on_bit_lock(&bh->b_state,BH_JournalHead,jbd_lock_bh_sleep,TASK_UNINTERRUPTIBLE);
+#endif
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+#endif
+	__release(bitlock);
 }
 
 struct jbd_revoke_table_s;
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/spinlock.h 2005-03-14 06:00:54.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/spinlock.h 2005-03-15 12:19:11.0 -0500
@@ -774,67 +774,6 @@
 }))
 
 
-/*
- * bit-based spin_lock()
- *
- * Don't use this unless you really need to: spin_lock() and spin_unlock()
- * are significantly
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Tue, 15 Mar 2005, Lee Revell wrote:

> On Tue, 2005-03-15 at 13:05 -0500, Steven Rostedt wrote:
>
> > Damn! The answer was right there in front of my eyes! Here's the
> > cleanest solution. I forgot about wait_on_bit_lock. I've converted
> > all the locks to use this instead. We probably need to get priority
> > inheritance working on this too someday, but for now it's better
> > than wasting memory or getting into deadlocks.
>
> I am still not clear on why this did not hit with earlier kernels +
> PREEMPT_DESKTOP. Were the bitlocks introduced recently? Or was another
> lock-break patch dropped?

When did you start seeing this? This code has been there as far back as
2.6.7 (the earliest 2.6 kernel I still have lying around) and as far
back as Ingo's realtime-preempt-2.6.9-mm1-U10. Maybe the tracing didn't
start picking this up until later, or you were just lucky that no
contention was happening on that lock.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Hi Ingo,

I've found something that is very interesting and I can't explain it.

On Mon, 14 Mar 2005, Steven Rostedt wrote:

> On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
> > On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > > I just downloaded -40 and applied my patch, compiled it with
> > > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > > I'm getting the following...
> > >
> > > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > >
> > > printing eip:
> > > c0213438
> > > *pde =
> >
> > [snip]

All I did now was to add this patch to your -40-00 kernel:

diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-14 13:22:04.0 -0500
@@ -324,6 +324,8 @@
 	return bh->b_private;
 }
 
+BUFFER_FNS(JournalHead,journalhead)
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);

And I get the following output:

BUG: Unable to handle kernel NULL pointer dereference at virtual address
 printing eip:
c0213118
*pde =
Oops: [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[<c0213118>]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax:    ebx: 5603   ecx: 5603   edx: cee14d80
esi: c0213100   edi: cb4bd000   ebp: cc03bf18   esp: cc03be48
ds: 007b   es: 007b   ss: 0068   preempt:
Process XFree86 (pid: 4709, threadinfo=cc03a000 task=cf0d5020)
Stack: cf0d5170 cc03a000 cf0d5020 c03448ec cf0d5020 0246 cc03be7c c0117267 c03448f4 0006 0001 cc03bebc cf1b81ec ce820600 ce94a9b8 cc03bed4 c01704f1 ce94a9b8 0007
Call Trace:
 [] show_stack+0x7f/0xa0 (28)
 [] show_registers+0x165/0x1d0 (56)
 [] die+0xc8/0x150 (64)
 [] do_page_fault+0x356/0x6c4 (216)
 [] error_code+0x2b/0x30 (268)
 [] tty_ioctl+0x34b/0x490 (52)
 [] do_ioctl+0x4f/0x70 (32)
 [] vfs_ioctl+0x62/0x1d0 (40)
 [] sys_ioctl+0x61/0x90 (40)
 [] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 28 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

I don't know why. BUFFER_FNS is just defined as:

#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

So all it does is make three functions that are never used:

set_buffer_journalhead(...)
clear_buffer_journalhead(...)
buffer_journalhead(...)

Unless some macro uses them, I don't know why adding that line causes
the bug output that I showed. If I remove that line, I don't get that
output. And this is consistent. I've recompiled the kernel several
times, and every time I compile it with this added patch I get that
output. And every time without it, it runs fine.

Oh, please note that this only happens with PREEMPT_DESKTOP, and not
with PREEMPT_RT.

I really think this is a symptom of something else and not the cause of
the bug. What do you think?

-- Steve
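To see concretely what BUFFER_FNS(JournalHead,journalhead) generates, the macro can be exercised outside the kernel. The sketch below is a user-space model under stated assumptions: struct buffer_head is reduced to its state word, set_bit, clear_bit and test_bit are plain non-atomic stand-ins for the kernel's atomic versions, and the bit number given to BH_JournalHead is chosen for the demo only:

```c
#include <assert.h>

/* Hypothetical reduced buffer head: only the state word is modelled. */
struct buffer_head {
	unsigned long b_state;
};

enum { BH_JournalHead = 5 };	/* bit number chosen for the demo only */

/* Non-atomic stand-ins for the kernel bit operations. */
static inline void set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static inline void clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static inline int test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* The macro as quoted in the message above. */
#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

/* Expands to set_buffer_journalhead(), clear_buffer_journalhead() and
 * buffer_journalhead(), all operating on the BH_JournalHead bit. */
BUFFER_FNS(JournalHead, journalhead)
```

The expansion contains nothing but three static inline accessors, which supports the reading in the message that the line on its own should not change behaviour unless something else calls or collides with those names.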
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Mon, 14 Mar 2005, Steven Rostedt wrote:

> On Mon, 14 Mar 2005, Steven Rostedt wrote:
>
> > I just downloaded -40 and applied my patch, compiled it with
> > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > I'm getting the following...
> >
> > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> >
> > printing eip:
> > c0213438
> > *pde =
>
> [snip]
>
> > I'll see if this happens without the patch, and if so, then I'll look into
> > this further.
>
> Well, I took out my patch and this bug didn't happen, so I guess it's my
> fault! OK, I'll dig into it further.

Here's a new patch. All I did was move
BUFFER_FNS(JournalHead,journalhead) to inside the #ifdef
CONFIG_PREEMPT_RT and my oops went away !?! This really bothers me since
it just declares some functions and is not used with CONFIG_PREEMPT_RT
off. I have no idea what's going on.

Lee, can you see if this still crashes for you?

Thanks,

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-14 09:46:41.0 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
@@ -1727,6 +1731,9 @@
 		jh = new_jh;
 		new_jh = NULL;		/* We consumed it */
 		set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(>b_state_lock);
+#endif
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
@@ -1767,26 +1774,34 @@
 		if (jh->b_transaction == NULL &&
 				jh->b_next_transaction == NULL &&
 				jh->b_cp_transaction == NULL) {
-			J_ASSERT_BH(bh, buffer_jbd(bh));
-			J_ASSERT_BH(bh, jh2bh(jh) == bh);
-			BUFFER_TRACE(bh, "remove journal_head");
-			if (jh->b_frozen_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_frozen_data\n",
-						__FUNCTION__);
-				kfree(jh->b_frozen_data);
-			}
-			if (jh->b_committed_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_committed_data\n",
-						__FUNCTION__);
-				kfree(jh->b_committed_data);
+#ifdef CONFIG_PREEMPT_RT
+			if (atomic_read(>b_state_wait_count)) {
+				BUG_ON(buffer_journalhead(bh));
+				set_buffer_journalhead(bh);
+			} else
+#endif
+			{
+				J_ASSERT_BH(bh, buffer_jbd(bh));
+				J_ASSERT_BH(bh, jh2bh(jh) == bh);
+				BUFFER_TRACE(bh, "remove journal_head");
+				if (jh->b_frozen_data) {
+					printk(KERN_WARNING "%s: freeing "
+							"b_frozen_data\n",
+							__FUNCTION__);
+					kfree(jh->b_frozen_data);
+				}
+				if (jh->b_committed_data) {
+					printk(KERN_WARNING "%s: freeing "
+							"b_committed_data\n",
+							__FUNCTION__);
+					kfree(jh->b_committed_data);
+				}
+				bh->b_private = NULL;
+				jh->b_bh = NULL;	/* debug, really */
+				clear_buffer_jbd(bh);
+				__brelse(bh);
+				journal_free_journal_head(jh);
 			}
-			bh->b_private = NULL;
-			jh->b_bh = NULL;	/* debug, really */
-			clear_buffer_jbd(bh);
-			__brelse(bh);
-			journal_free_journal_head(jh);
 		} else {
 			BUFFER_TRACE(bh, "journal_head was locked");
 		}
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c
---
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Mon, 14 Mar 2005, Steven Rostedt wrote:

> I just downloaded -40 and applied my patch, compiled it with
> PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> I'm getting the following...
>
> BUG: Unable to handle kernel NULL pointer dereference at virtual address
>
> printing eip:
> c0213438
> *pde =

[snip]

> I'll see if this happens without the patch, and if so, then I'll look into
> this further.

Well, I took out my patch and this bug didn't happen, so I guess it's my
fault! OK, I'll dig into it further.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Mon, 14 Mar 2005, Steven Rostedt wrote:

> > > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > > goes.
> >
> > Does not seem to work at all with the above settings. It seemed OK
> > until I started X. Then every time I launched an xterm it would
> > disappear as soon as I typed anything. I could not switch consoles to
> > see the Oops.
>
> Hi Lee,
>
> I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
> my test machine) as data=ordered. I had no problem getting to X, starting
> an xterm and running a make. Actually it was a gnome-term since I didn't
> have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
> make there with no problems.
>
> Did you patch this against 39-02 or -40-X?
>
> I haven't had time to upgrade to 40 yet. Maybe, I'll work on that today.

I just downloaded -40 and applied my patch, compiled it with
PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
I'm getting the following...

BUG: Unable to handle kernel NULL pointer dereference at virtual address
 printing eip:
c0213438
*pde =
Oops: [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[<c0213438>]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax:    ebx: 5603   ecx: 5603   edx: cb6c8780
esi: c0213420   edi: cc956000   ebp: cb613f18   esp: cb613e48
ds: 007b   es: 007b   ss: 0068   preempt:
Process XFree86 (pid: 4713, threadinfo=cb612000 task=cb5e0a40)
Stack: cb5e0b90 cb612000 cb5e0a40 c034494c cb5e0a40 0246 cb613e7c c0117217 c0344954 0006 0001 cb613ebc ce0cce24 c13e1800 cf1279b8 cb613ed4 c01707f1 cf1279b8 0007
Call Trace:
 [] show_stack+0x7f/0xa0 (28)
 [] show_registers+0x165/0x1d0 (56)
 [] die+0xc8/0x150 (64)
 [] do_page_fault+0x356/0x6c4 (216)
 [] error_code+0x2b/0x30 (268)
 [] tty_ioctl+0x34b/0x490 (52)
 [] do_ioctl+0x4f/0x70 (32)
 [] vfs_ioctl+0x62/0x1d0 (40)
 [] sys_ioctl+0x61/0x90 (40)
 [] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

I'll see if this happens without the patch, and if so, then I'll look
into this further.

Thanks,

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Mon, 14 Mar 2005, Steven Rostedt wrote:
> On Mon, 14 Mar 2005, Steven Rostedt wrote:
> >
> > I just downloaded -40 and applied my patch, compiled it with
> > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > I'm getting the following...
> >
> > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> >
> > printing eip:
> > c0213438
> > *pde =
> [snip]
> >
> > I'll see if this happens without the patch, and if so, then I'll look into
> > this further.
> >
>
> Well, I took out my patch and this bug didn't happen, so I guess it's my
> fault! OK, I'll dig into it further.
>

Here's a new patch. All I did was move BUFFER_FNS(JournalHead,journalhead)
to inside the #ifdef CONFIG_PREEMPT_RT and my oops went away !?! This
really bothers me since it just declares some functions and is not used
with CONFIG_PREEMPT_RT off. I have no idea what's going on.

Lee, can you see if this still crashes for you?

Thanks,

-- Steve

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c 2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c 2005-03-14 09:46:41.0 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
@@ -1727,6 +1731,9 @@
                jh = new_jh;
                new_jh = NULL;          /* We consumed it */
                set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+               spin_lock_init(&jh->b_state_lock);
+#endif
                bh->b_private = jh;
                jh->b_bh = bh;
                get_bh(bh);
@@ -1767,26 +1774,34 @@
                if (jh->b_transaction == NULL &&
                                jh->b_next_transaction == NULL &&
                                jh->b_cp_transaction == NULL) {
-                       J_ASSERT_BH(bh, buffer_jbd(bh));
-                       J_ASSERT_BH(bh, jh2bh(jh) == bh);
-                       BUFFER_TRACE(bh, "remove journal_head");
-                       if (jh->b_frozen_data) {
-                               printk(KERN_WARNING "%s: freeing "
-                                       "b_frozen_data\n",
-                                       __FUNCTION__);
-                               kfree(jh->b_frozen_data);
-                       }
-                       if (jh->b_committed_data) {
-                               printk(KERN_WARNING "%s: freeing "
-                                       "b_committed_data\n",
-                                       __FUNCTION__);
-                               kfree(jh->b_committed_data);
+#ifdef CONFIG_PREEMPT_RT
+                       if (atomic_read(&jh->b_state_wait_count)) {
+                               BUG_ON(buffer_journalhead(bh));
+                               set_buffer_journalhead(bh);
+                       } else
+#endif
+                       {
+                               J_ASSERT_BH(bh, buffer_jbd(bh));
+                               J_ASSERT_BH(bh, jh2bh(jh) == bh);
+                               BUFFER_TRACE(bh, "remove journal_head");
+                               if (jh->b_frozen_data) {
+                                       printk(KERN_WARNING "%s: freeing "
+                                               "b_frozen_data\n",
+                                               __FUNCTION__);
+                                       kfree(jh->b_frozen_data);
+                               }
+                               if (jh->b_committed_data) {
+                                       printk(KERN_WARNING "%s: freeing "
+                                               "b_committed_data\n",
+                                               __FUNCTION__);
+                                       kfree(jh->b_committed_data);
+                               }
+                               bh->b_private = NULL;
+                               jh->b_bh = NULL;        /* debug, really */
+                               clear_buffer_jbd(bh);
+                               __brelse(bh);
+                               journal_free_journal_head(jh);
                        }
-                       bh->b_private = NULL;
-                       jh->b_bh = NULL;        /* debug, really */
-                       clear_buffer_jbd(bh);
-                       __brelse(bh);
-                       journal_free_journal_head(jh);
                } else {
                        BUFFER_TRACE(bh, "journal_head was locked");
                }
diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c linux-2.6.11-final-V0.7.40-00/fs/jbd/transaction.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/transaction.c 2005-03-02 02:37:53.0 -0500
+++
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Hi Ingo,

I've found something that is very interesting and I can't explain it.

On Mon, 14 Mar 2005, Steven Rostedt wrote:
> On Mon, 14 Mar 2005, Steven Rostedt wrote:
> > On Mon, 14 Mar 2005, Steven Rostedt wrote:
> > >
> > > I just downloaded -40 and applied my patch, compiled it with
> > > PREEMPT_DESKTOP and data=ordered, ran it and everything seems OK, except
> > > I'm getting the following...
> > >
> > > BUG: Unable to handle kernel NULL pointer dereference at virtual address
> > >
> > > printing eip:
> > > c0213438
> > > *pde =
> > [snip]

All I did now was to add this patch to your -40-00 kernel:

diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h 2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h 2005-03-14 13:22:04.0 -0500
@@ -324,6 +324,8 @@
        return bh->b_private;
 }
 
+BUFFER_FNS(JournalHead,journalhead)
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
        bit_spin_lock(BH_State, &bh->b_state);

And I get the following output:

BUG: Unable to handle kernel NULL pointer dereference at virtual address
 printing eip:
c0213118
*pde =
Oops: [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
 pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
 snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
 intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:    0
EIP:    0060:[c0213118]    Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax:    ebx: 5603   ecx: 5603   edx: cee14d80
esi: c0213100   edi: cb4bd000   ebp: cc03bf18   esp: cc03be48
ds: 007b   es: 007b   ss: 0068   preempt:
Process XFree86 (pid: 4709, threadinfo=cc03a000 task=cf0d5020)
Stack: cf0d5170 cc03a000 cf0d5020 c03448ec cf0d5020 0246 cc03be7c c0117267
       c03448f4 0006 0001 cc03bebc cf1b81ec ce820600 ce94a9b8 cc03bed4
       c01704f1 ce94a9b8 0007
Call Trace:
 [c0103cdf] show_stack+0x7f/0xa0 (28)
 [c0103e95] show_registers+0x165/0x1d0 (56)
 [c0104088] die+0xc8/0x150 (64)
 [c01153c6] do_page_fault+0x356/0x6c4 (216)
 [c0103973] error_code+0x2b/0x30 (268)
 [c020e5fb] tty_ioctl+0x34b/0x490 (52)
 [c016807f] do_ioctl+0x4f/0x70 (32)
 [c0168282] vfs_ioctl+0x62/0x1d0 (40)
 [c0168451] sys_ioctl+0x61/0x90 (40)
 [c0102ec3] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 28 4d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 8b 30 89 34
 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85

I don't know why. BUFFER_FNS is just defined as:

#define BUFFER_FNS(bit, name)                                           \
static inline void set_buffer_##name(struct buffer_head *bh)            \
{                                                                       \
        set_bit(BH_##bit, &(bh)->b_state);                              \
}                                                                       \
static inline void clear_buffer_##name(struct buffer_head *bh)          \
{                                                                       \
        clear_bit(BH_##bit, &(bh)->b_state);                            \
}                                                                       \
static inline int buffer_##name(const struct buffer_head *bh)           \
{                                                                       \
        return test_bit(BH_##bit, &(bh)->b_state);                      \
}

So all it does is make three functions that are never used:

  set_buffer_journalhead(...)
  clear_buffer_journalhead(...)
  buffer_journalhead(...)

Unless some macro uses them, I don't know why adding that line causes the
bug output that I showed. If I remove that line, I don't get that
output. And this is consistent. I've recompiled the kernel several times,
and every time I compile it with this added patch I get that output. And
every time without it, it runs fine.

Oh, please note that this only happens with PREEMPT_DESKTOP, and not with
PREEMPT_RT.

I really think this is a symptom of something else and not the cause of
the bug. What do you think?

-- Steve
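For anyone who wants to see exactly what that one BUFFER_FNS line generates, here is a minimal user-space sketch. It is only an illustration under stated assumptions: the struct, the non-atomic set_bit/clear_bit/test_bit helpers and the bit number chosen for BH_JournalHead are simplified stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct buffer_head (assumption). */
struct buffer_head {
	unsigned long b_state;
};

/* Illustrative bit number only; the real value comes from the kernel headers. */
enum { BH_JournalHead = 5 };

/* Non-atomic stand-ins for the kernel bit operations (assumption). */
static inline void set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static inline void clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static inline int test_bit(int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Same shape as the macro quoted in the message above. */
#define BUFFER_FNS(bit, name)						\
static inline void set_buffer_##name(struct buffer_head *bh)		\
{									\
	set_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline void clear_buffer_##name(struct buffer_head *bh)		\
{									\
	clear_bit(BH_##bit, &(bh)->b_state);				\
}									\
static inline int buffer_##name(const struct buffer_head *bh)		\
{									\
	return test_bit(BH_##bit, &(bh)->b_state);			\
}

/*
 * Expands to set_buffer_journalhead(), clear_buffer_journalhead()
 * and buffer_journalhead(), all operating on bit BH_JournalHead.
 */
BUFFER_FNS(JournalHead, journalhead)
```

The three generated helpers are static inline, so a compiler normally emits no code for them unless something calls them, which is consistent with the suspicion above that the macro line is a symptom rather than the cause.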
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Lee Revell wrote:

> On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> > On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > > till Monday. I'll also run this patch over the weekend while compiling
> > > the kernel in an endless loop
> >
> > I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> > goes.
>
> Does not seem to work at all with the above settings. It seemed OK
> until I started X. Then every time I launched an xterm it would
> disappear as soon as I typed anything. I could not switch consoles to
> see the Oops.
>

Hi Lee,

I just compiled PREEMPT_DESKTOP and mounted root (only disk filesystem on
my test machine) as data=ordered. I had no problem getting to X, starting
an xterm and running a make. Actually it was a gnome-term since I didn't
have xterm. But then I su to root, apt-get xterm, ran xterm, and did a
make there with no problems.

Did you patch this against 39-02 or -40-X?

I haven't had time to upgrade to 40 yet. Maybe I'll work on that today.

Maybe your crash has to do with something else. My test machine has a
serial hookup that I can look at even if the term goes down. I'll see if
40 gives me problems.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > till Monday. I'll also run this patch over the weekend while compiling
> > the kernel in an endless loop
>
> I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> goes.

Does not seem to work at all with the above settings. It seemed OK
until I started X. Then every time I launched an xterm it would
disappear as soon as I typed anything. I could not switch consoles to
see the Oops.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> I'm leaving now for the weekend, so I won't be able to respond to anyone
> till Monday. I'll also run this patch over the weekend while compiling
> the kernel in an endless loop

I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
goes.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

OK, here it is (Yuck!). I was able to use the journal head (private data
of the buffer head) for the state lock. I just decided to have the journal
head lock be one global lock for all buffer heads, since it is used to add
and remove the journal private data from the buffer head, and thus can't
be stored in the journal private data. The state lock is now in the
journal private data but we must be careful not to free this data before
we unlock it. So here's what I've done.

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
        BUG_ON(!bh->b_private);
        atomic_inc((bh)->b_state_wait_count);
        spin_lock((bh)->b_state_lock);
}

I have a counter of those that want/have the lock, and this informs
journal_remove_journal_head that it should not free the jh.

static void __journal_remove_journal_head(struct buffer_head *bh)
{
        struct journal_head *jh = bh2jh(bh);

        J_ASSERT_JH(jh, jh->b_jcount >= 0);

        get_bh(bh);
        if (jh->b_jcount == 0) {
                if (jh->b_transaction == NULL &&
                                jh->b_next_transaction == NULL &&
                                jh->b_cp_transaction == NULL) {
#ifdef CONFIG_PREEMPT_RT
                        if (atomic_read(&jh->b_state_wait_count)) {
                                BUG_ON(buffer_journalhead(bh));
                                set_buffer_journalhead(bh);
                        } else
#endif
                        {

Here the state_wait_count is checked, and if it is > 0, the bit that was
originally used for locking the journal head is set, to tell whoever
unlocks the state lock that the journal head still needs to be removed.

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
        int rmjh = 0;
        BUG_ON(!atomic_read((bh)->b_state_wait_count));
        atomic_dec((bh)->b_state_wait_count);
        if (buffer_journalhead(bh)) {
                clear_buffer_journalhead(bh);
                rmjh = 1;
        }
        spin_unlock((bh)->b_state_lock);
        if (rmjh)
                journal_remove_journal_head(bh);
}

Now in the unlocking of the state lock, the journal head bit is tested and
if it is set, then the remove journal head function is called.

Maybe this isn't the cleanest solution, but it keeps the overhead on the
buffer heads down, so it's preferred over my last patch.

Once again, this has only been tested with full preemption enabled, but I
tried to keep it from changing the way non PREEMPT_RT works.

I'm leaving now for the weekend, so I won't be able to respond to anyone
till Monday. I'll also run this patch over the weekend while compiling
the kernel in an endless loop

  while [ 1 ]; do
    make clean; make
  done

With kjournal running FIFO, to see if it survives.

Cheers,

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c 2005-02-12 22:05:29.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c 2005-03-11 14:54:21.0 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
@@ -1727,6 +1731,9 @@
                jh = new_jh;
                new_jh = NULL;          /* We consumed it */
                set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+               spin_lock_init(&jh->b_state_lock);
+#endif
                bh->b_private = jh;
                jh->b_bh = bh;
                get_bh(bh);
@@ -1767,26 +1774,34 @@
                if (jh->b_transaction == NULL &&
                                jh->b_next_transaction == NULL &&
                                jh->b_cp_transaction == NULL) {
-                       J_ASSERT_BH(bh, buffer_jbd(bh));
-                       J_ASSERT_BH(bh, jh2bh(jh) == bh);
-                       BUFFER_TRACE(bh, "remove journal_head");
-                       if (jh->b_frozen_data) {
-                               printk(KERN_WARNING "%s: freeing "
-                                       "b_frozen_data\n",
-                                       __FUNCTION__);
-                               kfree(jh->b_frozen_data);
-                       }
-                       if (jh->b_committed_data) {
-                               printk(KERN_WARNING "%s: freeing "
-                                       "b_committed_data\n",
-
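The wait-count idea described in the message above can be walked through in user space. What follows is a rough sketch under loud assumptions, not the patch itself: the structs are cut down to the fields that matter, a C11 atomic flag spun on in a loop stands in for the RT spinlock, remove_pending plays the role of the BH_JournalHead bit, and the helper names (attach_journal_head, remove_journal_head, lock_bh_state, unlock_bh_state) are hypothetical. The global journal-head lock that serializes removal in the real code is left out, so this shows the control flow only and is not safe against concurrent removers as written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Cut-down stand-in for the journal head (assumption, not the jbd layout). */
struct journal_head {
	atomic_bool b_state_locked;	/* stand-in for the RT spinlock */
	atomic_int b_state_wait_count;	/* lockers that want or hold the lock */
};

/* Cut-down stand-in for the buffer head (assumption). */
struct buffer_head {
	struct journal_head *b_private;
	bool remove_pending;		/* plays the role of the BH_JournalHead bit */
};

/* Attach a fresh journal head; returns 0 on success, -1 if malloc fails. */
static inline int attach_journal_head(struct buffer_head *bh)
{
	struct journal_head *jh = malloc(sizeof(*jh));

	if (!jh)
		return -1;
	atomic_init(&jh->b_state_locked, false);
	atomic_init(&jh->b_state_wait_count, 0);
	bh->b_private = jh;
	bh->remove_pending = false;
	return 0;
}

/* Free the journal head, or defer if somebody still wants the state lock. */
static inline void remove_journal_head(struct buffer_head *bh)
{
	struct journal_head *jh = bh->b_private;

	if (atomic_load(&jh->b_state_wait_count) > 0) {
		bh->remove_pending = true;	/* the unlocker finishes the job */
		return;
	}
	bh->b_private = NULL;
	free(jh);
}

static inline void lock_bh_state(struct buffer_head *bh)
{
	struct journal_head *jh = bh->b_private;

	/* Announce interest first so a remover will not free jh under us. */
	atomic_fetch_add(&jh->b_state_wait_count, 1);
	while (atomic_exchange(&jh->b_state_locked, true))
		;	/* spin until the previous holder releases the lock */
}

static inline void unlock_bh_state(struct buffer_head *bh)
{
	struct journal_head *jh = bh->b_private;
	bool rmjh = false;

	atomic_fetch_sub(&jh->b_state_wait_count, 1);
	if (bh->remove_pending) {
		bh->remove_pending = false;
		rmjh = true;
	}
	atomic_store(&jh->b_state_locked, false);
	if (rmjh)
		remove_journal_head(bh);	/* only after the lock is dropped */
}
```

A typical walk-through: attach a journal head, take the state lock, request removal (which is deferred because the wait count is non-zero), then unlock, at which point the unlocker performs the removal. If a second locker were still waiting, the removal would simply be deferred again to that locker's unlock.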
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.
>

Yes, there is, and I was trying to use it before you mentioned trying this
(which works for now). The locks are called before and after the private
pointer of the bh is set and removed. The journal_head lock, I was going
to make global, and the state lock would go on this structure. I would
have to do some hack in journal.c to flag the state lock when it was
removing the journal head so that it didn't do the remove there, but did
it after the state lock was released. But this still had a few crashes.

The journal_head lock was used to lock when to add or remove the private
data from the bh, so you can see why this structure can't be used for this
purpose. But the state lock seemed to be ok for this. I need to know more
about the journaling system. I'll look into doing this too, but this fix
should do for now.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> Here's the patch. It's probably more of an overkill wrt buffer heads,
> but it seems to be the easiest solution.

isnt there some ext3-private journal structure (journal-bh) linked off
the bh? If the lock is in that structure then the overhead would only
affect ext3.

        Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Steven Rostedt wrote:
>>
>> +#ifdef CONFIG_PREEMPT_RT
>> +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
>> +#else
>> +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
>> +#endif
>> +
>
> Oops, extra semicolon on the non RT side. I'll try again.
>
> -- Steve

Haven't tried it yet, but does apply cleanly to 2.6.11-final-V0.7.40-00.

kr
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>
> +#ifdef CONFIG_PREEMPT_RT
> +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
> +#else
> +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
> +#endif
> +

Oops, extra semicolon on the non RT side. I'll try again.

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c 2005-02-12 22:06:54.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c 2005-03-11 07:48:04.0 -0500
@@ -3002,6 +3002,10 @@
                preempt_disable();
                __get_cpu_var(bh_accounting).nr++;
                recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+               spin_lock_init(&ret->b_jstate_lock);
+               spin_lock_init(&ret->b_jhead_lock);
+#endif
                preempt_enable();
        }
        return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h 2005-02-12 22:05:10.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h 2005-03-11 07:59:44.0 -0500
@@ -62,6 +62,14 @@
        bh_end_io_t *b_end_io;          /* I/O completion */
        void *b_private;                /* reserved for b_end_io */
        struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+       /*
+        * Fixme: This should be in the journal code.
+        */
+       spinlock_t b_jstate_lock;       /* lock for journal state. */
+       spinlock_t b_jhead_lock;        /* lock for journal head. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h 2005-02-12 22:07:18.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h 2005-03-11 07:57:47.0 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state)
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
        return jh->b_bh;
@@ -326,33 +332,34 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-       bit_spin_lock(BH_State, &bh->b_state);
+       PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-       return bit_spin_trylock(BH_State, &bh->b_state);
+       return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-       return bit_spin_is_locked(BH_State, &bh->b_state);
+       return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-       bit_spin_unlock(BH_State, &bh->b_state);
+       PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-       bit_spin_lock(BH_JournalHead, &bh->b_state);
+       PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-       bit_spin_unlock(BH_JournalHead, &bh->b_state);
+       PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h 2005-03-10 08:47:25.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))
 
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
         * busywait with less bus contention for a good time to
         * attempt to acquire the lock bit.
         */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-       while (test_and_set_bit(bitnum, addr))
-               while (test_bit(bitnum, addr))
+       preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+       while (test_and_set_bit(bitnum, addr)) {
+               while (test_bit(bitnum, addr)) {
+                       preempt_enable();
                        cpu_relax();
+                       preempt_disable();
+               }
+       }
 #endif
        __acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-       if (test_and_set_bit(bitnum, addr))
+       preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+       if
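The PICK_SPIN_LOCK trick in the jbd.h hunk is plain token pasting: one macro picks either a per-buffer spinlock operation or a bit-spinlock operation at compile time, so each jbd_*_bh_state() wrapper stays a one-liner. Here is a small user-space sketch of the same selection. The lock functions are trivial single-threaded stand-ins invented for illustration (an int flag instead of spinlock_t, no atomicity), the BH_State value is arbitrary, and the non-RT branch passes the address of b_state because that is what the stand-in bit_spin_lock() here expects.

```c
#include <assert.h>

/* Cut-down buffer head; b_jstate_lock is an int stand-in for spinlock_t. */
struct buffer_head {
	unsigned long b_state;
	int b_jstate_lock;
};

enum { BH_State = 3 };	/* illustrative bit number only */

/* Trivial stand-ins for the kernel locking primitives (assumption). */
static inline void spin_lock(int *lock)   { *lock = 1; }
static inline void spin_unlock(int *lock) { *lock = 0; }

static inline void bit_spin_lock(int bit, unsigned long *addr)
{
	*addr |= 1UL << bit;
}

static inline void bit_spin_unlock(int bit, unsigned long *addr)
{
	*addr &= ~(1UL << bit);
}

/*
 * The macro relies on a variable named bh being in scope at the point
 * of use, exactly like the version in the patch.
 */
#ifdef CONFIG_PREEMPT_RT
#define PICK_SPIN_LOCK(otype, bit, name) spin_##otype(&bh->b_##name##_lock)
#else
#define PICK_SPIN_LOCK(otype, bit, name) bit_spin_##otype(bit, &bh->b_state)
#endif

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
	PICK_SPIN_LOCK(lock, BH_State, jstate);
}

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
	PICK_SPIN_LOCK(unlock, BH_State, jstate);
}

#undef PICK_SPIN_LOCK
```

Built without CONFIG_PREEMPT_RT, the wrappers flip bit BH_State in b_state; built with -DCONFIG_PREEMPT_RT, they touch only b_jstate_lock. The unused macro argument in each branch is discarded by the preprocessor.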
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Here's the patch. It's probably more of an overkill wrt buffer heads, but
it seems to be the easiest solution. I also put back some of the changes
you made for the bit_spin_locks, so that they act the same as the vanilla
kernel if PREEMPT_RT is not defined.

Now I only tested this with PREEMPT_RT configured so I hope others can
test it with it off. If I get time I'll do that as well.

I patched this against linux-2.6.11-rc4-V0.7.39-02, so I hope it goes
easily into .40.

Lee,

Could you see what the latencies are with kjournal with this patch
applied?

Thanks,

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c 2005-02-12 22:06:54.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c 2005-03-11 07:48:04.0 -0500
@@ -3002,6 +3002,10 @@
                preempt_disable();
                __get_cpu_var(bh_accounting).nr++;
                recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+               spin_lock_init(&ret->b_jstate_lock);
+               spin_lock_init(&ret->b_jhead_lock);
+#endif
                preempt_enable();
        }
        return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h 2005-02-12 22:05:10.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h 2005-03-11 07:59:44.0 -0500
@@ -62,6 +62,14 @@
        bh_end_io_t *b_end_io;          /* I/O completion */
        void *b_private;                /* reserved for b_end_io */
        struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+       /*
+        * Fixme: This should be in the journal code.
+        */
+       spinlock_t b_jstate_lock;       /* lock for journal state. */
+       spinlock_t b_jhead_lock;        /* lock for journal head. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h 2005-02-12 22:07:18.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h 2005-03-11 07:57:47.0 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,bh->b_state);
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
        return jh->b_bh;
@@ -326,33 +332,34 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-       bit_spin_lock(BH_State, &bh->b_state);
+       PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-       return bit_spin_trylock(BH_State, &bh->b_state);
+       return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-       return bit_spin_is_locked(BH_State, &bh->b_state);
+       return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-       bit_spin_unlock(BH_State, &bh->b_state);
+       PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-       bit_spin_lock(BH_JournalHead, &bh->b_state);
+       PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-       bit_spin_unlock(BH_JournalHead, &bh->b_state);
+       PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h 2005-03-10 08:47:25.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h 2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))
 
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
         * busywait with less bus contention for a good time to
         * attempt to acquire the lock bit.
         */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-       while (test_and_set_bit(bitnum, addr))
-               while (test_bit(bitnum, addr))
+       preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+       while (test_and_set_bit(bitnum, addr)) {
+               while (test_bit(bitnum, addr)) {
+                       preempt_enable();
                        cpu_relax();
+                       preempt_disable();
+               }
+       }
 #endif
        __acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
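For readers who have not looked at bit spinlocks before, the spinlock.h hunk boils down to "test-and-set one bit of a word, and busy-wait on plain reads while it is taken". The sketch below reproduces that loop in user space with C11 atomics. It is an approximation under stated assumptions, not the kernel code: atomic_fetch_or/atomic_fetch_and stand in for test_and_set_bit/clear_bit, the word is an atomic_ulong rather than a plain unsigned long, and preempt_disable(), preempt_enable() and cpu_relax() have no user-space equivalent here, so they appear only as comments.

```c
#include <assert.h>
#include <stdatomic.h>

/* Spin until bit `bitnum` of *addr was clear and we managed to set it. */
static inline void bit_spin_lock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	/* kernel: preempt_disable() */
	while (atomic_fetch_or(addr, mask) & mask) {
		/* Lock is held: wait with plain loads to limit bus traffic. */
		while (atomic_load(addr) & mask) {
			/* kernel: preempt_enable(); cpu_relax(); preempt_disable(); */
		}
	}
}

/* Try once; returns 1 if the lock bit was acquired, 0 if it was already set. */
static inline int bit_spin_trylock(int bitnum, atomic_ulong *addr)
{
	unsigned long mask = 1UL << bitnum;

	return !(atomic_fetch_or(addr, mask) & mask);
}

/* Release the lock by clearing the bit. */
static inline void bit_spin_unlock(int bitnum, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << bitnum));
}

/* Returns 1 if the lock bit is currently set. */
static inline int bit_spin_is_locked(int bitnum, atomic_ulong *addr)
{
	return (int)((atomic_load(addr) >> bitnum) & 1UL);
}
```

Because every lock shares the word with unrelated state bits, there is no room for an owner or a wait queue, which is why a sleeping RT spinlock cannot be layered on top of it and the patch switches to real spinlock_t fields under CONFIG_PREEMPT_RT.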
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Andrew Morton wrote:

> Steven Rostedt <[EMAIL PROTECTED]> wrote:
> > No, I'll try that now. I just didn't want to modify the buffer head struct
> > just for journaling. But if it is the quickest and easiest fix, then I'll
> > submit it and we can change it later.
>
> You'll need two spinlocks. jbd_lock_bh_state() and
> jbd_lock_bh_journal_head().
>

Yep, already did that. Now I need to reboot the new kernel and give it a
try.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > did you try the canonical way of putting a spinlock into every
> > buffer_head?
> >
>
> No, I'll try that now. I just didn't want to modify the buffer head struct
> just for journaling. But if it is the quickest and easiest fix, then I'll
> submit it and we can change it later.

You'll need two spinlocks. jbd_lock_bh_state() and
jbd_lock_bh_journal_head().
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> > > Doing a quick search on the kernel, it looks like only kjournald uses
> > > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > > seems to be more of a hack, since it is using bits in the state field
> > > for locking, and these bits aren't used for anything else.
> >
> > yeah. bit-spinlocks are really a hack.
>
> And this really sucks too! I've been looking into a fix for this and
> have yet to get something stable. As you probably already know, you
> can't just put back the preempt_disable since your spinlocks now
> schedule. So I've been looking into finding a way to get rid of these.
>
> I've tried making two global spinlocks, one for the state bit and one
> for the journal head bit use. But this deadlocks with j_state_lock.
> The journal head lock seems to be ok to be global, but the state lock
> needs to have one for every buffer head. I'm now hacking away to do
> this without touching the actual buffer head. But I'm not sure what
> some of the side effects this is having. I'll keep you posted when I
> get something working. I'm now having a crash course in how kjournal
> and friends work.

did you try the canonical way of putting a spinlock into every
buffer_head?

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Ingo Molnar wrote:

>
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > > The short term fix is probably to put back the preempt_disables, the long
> > > term is to get rid of these stupid bit_spin_lock busy loops.
> >
> > Doing a quick search on the kernel, it looks like only kjournald uses
> > the bit_spin_locks. I'll start converting them to spinlocks. The use
> > seems to be more of a hack, since it is using bits in the state field
> > for locking, and these bits aren't used for anything else.
>
> yeah. bit-spinlocks are really a hack.
>
> 	Ingo
>

And this really sucks too! I've been looking into a fix for this and have
yet to get something stable. As you probably already know, you can't just
put back the preempt_disable since your spinlocks now schedule. So I've
been looking into finding a way to get rid of these.

I've tried making two global spinlocks, one for the state bit and one for
the journal head bit use. But this deadlocks with j_state_lock. The
journal head lock seems to be ok to be global, but the state lock needs to
have one for every buffer head. I'm now hacking away to do this without
touching the actual buffer head. But I'm not sure what some of the side
effects this is having. I'll keep you posted when I get something working.
I'm now having a crash course in how kjournal and friends work.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> > The short term fix is probably to put back the preempt_disables, the long
> > term is to get rid of these stupid bit_spin_lock busy loops.
>
> Doing a quick search on the kernel, it looks like only kjournald uses
> the bit_spin_locks. I'll start converting them to spinlocks. The use
> seems to be more of a hack, since it is using bits in the state field
> for locking, and these bits aren't used for anything else.

yeah. bit-spinlocks are really a hack.

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Here's the patch. It's probably more of an overkill wrt buffer heads, but
it seems to be the easiest solution. I also put back some of the changes
you made for the bit_spin_locks, so that they act the same as the vanilla
kernel if PREEMPT_RT is not defined.

Now I only tested this with PREEMPT_RT configured so I hope others can
test it with it off. If I get time I'll do that as well. I patched this
against linux-2.6.11-rc4-V0.7.39-02, so I hope it goes easily into .40.

Lee,

Could you see what the latencies are with kjournal with this patch
applied.

Thanks,

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c	2005-02-12 22:06:54.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c	2005-03-11 07:48:04.0 -0500
@@ -3002,6 +3002,10 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+		spin_lock_init(&ret->b_jhead_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h	2005-02-12 22:05:10.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h	2005-03-11 07:59:44.0 -0500
@@ -62,6 +62,14 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
 	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+	spinlock_t b_jhead_lock;	/* lock for journal head. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h	2005-02-12 22:07:18.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h	2005-03-11 07:57:47.0 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state);
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,33 +332,34 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h	2005-03-10 08:47:25.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h	2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
> +#ifdef CONFIG_PREEMPT_RT
> +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
> +#else
> +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state);
> +#endif
> +

Oops, extra semicolon on the non RT side. I'll try again.

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/buffer.c	2005-02-12 22:06:54.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/buffer.c	2005-03-11 07:48:04.0 -0500
@@ -3002,6 +3002,10 @@
 		preempt_disable();
 		__get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&ret->b_jstate_lock);
+		spin_lock_init(&ret->b_jhead_lock);
+#endif
 		preempt_enable();
 	}
 	return ret;
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/buffer_head.h	2005-02-12 22:05:10.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/buffer_head.h	2005-03-11 07:59:44.0 -0500
@@ -62,6 +62,14 @@
 	bh_end_io_t *b_end_io;		/* I/O completion */
 	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
+
+#ifdef CONFIG_PREEMPT_RT
+	/*
+	 * Fixme: This should be in the journal code.
+	 */
+	spinlock_t b_jstate_lock;	/* lock for journal state. */
+	spinlock_t b_jhead_lock;	/* lock for journal head. */
+#endif
 };
 
 /*
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/jbd.h	2005-02-12 22:07:18.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/jbd.h	2005-03-11 07:57:47.0 -0500
@@ -314,6 +314,12 @@
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
 
+#ifdef CONFIG_PREEMPT_RT
+#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
+#else
+#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state)
+#endif
+
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
 	return jh->b_bh;
@@ -326,33 +332,34 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_State,jstate);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(trylock,BH_State,jstate);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return PICK_SPIN_LOCK(is_locked,BH_State,jstate);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_State,jstate);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(lock,BH_JournalHead,jhead);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	PICK_SPIN_LOCK(unlock,BH_JournalHead,jhead);
 }
+#undef PICK_SPIN_LOCK
 
 struct jbd_revoke_table_s;
 
diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h
--- linux-2.6.11-rc4-V0.7.39-02.orig/include/linux/spinlock.h	2005-03-10 08:47:25.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/include/linux/spinlock.h	2005-03-11 09:06:26.254317378 -0500
@@ -774,6 +774,10 @@
 }))
 
+#ifndef CONFIG_PREEMPT_RT
+
+/* These are just plain evil! */
+
 /*
  * bit-based spin_lock()
  *
@@ -789,10 +793,15 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	while (test_and_set_bit(bitnum, addr))
-		while (test_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	while (test_and_set_bit(bitnum, addr)) {
+		while (test_bit(bitnum, addr)) {
+			preempt_enable();
 			cpu_relax();
+			preempt_disable();
+		}
+	}
 #endif
 	__acquire(bitlock);
 }
@@ -802,9 +811,12 @@
  */
 static inline int bit_spin_trylock(int bitnum, unsigned long *addr)
 {
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
-	if (test_and_set_bit(bitnum, addr))
+	preempt_disable();
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+	if
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Steven Rostedt wrote:

> > +#ifdef CONFIG_PREEMPT_RT
> > +#define PICK_SPIN_LOCK(otype,bit,name) spin_##otype(&bh->b_##name##_lock)
> > +#else
> > +#define PICK_SPIN_LOCK(otype,bit,name) bit_spin_##otype(bit,&bh->b_state);
> > +#endif
> > +
>
> Oops, extra semicolon on the non RT side. I'll try again.
>
> -- Steve

Haven't tried it yet, but does apply cleanly to 2.6.11-final-V0.7.40-00.

kr
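The PICK_SPIN_LOCK macro quoted above builds a function name by token pasting, and the stray semicolon matters because a macro body that ends in ';' can only be used as a full statement, not inside a larger expression or before an 'else'. A minimal userspace sketch of the same pattern follows. Everything prefixed demo_ or DEMO_ is hypothetical and only stands in for the spin_*() helpers and struct buffer_head; it is not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for spin_trylock()/spin_is_locked()/spin_unlock(). */
static int demo_trylock(int *lock)
{
	if (*lock)
		return 0;	/* already held */
	*lock = 1;
	return 1;
}

static int demo_is_locked(int *lock)
{
	return *lock != 0;
}

static void demo_unlock(int *lock)
{
	*lock = 0;
}

/* Hypothetical stand-in for the two per-buffer-head locks in the patch. */
struct demo_bh {
	int b_jstate_lock;
	int b_jhead_lock;
};

/*
 * Pastes "demo_" + otype and "b_" + name + "_lock". There is no trailing
 * semicolon, so the expansion is an expression and the caller supplies
 * the ';'. Like the original, it expects a variable named bh in scope.
 */
#define DEMO_PICK(otype, name) demo_##otype(&bh->b_##name##_lock)

static int demo_trylock_state(struct demo_bh *bh)
{
	return DEMO_PICK(trylock, jstate);
}

static int demo_is_locked_state(struct demo_bh *bh)
{
	return DEMO_PICK(is_locked, jstate);
}

static void demo_unlock_state(struct demo_bh *bh)
{
	DEMO_PICK(unlock, jstate);
}
```

With a trailing semicolon in the macro body, 'return DEMO_PICK(...);' would still compile (the second ';' is an empty statement), but something like '!DEMO_PICK(trylock, jstate)' inside an 'if' condition would not.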
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> Here's the patch. It's probably more of an overkill wrt buffer heads,
> but it seems to be the easiest solution.

isnt there some ext3-private journal structure (journal-bh) linked off
the bh? If the lock is in that structure then the overhead would only
affect ext3.

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Ingo Molnar wrote:

> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.

Yes, there is, and I was trying to use it before you mentioned trying this
(which works for now). The locks are called before and after the private
pointer of the bh is set and removed. The journal_head lock, I was going
to make global, and the state lock would go on this structure. I would
have to do some hack in journal.c to flag the state lock when it was
removing the journal head so that it didn't do the remove there, but did
it after the state lock was released. But this still had a few crashes.

The journal_head lock was used to lock when to add or remove the private
data from the bh, so you can see why this structure can't be used for this
purpose. But the state lock seemed to be ok for this.

I need to know more about the journaling system. I'll look into doing this
too, but this fix should do for now.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 11 Mar 2005, Ingo Molnar wrote:

> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> > Here's the patch. It's probably more of an overkill wrt buffer heads,
> > but it seems to be the easiest solution.
>
> isnt there some ext3-private journal structure (journal-bh) linked off
> the bh? If the lock is in that structure then the overhead would only
> affect ext3.

OK, here it is (Yuck!). I was able to use the journal head (private data
of the buffer head) for the state lock. I just decided to have the journal
head lock be one global lock for all buffer heads, since it is used to add
and remove the journal private data from the buffer head, and thus can't
be stored in the journal private data.

The state lock is now in the journal private data but we must be careful
not to free this data before we unlock it. So here's what I've done.

static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
	BUG_ON(!bh->b_private);
	atomic_inc(&bh2jh(bh)->b_state_wait_count);
	spin_lock(&bh2jh(bh)->b_state_lock);
}

I have a counter of those that want/have the lock, and this informs the
journal_remove_journal_head that it should not free the jh.

static void __journal_remove_journal_head(struct buffer_head *bh)
{
	struct journal_head *jh = bh2jh(bh);

	J_ASSERT_JH(jh, jh->b_jcount >= 0);

	get_bh(bh);
	if (jh->b_jcount == 0) {
		if (jh->b_transaction == NULL &&
				jh->b_next_transaction == NULL &&
				jh->b_cp_transaction == NULL) {
#ifdef CONFIG_PREEMPT_RT
			if (atomic_read(&jh->b_state_wait_count)) {
				BUG_ON(buffer_journalhead(bh));
				set_buffer_journalhead(bh);
			} else
#endif
			{

Here the state_wait_count is checked, and if it is nonzero, the bit that
was originally used for locking the journal head is set, to inform the
unlocking of the state lock that the journal head needs to be removed.

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
	int rmjh = 0;

	BUG_ON(!atomic_read(&bh2jh(bh)->b_state_wait_count));
	atomic_dec(&bh2jh(bh)->b_state_wait_count);
	if (buffer_journalhead(bh)) {
		clear_buffer_journalhead(bh);
		rmjh = 1;
	}
	spin_unlock(&bh2jh(bh)->b_state_lock);
	if (rmjh)
		journal_remove_journal_head(bh);
}

Now in the unlocking of the state lock, the journal head bit is tested and
if it is set, then the remove journal head function is called.

Maybe this isn't the cleanest solution, but it keeps the overhead on the
buffer heads down, so it's preferred over my last patch.

Once again, this has only been tested with full preemption enabled, but I
tried to keep it from changing the way non PREEMPT_RT works.

I'm leaving now for the weekend, so I won't be able to respond to anyone
till Monday. I'll also run this patch over the weekend while compiling the
kernel in an endless loop

	while [ 1 ]; do
		make clean; make
	done

With kjournal running FIFO, to see if it survives.

Cheers,

-- Steve

diff -ur linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c
--- linux-2.6.11-rc4-V0.7.39-02.orig/fs/jbd/journal.c	2005-02-12 22:05:29.0 -0500
+++ linux-2.6.11-rc4-V0.7.39-02/fs/jbd/journal.c	2005-03-11 14:54:21.0 -0500
@@ -80,6 +80,10 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef CONFIG_PREEMPT_RT
+spinlock_t jbd_journal_head_lock = SPIN_LOCK_UNLOCKED;
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
@@ -1727,6 +1731,9 @@
 		jh = new_jh;
 		new_jh = NULL;		/* We consumed it */
 		set_buffer_jbd(bh);
+#ifdef CONFIG_PREEMPT_RT
+		spin_lock_init(&jh->b_state_lock);
+#endif
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
@@ -1767,26 +1774,34 @@
 		if (jh->b_transaction == NULL &&
 				jh->b_next_transaction == NULL &&
 				jh->b_cp_transaction == NULL) {
-			J_ASSERT_BH(bh, buffer_jbd(bh));
-			J_ASSERT_BH(bh, jh2bh(jh) == bh);
-			BUFFER_TRACE(bh, "remove journal_head");
-			if (jh->b_frozen_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_frozen_data\n",
-						__FUNCTION__);
-				kfree(jh->b_frozen_data);
-			}
-			if (jh->b_committed_data) {
-				printk(KERN_WARNING "%s: freeing "
-						"b_committed_data\n",
-
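The deferred-removal idea in this message (count everyone who holds or wants the state lock, and if the journal head would be freed while that count is nonzero, flag it so the unlock path frees it instead) can be sketched in userspace as below. This is only an illustration under stated assumptions: all demo_ names are hypothetical, a C11 atomic flag stands in for spinlock_t, a plain bool stands in for the reused BH_JournalHead bit, and nothing here claims the scheme is race-free in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct journal_head. */
struct demo_jh {
	atomic_bool state_locked;	/* stands in for b_state_lock */
	atomic_int state_wait_count;	/* lock holders plus waiters */
};

/* Hypothetical stand-in for struct buffer_head. */
struct demo_bh {
	struct demo_jh *priv;		/* like b_private */
	bool remove_pending;		/* like the reused BH_JournalHead bit */
};

static void demo_attach(struct demo_bh *bh)
{
	struct demo_jh *jh = malloc(sizeof(*jh));

	if (!jh)
		abort();
	atomic_init(&jh->state_locked, false);
	atomic_init(&jh->state_wait_count, 0);
	bh->priv = jh;
	bh->remove_pending = false;
}

static void demo_lock_state(struct demo_bh *bh)
{
	struct demo_jh *jh = bh->priv;

	/* Announce interest first so a concurrent remove will not free jh. */
	atomic_fetch_add(&jh->state_wait_count, 1);
	while (atomic_exchange(&jh->state_locked, true))
		;	/* spin until the previous holder clears the flag */
}

static void demo_remove_jh(struct demo_bh *bh)
{
	struct demo_jh *jh = bh->priv;

	if (atomic_load(&jh->state_wait_count) != 0) {
		/* Someone holds or wants the lock: defer to the unlock path. */
		bh->remove_pending = true;
		return;
	}
	free(jh);
	bh->priv = NULL;
}

static void demo_unlock_state(struct demo_bh *bh)
{
	struct demo_jh *jh = bh->priv;
	bool remove = bh->remove_pending;

	atomic_fetch_sub(&jh->state_wait_count, 1);
	bh->remove_pending = false;
	atomic_store(&jh->state_locked, false);
	if (remove)
		demo_remove_jh(bh);	/* re-checks the count before freeing */
}
```

Used single-threaded, a remove issued while the lock is held leaves bh.priv in place and sets remove_pending; the following unlock then frees the structure.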
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> I'm leaving now for the weekend, so I won't be able to respond to anyone
> till Monday. I'll also run this patch over the weekend while compiling the
> kernel in an endless loop

I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
goes.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 2005-03-11 at 15:46 -0500, Lee Revell wrote:
> On Fri, 2005-03-11 at 15:39 -0500, Steven Rostedt wrote:
> > I'm leaving now for the weekend, so I won't be able to respond to anyone
> > till Monday. I'll also run this patch over the weekend while compiling the
> > kernel in an endless loop
>
> I'll test this with PREEMPT_DESKTOP and data=ordered also and see how it
> goes.

Does not seem to work at all with the above settings. It seemed OK until
I started X. Then every time I launched an xterm it would disappear as
soon as I typed anything. I could not switch consoles to see the Oops.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Thu, 10 Mar 2005, Steven Rostedt wrote:

> The short term fix is probably to put back the preempt_disables, the long
> term is to get rid of these stupid bit_spin_lock busy loops.
>

Doing a quick search on the kernel, it looks like only kjournald uses the
bit_spin_locks. I'll start converting them to spinlocks. The use seems to
be more of a hack, since it is using bits in the state field for locking,
and these bits aren't used for anything else.

-- Steve
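For readers who have not met the construct being discussed, a bit spinlock uses one bit of a word that also carries other state as the lock. A rough userspace analogue with C11 atomics is sketched below. The sketch_ names are hypothetical; the kernel's test_and_set_bit(), test_bit() and bit_spin_lock() are only imitated here, not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One shared word; each bit can be used as an independent lock. */
typedef atomic_ulong state_word_t;

/* Atomically set the bit and report whether it was already set. */
static bool sketch_test_and_set_bit(int bitnum, state_word_t *addr)
{
	unsigned long mask = 1UL << bitnum;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Read-only test used while waiting. */
static bool sketch_test_bit(int bitnum, state_word_t *addr)
{
	return (atomic_load(addr) & (1UL << bitnum)) != 0;
}

static bool sketch_bit_spin_trylock(int bitnum, state_word_t *addr)
{
	return !sketch_test_and_set_bit(bitnum, addr);
}

static void sketch_bit_spin_lock(int bitnum, state_word_t *addr)
{
	/* Outer loop tries to take the bit; inner loop only reads. */
	while (sketch_test_and_set_bit(bitnum, addr))
		while (sketch_test_bit(bitnum, addr))
			;	/* busy-wait: the waiter never sleeps */
}

static void sketch_bit_spin_unlock(int bitnum, state_word_t *addr)
{
	atomic_fetch_and(addr, ~(1UL << bitnum));
}
```

The waiter in sketch_bit_spin_lock() never gives up the CPU. That is the property at issue in this thread: if the holder of the bit cannot run while a higher-priority waiter spins on the same CPU, the waiter makes no progress until the scheduler lets the holder run again.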
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Hi Ingo,

I notice a problem with the bit_spin_locks that would probably explain the
kjournald latency problems. I'm working on a custom kernel based on yours
and I needed to temporarily remove the scheduler_tick from
update_process_times to implement some special scheduling needs. This
caused kjournald to go into an infinite loop.

Here's your bit_spin_lock:

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	/*
	 * Assuming the lock is uncontended, this never enters
	 * the body of the outer loop. If it is contended, then
	 * within the inner loop a non-atomic test is used to
	 * busywait with less bus contention for a good time to
	 * attempt to acquire the lock bit.
	 */
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
	while (test_and_set_bit(bitnum, addr))
		while (test_bit(bitnum, addr))
			cpu_relax();
#endif
	__acquire(bitlock);
}

You removed the preempt disable and added the CONFIG_PREEMPT. What happens
if a lower priority process gets the bit lock and gets preempted by a
higher priority process that then tries to get this lock? It spins until
its quota runs out.

This is what is happening to kjournald. A lower priority process gets the
bit lock and kjournald preempts it, causing kjournald to spin until its
quota is up to let the other process release the lock.

Now, luckily, kjournald in your kernel is not realtime FIFO. If it were,
you would then have a deadlock; try it. I just set kjournald (using your
kernel) to FIFO prio 42 (prio 58 inside the kernel), and with a non-rt
task, I did a build of the kernel. After a minute or two, all processes
under the priority of kjournald were starved out of the CPU, and kjournald
was spinning. Make sure your kjournald has a lower priority than your
interrupt threads.

The culprit is jbd_lock_bh_state and jbd_lock_bh_journal_head, which call
bit_spin_lock.

Example of long latency: (or deadlock)

journal_refile_buffer
  --> spin_lock(&journal->j_list_lock);
  --> journal_remove_journal_head(bh);
      --> jbd_lock_bh_journal_head(bh);
          --> bit_spin_lock(BH_JournalHead, &bh->b_state);

The short term fix is probably to put back the preempt_disables, the long
term is to get rid of these stupid bit_spin_lock busy loops.

-- Steve

On Sat, 19 Feb 2005, Lee Revell wrote:

> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> > http://redhat.com/~mingo/realtime-preempt/
> >
>
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
>
> preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02
> --------------------------------------------------------------------
>  latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
>     -----------------
>     | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
>     -----------------
>
>                  _------=> CPU#
>                 / _-----=> irqs-off
>                | / _----=> need-resched
>                || / _---=> hardirq/softirq
>                ||| / _--=> preempt-depth
>                |||| /
>                |||||     delay
>    cmd     pid ||||| time  |   caller
>       \   /    |||||   \   |   /
> kjournal-2478  0dn.4    0µs!: <756f6a6b> (<6c616e72>)
> kjournal-2478  0dn.4    0µs : __trace_start_sched_wakeup (try_to_wake_up)
> kjournal-2478  0dn.3    0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478  0dn.3    0µs : try_to_wake_up <<...>-2> (69 73):
> kjournal-2478  0dn.2    0µs : preempt_schedule (try_to_wake_up)
> kjournal-2478  0dn.2    0µs : wake_up_process (do_softirq)
> kjournal-2478  0dn.1    1µs < (1)
>
> The repeating pattern is 8 of these:
>
> kjournal-2478  0.n.1    1µs : inverted_lock (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : __journal_unfile_buffer (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : journal_remove_journal_head (journal_commit_transaction)
> kjournal-2478  0.n.1    1µs : __journal_remove_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1    1µs : __brelse (__journal_remove_journal_head)
> kjournal-2478  0.n.1    1µs : journal_free_journal_head (journal_remove_journal_head)
> kjournal-2478  0.n.1    2µs : kmem_cache_free (journal_free_journal_head)
>
> and one of these:
>
> kjournal-2478  0dn.1    9µs : cache_flusharray (kmem_cache_free)
> kjournal-2478  0dn.2    9µs : free_block (cache_flusharray)
> kjournal-2478  0dn.1   11µs : preempt_schedule (cache_flusharray)
> kjournal-2478  0dn.1   11µs : memmove (cache_flusharray)
> kjournal-2478  0dn.1   11µs : memcpy (memmove)
>
> etc. Finally:
>
> kjournal-2478  0dn.1  704µs : cache_flusharray (kmem_cache_free)
> kjournal-2478  0dn.2  704µs+: free_block (cache_flusharray)
> kjournal-2478  0dn.1  707µs : preempt_schedule (cache_flusharray)
> kjournal-2478  0dn.1  707µs : memmove (cache_flusharray)
>
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Thu, 10 Mar 2005, Steven Rostedt wrote:

> The short term fix is probably to put back the preempt_disables, the long
> term is to get rid of these stupid bit_spin_lock busy loops.

Doing a quick search on the kernel, it looks like only kjournald uses the bit_spin_locks. I'll start converting them to spinlocks. The use seems to be more of a hack, since it is using bits in the state field for locking, and these bits aren't used for anything else.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Sat, 2005-02-19 at 10:03 +0100, Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
> >
> > could you send me the full trace?
>

On my other machine this 333us trace is the longest latency reported in the first few minutes with PREEMPT_DESKTOP. It seems to be a regression from earlier versions.

If I read the trace right copy_pte_range is the problem.

Lee

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02

 latency: 333 µs, #63/63, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: XFree86-2593 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
(T1/#0) dpkg 4362 0 5 0006 [380181315825] 0.000ms (+3550398.796ms): <676b7064> (<00746500>)
(T1/#2) dpkg 4362 0 5 0006 0002 [380181316227] 0.000ms (+0.000ms): __trace_start_sched_wakeup+0x96/0xc0 <c012cbe6> (try_to_wake_up+0x81/0x150 <c010f911>)
(T1/#3) dpkg 4362 0 5 0004 0003 [380181316766] 0.001ms (+0.001ms): wake_up_state+0x1e/0x30 <c010fa5e> (signal_wake_up+0x2d/0x30 <c011f7bd>)
(T1/#4) dpkg 4362 0 5 0004 [380181317637] 0.003ms (+0.000ms): __wake_up+0xe/0x70 <c011059e> (mousedev_event+0xd8/0x140 <c0223ac8>)
(T1/#5) dpkg 4362 0 5 0001 0005 [380181318080] 0.003ms (+0.001ms): __wake_up_common+0xb/0x70 <c011052b> (__wake_up+0x3b/0x70 <c01105cb>)
(T1/#6) dpkg 4362 0 5 0006 [380181318983] 0.005ms (+0.002ms): usb_submit_urb+0xe/0x2c0 <dcabaefe> (hid_irq_in+0x4e/0xe0 <dca7335e>)
(T1/#7) dpkg 4362 0 5 0007 [380181320688] 0.008ms (+0.001ms): hcd_submit_urb+0xe/0x200 <dcaba57e> (usb_submit_urb+0x1c6/0x2c0 <dcabb0b6>)
(T1/#8) dpkg 4362 0 5 0001 0008 [380181321463] 0.009ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (hcd_submit_urb+0x1a9/0x200 <dcaba719>)
(T1/#9) dpkg 4362 0 5 0001 0009 [380181321943] 0.010ms (+0.000ms): get_device+0x8/0x30 <c02012d8> (usb_get_dev+0x19/0x30 <dcab5949>)
(T1/#10) dpkg 4362 0 5 0001 000a [380181322283] 0.010ms (+0.000ms): kobject_get+0x9/0x30 <c01d7869> (get_device+0x1a/0x30 <c02012ea>)
(T1/#11) dpkg 4362 0 5 0001 000b [380181322691] 0.011ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (kobject_get+0x19/0x30 <c01d7879>)
(T1/#12) dpkg 4362 0 5 000c [380181323295] 0.012ms (+0.000ms): usb_get_urb+0x9/0x20 <dcabaed9> (hcd_submit_urb+0xc6/0x200 <dcaba636>)
(T1/#13) dpkg 4362 0 5 000d [380181323566] 0.012ms (+0.001ms): kref_get+0x9/0x60 <c01d8339> (usb_get_urb+0x16/0x20 <dcabaee6>)
(T1/#14) dpkg 4362 0 5 000e [380181324216] 0.013ms (+0.000ms): uhci_urb_enqueue+0xe/0x290 <dca6bf4e> (hcd_submit_urb+0x123/0x200 <dcaba693>)
(T1/#15) dpkg 4362 0 5 0001 000f [380181324743] 0.014ms (+0.000ms): uhci_find_urb_ep+0xe/0xb0 <dca6be9e> (uhci_urb_enqueue+0x7a/0x290 <dca6bfba>)
(T1/#16) dpkg 4362 0 5 0001 0010 [380181325251] 0.015ms (+0.000ms): uhci_alloc_urb_priv+0xb/0x80 <dca6aebb> (uhci_urb_enqueue+0x87/0x290 <dca6bfc7>)
(T1/#17) dpkg 4362 0 5 0001 0011 [380181325582] 0.016ms (+0.001ms): kmem_cache_alloc+0xb/0x70 <c013dc6b> (uhci_alloc_urb_priv+0x1c/0x80 <dca6aecc>)
(T1/#18) dpkg 4362 0 5 0001 0012 [380181326332] 0.017ms (+0.000ms): usb_check_bandwidth+0xc/0x140 <dcaba2fc> (uhci_urb_enqueue+0x200/0x290 <dca6c140>)
(T1/#19) dpkg 4362 0 5 0001 0013 [380181326926] 0.018ms (+0.001ms): usb_calc_bus_time+0x9/0x270 <dcaba089> (usb_check_bandwidth+0x6b/0x140 <dcaba35b>)
(T1/#20) dpkg 4362 0 5 0001 0014 [380181327893] 0.020ms (+0.001ms): uhci_submit_common+0xe/0x380 <dca6b77e> (uhci_urb_enqueue+0x239/0x290 <dca6c179>)
(T1/#21) dpkg 4362 0 5 0001 0015 [380181328984] 0.021ms (+0.001ms): uhci_alloc_td+0xb/0x80 <dca6a5bb> (uhci_submit_common+0xf0/0x380 <dca6b860>)
(T1/#22) dpkg 4362 0 5 0001 0016 [380181329685] 0.023ms (+0.002ms): dma_pool_alloc+0xe/0x1a0 <c02051fe> (uhci_alloc_td+0x20/0x80 <dca6a5d0>)
(T1/#23) dpkg 4362 0 5 0001 0017 [380181331207] 0.025ms (+0.000ms): usb_get_dev+0x9/0x30 <dcab5939> (uhci_alloc_td+0x69/0x80 <dca6a619>)
(T1/#24) dpkg 4362 0 5 0001 0018 [380181331544] 0.026ms (+0.000ms): get_device+0x8/0x30 (usb_get_dev+0x19/0x30)
(T1/#25) dpkg 4362 0 5 0001 0019 [380181331882] 0.026ms (+0.000ms): kobject_get+0x9/0x30 (get_device+0x1a/0x30)
(T1/#26) dpkg 4362 0 5 0001 001a [380181332215] 0.027ms (+0.000ms):
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Sat, 2005-02-19 at 15:45 -0500, Lee Revell wrote:
> I have not tried "data=journal". As previously stated "data=writeback"
> works perfectly - I ran JACK overnight while stressing the fs and did
> not get one xrun.

"data=journal" has the same good performance as "data=writeback". Only the ordered data mode is affected.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Lee Revell <[EMAIL PROTECTED]> wrote:

> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> > http://redhat.com/~mingo/realtime-preempt/
> >
>
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

could you send me the full trace?

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> > latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.
>
> could you send me the full trace?

just in case the system in question is still running - could you also do a 'verbose' trace via:

	echo 1 > /proc/sys/kernel/trace_verbose

and then copying /proc/latency_trace again? (so that we can see the precise function call offsets - journal_commit_transaction() is a long function.)

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Sat, 2005-02-19 at 00:08 -0500, Lee Revell wrote:
> On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> > http://redhat.com/~mingo/realtime-preempt/
> >
>
> Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long
> latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

If I mount all filesystems with 'data=writeback', it works perfectly. I can run 'dbench 64', JACK with Hydrogen at 32 frames and have been unable to produce a single xrun. The maximum wakeup latency I have seen is 139us.

With 'data=ordered', just launching a web browser can produce an xrun.

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Fri, 2005-02-04 at 11:03 +0100, Ingo Molnar wrote:
> http://redhat.com/~mingo/realtime-preempt/
>

Testing on an all SCSI 1.3Ghz Athlon XP system, I am seeing very long latencies in the journalling code with 2.6.11-rc4-RT-V0.7.39-02.

preemption latency trace v1.1.4 on 2.6.11-rc4-RT-V0.7.39-02

 latency: 713 µs, #3455/3455, CPU#0 | (M:preempt VP:0, KP:1, SP:1 HP:1 #P:1)
    -----------------
    | task: ksoftirqd/0-2 (uid:0 nice:-10 policy:0 rt_prio:0)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
kjournal-2478 0dn.4 0µs!: <756f6a6b> (<6c616e72>)
kjournal-2478 0dn.4 0µs : __trace_start_sched_wakeup (try_to_wake_up)
kjournal-2478 0dn.3 0µs : preempt_schedule (try_to_wake_up)
kjournal-2478 0dn.3 0µs : try_to_wake_up <<...>-2> (69 73):
kjournal-2478 0dn.2 0µs : preempt_schedule (try_to_wake_up)
kjournal-2478 0dn.2 0µs : wake_up_process (do_softirq)
kjournal-2478 0dn.1 1µs < (1)

The repeating pattern is 8 of these:

kjournal-2478 0.n.1 1µs : inverted_lock (journal_commit_transaction)
kjournal-2478 0.n.1 1µs : __journal_unfile_buffer (journal_commit_transaction)
kjournal-2478 0.n.1 1µs : journal_remove_journal_head (journal_commit_transaction)
kjournal-2478 0.n.1 1µs : __journal_remove_journal_head (journal_remove_journal_head)
kjournal-2478 0.n.1 1µs : __brelse (__journal_remove_journal_head)
kjournal-2478 0.n.1 1µs : journal_free_journal_head (journal_remove_journal_head)
kjournal-2478 0.n.1 2µs : kmem_cache_free (journal_free_journal_head)

and one of these:

kjournal-2478 0dn.1 9µs : cache_flusharray (kmem_cache_free)
kjournal-2478 0dn.2 9µs : free_block (cache_flusharray)
kjournal-2478 0dn.1 11µs : preempt_schedule (cache_flusharray)
kjournal-2478 0dn.1 11µs : memmove (cache_flusharray)
kjournal-2478 0dn.1 11µs : memcpy (memmove)

etc. Finally:

kjournal-2478 0dn.1 704µs : cache_flusharray (kmem_cache_free)
kjournal-2478 0dn.2 704µs+: free_block (cache_flusharray)
kjournal-2478 0dn.1 707µs : preempt_schedule (cache_flusharray)
kjournal-2478 0dn.1 707µs : memmove (cache_flusharray)
kjournal-2478 0dn.1 707µs : memcpy (memmove)
kjournal-2478 0.n.1 708µs : inverted_lock (journal_commit_transaction)
kjournal-2478 0.n.1 708µs : __journal_unfile_buffer (journal_commit_transaction)
kjournal-2478 0.n.1 709µs : journal_remove_journal_head (journal_commit_transaction)
kjournal-2478 0.n.1 709µs : __journal_remove_journal_head (journal_remove_journal_head)
kjournal-2478 0.n.1 709µs : __brelse (__journal_remove_journal_head)
kjournal-2478 0.n.1 709µs : journal_free_journal_head (journal_remove_journal_head)
kjournal-2478 0.n.1 709µs : kmem_cache_free (journal_free_journal_head)
kjournal-2478 0.n.. 710µs : preempt_schedule (journal_commit_transaction)
kjournal-2478 0dn.. 710µs : __schedule (preempt_schedule)
kjournal-2478 0dn.. 710µs : profile_hit (__schedule)
kjournal-2478 0dn.1 710µs : sched_clock (__schedule)
kjournal-2478 0dn.2 711µs : dequeue_task (__schedule)
kjournal-2478 0dn.2 711µs : recalc_task_prio (__schedule)
kjournal-2478 0dn.2 711µs : effective_prio (recalc_task_prio)
kjournal-2478 0dn.2 711µs : enqueue_task (__schedule)
   <...>-2    0d..2 712µs : __switch_to (__schedule)
   <...>-2    0d..2 712µs : __schedule <kjournal-2478> (73 69):
   <...>-2    0d..2 712µs : finish_task_switch (__schedule)
   <...>-2    0d..1 712µs : trace_stop_sched_switched (finish_task_switch)
   <...>-2    0d..1 712µs : trace_stop_sched_switched <<...>-2> (69 0):
   <...>-2    0d..1 713µs : trace_stop_sched_switched (finish_task_switch)

Lee
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
On Sun, 2005-02-13 at 13:59 +0100, Ingo Molnar wrote:
> yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c
> case.
>

You also missed the kernel/rt.c case :-)

-- Steve

Index: kernel/rt.c
===================================================================
--- kernel/rt.c	(revision 75)
+++ kernel/rt.c	(working copy)
@@ -207,6 +207,7 @@
 {
 	switch (p->state) {
 	case TASK_RUNNING:		printk("R"); break;
+	case TASK_RUNNING_MUTEX:	printk("M"); break;
 	case TASK_INTERRUPTIBLE:	printk("s"); break;
 	case TASK_UNINTERRUPTIBLE:	printk("D"); break;
 	case TASK_STOPPED:		printk("T"); break;

This is still from the 38-06.
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> Ingo,
>
> Here's a trivial patch to keep others from freaking out when they see
> on a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

thanks, applied it to -39-00.

> -	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
> +	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };

> I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.

yeah - it's "M" already in fs/proc/array.c, but i missed the sched.c case.

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Ingo,

Here's a trivial patch to keep others from freaking out when they see on a show_trace that most of their processes are TASK_UNINTERRUPTIBLE.

Index: kernel/sched.c
===================================================================
--- kernel/sched.c	(revision 75)
+++ kernel/sched.c	(working copy)
@@ -4489,7 +4489,7 @@
 	task_t *relative;
 	unsigned state;
 	unsigned long free = 0;
-	static const char *stat_nam[] = { "R", "S", "D", "T", "t", "Z", "X" };
+	static const char *stat_nam[] = { "R", "M", "S", "D", "T", "t", "Z", "X" };
 
 	printk("%-13.13s [%p]", p->comm, p);
 	state = p->state ? __ffs(p->state) + 1 : 0;

I figure that "M" would be a good fit for TASK_RUNNING_MUTEX.

-- Steve
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Sven Dietrich <[EMAIL PROTECTED]> wrote: > > this patch only changes xtime_lock back and forth - it does > > in no way impact the 'threadedness' of the timer IRQ. (it > > does not move the timer IRQ into an interrupt thread.) > > > > nor do we really want to make it configurable - it's > > non-threaded right now and we'll see what effect this has on > > the worst-case latencies. > > Its clear that there are all sorts of issues with process accounting > and other race conditions associated with running the timer in a > thread. > > The timer IRQ does have a noticable impact especially on the slower > CPUS. In this domain, precise process time accounting may not be all > that important, as long as the scheduler does not get confused, and > that lone NODELAY IRQ doesn't get delayed (as much). well, i saved the delta when i removed threaded timer IRQs, find the patch below, apply it with -R to -RT-V0.7.37-00 to get threaded irqs back on x86. Right now i dont plan to reintroduce threaded timer IRQs because it causes architecture merging problems (e.g. on x64 and MIPS) and also caused artifacts. So the complexity vs. latency benefit is not all that clear, especially at this stage. Also note that there were unsolved problems wrt. time handling in the threaded setup. (we can try it again later on. But if we do so it will have to be an all-or-nothing item - #ifdef hell and behavioral divergence is to be avoided.) 
	Ingo

--- linux.old/Makefile
+++ linux.new/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 11
-EXTRAVERSION =-rc2-RT-V0.7.36-06
+EXTRAVERSION =-rc2-RT-V0.7.37-00
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
--- linux.old/arch/i386/kernel/irq.c
+++ linux.new/arch/i386/kernel/irq.c
@@ -70,8 +70,6 @@ fastcall notrace unsigned int do_IRQ(str
 		}
 	}
 #endif
-	if (unlikely(!irq))
-		direct_timer_interrupt(regs);
 
 #ifdef CONFIG_4KSTACKS
 
--- linux.old/arch/i386/kernel/time.c
+++ linux.new/arch/i386/kernel/time.c
@@ -82,7 +82,7 @@ unsigned long cpu_khz;/* Detected as we
 
 extern unsigned long wall_jiffies;
 
-DEFINE_SPINLOCK(rtc_lock);
+DEFINE_RAW_SPINLOCK(rtc_lock);
 
 #include <asm/i8253.h>
 
@@ -217,19 +217,6 @@ unsigned long notrace profile_pc(struct
 EXPORT_SYMBOL(profile_pc);
 #endif
 
-#ifdef CONFIG_PREEMPT_HARDIRQS
-
-/*
- * If the timer is redirected then this is the minimal
- * interrupt-context processing we have to do:
- */
-void direct_timer_interrupt(struct pt_regs *regs)
-{
-	do_timer_interrupt_hook(regs);
-}
-
-#endif
-
 /*
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
@@ -254,9 +241,7 @@ static inline void do_timer_interrupt(in
 	}
 #endif
 
-#ifndef CONFIG_PREEMPT_HARDIRQS
 	do_timer_interrupt_hook(regs);
-#endif
 
 	/*
 	 * If we have an externally synchronized Linux clock, then update
@@ -313,7 +298,6 @@ irqreturn_t timer_interrupt(int irq, voi
 	write_seqlock(&xtime_lock);
 
 	cur_timer->mark_offset();
-	do_timer(regs);
 
 	do_timer_interrupt(irq, NULL, regs);
 
--- linux.old/arch/i386/mach-default/setup.c
+++ linux.new/arch/i386/mach-default/setup.c
@@ -71,7 +71,7 @@ void __init trap_init_hook(void)
 {
 }
 
-static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, CPU_MASK_NONE, "timer", NULL, NULL};
+static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT | SA_NODELAY, CPU_MASK_NONE, "timer", NULL, NULL};
 
 /**
  * time_init_hook - do any specific initialisations for the system timer.
--- linux.old/drivers/char/rtc.c
+++ linux.new/drivers/char/rtc.c
@@ -380,6 +380,8 @@ static inline void rtc_close_event(void)
 
 irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	int mod;
+
 	/*
 	 *	Can be an alarm interrupt, update complete interrupt,
 	 *	or a periodic interrupt. We store the status in the
@@ -401,10 +403,13 @@ irqreturn_t rtc_interrupt(int irq, void
 		rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
 	}
 
+	mod = 0;
 	if (rtc_status & RTC_TIMER_ON)
-		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
+		mod = 1;
 
 	spin_unlock (&rtc_lock);
+	if (mod)
+		mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
 
 	/* Now do the rest of the actions */
 	spin_lock(&rtc_task_lock);
@@ -569,8 +574,8 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (rtc_status & RTC_TIMER_ON) {
 			spin_lock_irq (&rtc_lock);
 			rtc_status &= ~RTC_TIMER_ON;
-			del_timer(&rtc_irq_timer);
 			spin_unlock_irq (&rtc_lock);
+			del_timer(&rtc_irq_timer);
 		}
 		return 0;
 	}
@@ -588,9 +593,9 @@ static int rtc_do_ioctl(unsigned int cmd
 		if (!(rtc_status &
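[Archive note] The rtc.c hunks above share one shape: the decision is taken while rtc_lock is held, and the timer call is made only after the lock is dropped. The message does not say why; a plausible reading is that rtc_lock becomes a raw spinlock in the same patch, so calls that may take other locks are moved outside it. Below is a minimal user-space sketch of that shape. Every name in it (demo_lock, rearm_timer, demo_interrupt and so on) is a hypothetical stand-in, and the lock is a toy built on C11 atomics, not a kernel primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for rtc_lock: false = free, true = held. */
static atomic_bool demo_lock;
static bool timer_on;		/* protected by demo_lock */
static int rearm_calls;		/* how often the helper below ran */

static void demo_lock_acquire(void)
{
	while (atomic_exchange_explicit(&demo_lock, true, memory_order_acquire))
		;		/* spin until we are the one who set it */
}

static void demo_lock_release(void)
{
	atomic_store_explicit(&demo_lock, false, memory_order_release);
}

/* Stand-in for mod_timer(): must not be entered with demo_lock held. */
static void rearm_timer(void)
{
	assert(!atomic_load(&demo_lock));
	rearm_calls++;
}

/* Snapshot the decision under the lock, act on it after unlocking. */
static void demo_interrupt(void)
{
	bool mod;

	demo_lock_acquire();
	mod = timer_on;
	demo_lock_release();

	if (mod)
		rearm_timer();
}
```

The cost of this shape is a small window between the unlock and the call in which timer_on can change, so it only fits cases where acting on a slightly stale decision is harmless.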
RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Ingo wrote:
>
> * Sven Dietrich <[EMAIL PROTECTED]> wrote:
>
> > This patch adds a config option to allow you to select whether timer
> > IRQ runs in thread or not.
>
> this patch only changes xtime_lock back and forth - it does
> in no way impact the 'threadedness' of the timer IRQ. (it
> does not move the timer IRQ into an interrupt thread.)
>
> nor do we really want to make it configurable - it's
> non-threaded right now and we'll see what effect this has on
> the worst-case latencies.
>
> 	Ingo
>

It's clear that there are all sorts of issues with process accounting
and other race conditions associated with running the timer in a thread.

The timer IRQ does have a noticeable impact, especially on the slower
CPUs. In this domain, precise process time accounting may not be all
that important, as long as the scheduler does not get confused, and
that lone NODELAY IRQ doesn't get delayed (as much).

It would be nice if some of the process accounting could be pipelined or
deferred, but I don't have those answers right now.

Sven
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Sven Dietrich <[EMAIL PROTECTED]> wrote:

> No, this is not in arm. Here is the patch.
>
> Index: linux-2.6.10/include/asm-i386/spinlock.h

what version do you have? The current released patch is
2.6.11-rc3-V0.7.38-10.

	Ingo
RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
No, this is not in arm.

Here is the patch.

Index: linux-2.6.10/include/asm-i386/spinlock.h
===================================================================
--- linux-2.6.10.orig/include/asm-i386/spinlock.h 2005-02-11 09:25:39.224240321 +
+++ linux-2.6.10/include/asm-i386/spinlock.h 2005-02-11 09:25:58.006812173 +
@@ -30,7 +30,7 @@
 
 #define __raw_spin_is_locked(x)	(*(volatile signed char *)(&(x)->lock) <= 0)
 #define __raw_spin_unlock_wait(x) \
-	do { barrier(); } while(__spin_is_locked(x))
+	do { barrier(); } while(__raw_spin_is_locked(x))
 
 #define spin_lock_string \
 	"\n1:\t" \

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ingo Molnar
> Sent: Friday, February 11, 2005 12:34 AM
> To: George Anzinger
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>
> * George Anzinger wrote:
>
> > Possibly from:
> > define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0)
> > #define __raw_spin_unlock_wait(x) \
> > 	do { barrier(); } while(__spin_is_locked(x))
> > in asm/spinlock.h
> >
> > should that be __raw_spin_is_locked(x) instead?
>
> yeah. Is this in the ARM patch? I haven't applied the ARM
> patch yet, waiting to see Thomas Gleixner's generic-hardirq
> based one. (which is more compelling from an architectural
> and long-term maintenance POV - but also more work to
> address all of RMK's concerns.)
>
> 	Ingo
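[Archive note] The one-line fix above makes the wait loop poll the predicate that belongs to the raw lock type it is waiting on. The sketch below shows that pairing in user space with C11 atomics. It is an illustration under assumptions, not the i386 implementation: the demo_* names are hypothetical, and unlike the kernel macro the wait gives up after max_spins polls so that it can be exercised safely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy raw lock: 1 means free, <= 0 means held, echoing the "<= 0" test. */
typedef struct { atomic_int lock; } demo_raw_spinlock_t;

static bool demo_raw_spin_is_locked(demo_raw_spinlock_t *x)
{
	return atomic_load_explicit(&x->lock, memory_order_acquire) <= 0;
}

/*
 * Poll until the lock is observed free. Returns the number of polls that
 * saw it held, or -1 if max_spins polls all saw it held.
 */
static int demo_raw_spin_unlock_wait(demo_raw_spinlock_t *x, int max_spins)
{
	int spins = 0;

	assert(max_spins > 0);
	/* the raw predicate, matching the raw lock type */
	while (demo_raw_spin_is_locked(x)) {
		if (++spins >= max_spins)
			return -1;
	}
	return spins;
}
```

With the lock word at 1 the wait returns 0 immediately; with it at 0 and nobody releasing it, the bounded loop returns -1 instead of spinning forever.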
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* George Anzinger wrote:

> Possibly from:
> define __raw_spin_is_locked(x) (*(volatile signed char *)(&(x)->lock) <= 0)
> #define __raw_spin_unlock_wait(x) \
> 	do { barrier(); } while(__spin_is_locked(x))
> in asm/spinlock.h
>
> should that be __raw_spin_is_locked(x) instead?

yeah. Is this in the ARM patch? I haven't applied the ARM patch yet,
waiting to see Thomas Gleixner's generic-hardirq based one. (which is
more compelling from an architectural and long-term maintenance POV -
but also more work to address all of RMK's concerns.)

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
* Sven Dietrich <[EMAIL PROTECTED]> wrote:

> This patch adds a config option to allow you to select whether timer
> IRQ runs in thread or not.

this patch only changes xtime_lock back and forth - it does in no way
impact the 'threadedness' of the timer IRQ. (it does not move the timer
IRQ into an interrupt thread.)

nor do we really want to make it configurable - it's non-threaded right
now and we'll see what effect this has on the worst-case latencies.

	Ingo
Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Sven Dietrich wrote:
> Hi George, you may want to use this for reference.
>
> This patch adds a config option to allow you to select whether
> timer IRQ runs in thread or not.
>
> I'm not totally happy with the #ifdefs, but it may make switching
> back and forth easier.

Thanks, but... You are addressing a different problem than I am. I want
to code the VST patch to work in a system with or without the RT patch
(it is easy to work with the RT option on or off). The problem is
setting up the spin locks it needs. My solution assumes that
RAW_SPIN_LOCK_UNLOCKED will not be defined unless the RT patch is
applied.

As to your patch, in most archs the timer interrupt does accounting
which requires input on just who was interrupted by the interrupt. This
is lost when threading the timer IRQ. I think it was problems of this
sort that caused Ingo to back away...

George

PS By the way, your mailer (Microsoft Outlook) set up your attachment in
such a way that my mailer would not inline it. You might want to look
into this.

> Sven
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of George Anzinger
> > Sent: Thursday, February 10, 2005 12:21 PM
> > To: Ingo Molnar
> > Cc: William Weston; linux-kernel@vger.kernel.org
> > Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
> >
> > If I want to write a patch that will work with or without the
> > RT patch applied is the following enough?
> >
> > #ifndef RAW_SPIN_LOCK_UNLOCKED
> > typedef raw_spinlock_t spinlock_t
> > #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> > #endif
> >
> > --
> > George Anzinger george@mvista.com
> > High-res-timers: http://sourceforge.net/projects/high-res-timers/

--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
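[Archive note] On the shim quoted above: in C, `typedef A B;` makes B a new name for the existing type A, so as written the typedef appears to have its operands reversed, and it lacks a terminating semicolon. The sketch below shows the direction the shim seems to intend. It is a user-space illustration under assumptions: the spinlock_t and SPIN_LOCK_UNLOCKED definitions are simplified stand-ins for a tree without the RT patch, vst_demo_lock is a hypothetical name, and the approach relies on RAW_SPIN_LOCK_UNLOCKED being a preprocessor macro whenever the RT patch is applied, which is the assumption stated in the message.

```c
#include <assert.h>	/* static_assert (C11) */

/* Stand-ins for a tree WITHOUT the RT patch: only the plain names exist. */
typedef struct { int slock; } spinlock_t;
#define SPIN_LOCK_UNLOCKED { 1 }

/* Shim: supply the raw names only when the RT patch has not done so. */
#ifndef RAW_SPIN_LOCK_UNLOCKED
typedef spinlock_t raw_spinlock_t;	/* existing type first, new name second */
#define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
#endif

static_assert(sizeof(raw_spinlock_t) == sizeof(spinlock_t),
	      "without RT the raw lock is just the plain lock");

/* Hypothetical lock that a patch could declare the same way in both trees. */
static raw_spinlock_t vst_demo_lock = RAW_SPIN_LOCK_UNLOCKED;

static int vst_demo_lock_is_free(void)
{
	return vst_demo_lock.slock == 1;
}
```

Code written against raw_spinlock_t and RAW_SPIN_LOCK_UNLOCKED then compiles unchanged in this stand-in; whether the same holds in a tree with the RT patch applied depends on how that patch defines the two names.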
RE: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
Hi George, you may want to use this for reference.

This patch adds a config option to allow you to select whether
timer IRQ runs in thread or not.

I'm not totally happy with the #ifdefs, but it may make switching
back and forth easier.

Sven

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of George Anzinger
> Sent: Thursday, February 10, 2005 12:21 PM
> To: Ingo Molnar
> Cc: William Weston; linux-kernel@vger.kernel.org
> Subject: Re: [patch] Real-Time Preemption, -RT-2.6.11-rc3-V0.7.38-01
>
> If I want to write a patch that will work with or without the
> RT patch applied is the following enough?
>
> #ifndef RAW_SPIN_LOCK_UNLOCKED
> typedef raw_spinlock_t spinlock_t
> #define RAW_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
> #endif
>
> --
> George Anzinger george@mvista.com
> High-res-timers: http://sourceforge.net/projects/high-res-timers/

[Attachment: common_timer_irqthread.patch (binary data)]