Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-18 Thread Lee Revell
On Fri, 2005-03-18 at 01:58 -0500, Steven Rostedt wrote:
> 
> On Thu, 17 Mar 2005, Lee Revell wrote:
> >
> > OK, no need to cc: me on this one any more.  It's really low priority
> > IMO compared to the big latencies I am seeing with ext3 and
> > "data=ordered".  Unless you think there is any relation.
> >
> 
> IMO a deadlock is higher priority than a big latency :-)
> 

Of course, if I were hitting the deadlock in normal use.

> I still believe that the locking in ext3 has something to do with your
> latencies, but I'll take you off when I send something to Andrew or Ingo
> next time. Hopefully, they'll do the same.

If you suspect they are related then yes I would like to be copied.

> 
> When this problem is solved on Ingo's side, maybe this will solve your
> latency problem, so I recommend that you keep trying the latest RT
> kernels.  BTW what test are you running that causes these latencies?

dbench 16

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

2005-03-18 Thread Steven Rostedt

On Fri, 18 Mar 2005, Andrew Morton wrote:
> Steven Rostedt <[EMAIL PROTECTED]> wrote:
> >
> >  I wanted to keep the wait logic out when it wasn't a problem. Basically,
> >  the problem only occurs when bit_spin_trylock is defined as an actual
> >  trylock. So I put in a define there to enable the wait queues.  I didn't
> >  want to waste cycles checking the wait queue in jbd_unlock_bh_state when
> >  there would never be anything on it.  Heck, I figured why even have the
> >  wait queue wasting memory if it wasn't needed.  So that added the
> >  ifdeffery complexity.
>
> No, that code's just a problem.  For ranking reasons it's essentially doing
> this:
>
> repeat:
>   cond_resched();
>   spin_lock(j_list_lock);
>   
>   if (!bit_spin_trylock(bh)) {
>   spin_unlock(j_list_lock);
>   schedule();
>   goto repeat;
>   }
>

Yep, that I understand.

> Now imagine that some other CPU holds the bit_spin_lock and is spinning,
> trying to get the spin_lock().  The above code assumes that the schedule()
> and cond_resched() will take "long enough" for the other CPU to get the
> spinlock, do its business then release the locks.
>
> So all the schedule() is really doing is "blow a few cycles so the other
> CPU can get in and grab the spinlock".  That'll work OK on normal SMP but I
> suspect that on NUMA setups with really big latencies we could end up
> starving the other CPU: this CPU would keep on grabbing the lock.  It
> depends on how the interconnect cache and all that goop works.
>
> So what to do?
>
> One approach would be to spin on the bit_spin_trylock after having dropped
> j_list_lock.  That'll tell us when the other CPU has moved on.
>

This is probably the best for mainline since, as you mentioned, the
above code is just bad.

> Another approach would be to sleep on a waitqueue somewhere.  But that
> means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.
>

That's the approach my patch took.

> Another approach would be to simply whack an msleep(1) in there.  That
> might be OK - it should be very rare.
>

This approach is not much better than the current implementation.

> Probably the first approach would be the one to use.  That's for mainline.
> I don't know what the super-duper-RT fix would be.  Why did we start
> discussing this anyway?
>
> Oh, SCHED_FIFO.  kjournald doesn't run SCHED_FIFO, but someone may decide
> to make it do so.  But even then I don't see a problem for the mainline
> kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
> running.
>

So this comes down to just a problem with Ingo's PREEMPT_RT.  This means
that the latency of kjournald, even without SCHED_FIFO, will be large. If
it preempts a process that holds one of these bit spinlocks (Ingo's RT
kernel takes out the preempt_disable in them), then the kjournald thread
will spin until its timeslice runs out, causing problems for other
processes.  The same goes for a process with a higher priority than
kjournald, if it blocks on one of the other locks that kjournald can hold
while attempting to get the bit locks.

I know Ingo eventually wants to get his patch into mainline without too
much drag, but this problem needs to be solved in mainline to accomplish
that.

What do you recommend?

-- Steve



Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

2005-03-18 Thread Andrew Morton
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> >
>  > I really should knock up a script to send out an email when I add a patch
>  > to -mm.
>  >
> 
>  I thought you might have had something like that already, which was
>  another reason I thought you might have skipped this.
>

I do now.

> 
>  > > I figured that you may have
>  > > missed this, since the subject didn't change.  So I changed the subject to
>  > > get your attention, and I've resent this. Here's the patch to get rid of
>  > > the lame schedule that was in fs/jbd/commit.c.  Let me know if this
>  > > patch is appropriate.
>  >
>  > I'm rather aghast at all the ifdeffery and complexity in this one.  But I
>  > haven't looked at it closely yet.
>  >
> 
>  I wanted to keep the wait logic out when it wasn't a problem. Basically,
>  the problem only occurs when bit_spin_trylock is defined as an actual
>  trylock. So I put in a define there to enable the wait queues.  I didn't
>  want to waste cycles checking the wait queue in jbd_unlock_bh_state when
>  there would never be anything on it.  Heck, I figured why even have the
>  wait queue wasting memory if it wasn't needed.  So that added the
>  ifdeffery complexity.

No, that code's just a problem.  For ranking reasons it's essentially doing
this:

repeat:
cond_resched();
spin_lock(j_list_lock);

if (!bit_spin_trylock(bh)) {
spin_unlock(j_list_lock);
schedule();
goto repeat;
}

Now imagine that some other CPU holds the bit_spin_lock and is spinning,
trying to get the spin_lock().  The above code assumes that the schedule()
and cond_resched() will take "long enough" for the other CPU to get the
spinlock, do its business then release the locks.

So all the schedule() is really doing is "blow a few cycles so the other
CPU can get in and grab the spinlock".  That'll work OK on normal SMP but I
suspect that on NUMA setups with really big latencies we could end up
starving the other CPU: this CPU would keep on grabbing the lock.  It
depends on how the interconnect cache and all that goop works.

So what to do?

One approach would be to spin on the bit_spin_trylock after having dropped
j_list_lock.  That'll tell us when the other CPU has moved on.

Another approach would be to sleep on a waitqueue somewhere.  But that
means that jbd_unlock_bh_state() needs to do wakeups all the time - costly.

Another approach would be to simply whack an msleep(1) in there.  That
might be OK - it should be very rare.

Probably the first approach would be the one to use.  That's for mainline. 
I don't know what the super-duper-RT fix would be.  Why did we start
discussing this anyway?

Oh, SCHED_FIFO.  kjournald doesn't run SCHED_FIFO, but someone may decide
to make it do so.  But even then I don't see a problem for the mainline
kernel, because this CPU's SCHED_FIFO doesn't stop the other CPU from
running.




Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

2005-03-18 Thread Steven Rostedt

On Fri, 18 Mar 2005, Andrew Morton wrote:
> Steven Rostedt <[EMAIL PROTECTED]> wrote:
> >
> >
> > Andrew,
> >
> > Since I haven't gotten a response from you,
>
> It sometimes takes me half a day to get onto looking at patches.  And if I
> take them I usually don't reply (sorry).  But I don't drop stuff, so if you
> don't hear, please assume the patch stuck.  If others raise objections
> to the patch I'll usually duck it as well, but it's pretty obvious when that
> happens.

Sorry, I didn't mean to be pushy. I understand that you have a lot on your
plate, and I'm sure you don't drop stuff. I just wasn't sure that you
noticed that that was a patch and not just a reply on this thread, since I
didn't flag it as such in the subject. I just didn't want it to slip under
the radar.


>
> I really should knock up a script to send out an email when I add a patch
> to -mm.
>

I thought you might have had something like that already, which was
another reason I thought you might have skipped this.


> > I figured that you may have
> > missed this, since the subject didn't change.  So I changed the subject to
> > get your attention, and I've resent this. Here's the patch to get rid of
> > the lame schedule that was in fs/jbd/commit.c.  Let me know if this
> > patch is appropriate.
>
> I'm rather aghast at all the ifdeffery and complexity in this one.  But I
> haven't looked at it closely yet.
>

I wanted to keep the wait logic out when it wasn't a problem. Basically,
the problem only occurs when bit_spin_trylock is defined as an actual
trylock. So I put in a define there to enable the wait queues.  I didn't
want to waste cycles checking the wait queue in jbd_unlock_bh_state when
there would never be anything on it.  Heck, I figured why even have the
wait queue wasting memory if it wasn't needed.  So that added the
ifdeffery complexity.

Thanks,

-- Steve



Re: [PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

2005-03-18 Thread Andrew Morton
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> 
> Andrew,
> 
> Since I haven't gotten a response from you,

It sometimes takes me half a day to get onto looking at patches.  And if I
take them I usually don't reply (sorry).  But I don't drop stuff, so if you
don't hear, please assume the patch stuck.  If others raise objections
to the patch I'll usually duck it as well, but it's pretty obvious when that
happens.

I really should knock up a script to send out an email when I add a patch
to -mm.

> I figured that you may have
> missed this, since the subject didn't change.  So I changed the subject to
> get your attention, and I've resent this. Here's the patch to get rid of
> the lame schedule that was in fs/jbd/commit.c.  Let me know if this
> patch is appropriate.

I'm rather aghast at all the ifdeffery and complexity in this one.  But I
haven't looked at it closely yet.



[PATCH] remove lame schedule in journal inverted_lock (was: Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks)

2005-03-18 Thread Steven Rostedt

Andrew,

Since I haven't gotten a response from you, I figured that you may have
missed this, since the subject didn't change.  So I changed the subject to
get your attention, and I've resent this. Here's the patch to get rid of
the lame schedule that was in fs/jbd/commit.c.  Let me know if this
patch is appropriate.

Thanks,

-- Steve


On Thu, 17 Mar 2005, Steven Rostedt wrote:

>
>
> On Wed, 16 Mar 2005, Andrew Morton wrote:
>
> > >  I guess one way to solve this is to add a wait queue here (before
> > >  schedule()), and have the one holding the lock to wake up all on the
> > >  waitqueue when they release it.
> >
> > yup.  A patch against mainline would be appropriate, please.
> >
>
> Hi Andrew,
>
> Here's the patch against 2.6.11.  I tested it by adding (after making the
> patch) global spinlocks for jbd_lock_bh_state and jbd_lock_bh_journalhead.
> That way I have the same scenario as with Ingo's kernel, and I turned on
> NEED_JOURNAL_STATE_WAIT.  I'm still running that kernel, so it looks like
> it works.  Making those two locks global causes this deadlock on kjournald
> much quicker, and I don't need to run on an SMP machine (since my SMP
> machines are currently being used for other tasks).
>
> Some comments on my patch.  I only implement the wait queue when
> bit_spin_trylock is an actual lock (thus creating the problem). I didn't
> want to add this code if it wasn't needed (i.e. !(CONFIG_SMP &&
> CONFIG_DEBUG_SPINLOCKS)).  So in bit_spin_trylock, I define
> NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock.  When
> NEED_JOURNAL_STATE_WAIT is set, then the wait queue is set up in the
> journal code.
>
> Now the question is, should we make those two locks global? It would help
> Ingo's cause (and mine as well). But I don't know the impact on a large
> SMP configuration.  Andrew, since you have a 16xSMP machine, could you (if
> you have time) try out the effect of that? If you do, I'll send you a
> patch that goes on top of this one to change the two locks into global
> spinlocks.
>
> Ingo, where do you want to go from here? I guess we need to wait on what
> Andrew decides.
>
> -- Steve
>
>

diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 02:38:25.0 -0500
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-17 03:40:06.0 -0500
@@ -80,15 +80,33 @@

 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * held.  For ranking reasons we must trylock.  If we lose put ourselves on a
+ * state wait queue and we'll be woken up when it is unlocked. Then we return
+ * 0 to try this again.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
+		/*
+		 * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
+		 * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
+		 * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
+		 */
+#ifdef NEED_JOURNAL_STATE_WAIT
+		DECLARE_WAITQUEUE(wait, current);
 		spin_unlock(&journal->j_list_lock);
-		schedule();
+		add_wait_queue_exclusive(&journal_state_wait, &wait);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Check to see if the lock has been unlocked in this short time */
+		if (jbd_is_locked_bh_state(bh))
+			schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&journal_state_wait, &wait);
 		return 0;
+#else
+		/* This should never be hit */
+		BUG();
+#endif
 	}
 	return 1;
 }
diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
--- linux-2.6.11.orig/fs/jbd/journal.c  2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11/fs/jbd/journal.c   2005-03-17 03:47:40.0 -0500
@@ -80,6 +80,11 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);

+#ifdef NEED_JOURNAL_STATE_WAIT
+EXPORT_SYMBOL(journal_state_wait);
+DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

 /*
diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11/include/linux/jbd.h	2005-03-17 03:48:18.0 -0500
@@ -324,6 +324,20 @@
return bh->b_private;
 }

+#ifdef NEED_JOURNAL_STATE_WAIT
+/*
+ * The journal_state_wait is a wait queue that tasks will wait on
+ * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
+ * Instead of spinning on schedule, the task now adds itself to this wait queue



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Steven Rostedt


On Thu, 17 Mar 2005, Lee Revell wrote:
>
> OK, no need to cc: me on this one any more.  It's really low priority
> IMO compared to the big latencies I am seeing with ext3 and
> "data=ordered".  Unless you think there is any relation.
>

IMO a deadlock is higher priority than a big latency :-)

I still believe that the locking in ext3 has something to do with your
latencies, but I'll take you off the CC when I send something to Andrew
or Ingo next time.  Hopefully they'll do the same.

When this problem is solved on Ingo's side, maybe this will solve your
latency problem, so I recommend that you keep trying the latest RT
kernels.  BTW what test are you running that causes these latencies?

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Lee Revell
On Thu, 2005-03-17 at 11:23 -0500, Steven Rostedt wrote:
> 
> On Thu, 17 Mar 2005, Lee Revell wrote:
> 
> >
> > Sorry, it's hard to follow this thread.  Just to make sure we're all on
> > the same page, what exactly is the symptom of this ext3 issue you are
> > working on?  Is it a performance regression, or a latency issue, or a
> > lockup - ?
> >
> > Whatever your problem is, I am not seeing it.
> >
> 
> The root is a lockup.  I think you can get this lockup whether or not it
> is PREEMPT_RT or PREEMPT_DESKTOP.  All you need is CONFIG_PREEMPT turned
> on. Then this is what you want to do on a UP machine.

OK, no need to cc: me on this one any more.  It's really low priority
IMO compared to the big latencies I am seeing with ext3 and
"data=ordered".  Unless you think there is any relation.

Lee



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Steven Rostedt


On Thu, 17 Mar 2005, Lee Revell wrote:

>
> Sorry, it's hard to follow this thread.  Just to make sure we're all on
> the same page, what exactly is the symptom of this ext3 issue you are
> working on?  Is it a performance regression, or a latency issue, or a
> lockup - ?
>
> Whatever your problem is, I am not seeing it.
>

The root is a lockup.  I think you can get this lockup whether or not it
is PREEMPT_RT or PREEMPT_DESKTOP.  All you need is CONFIG_PREEMPT turned
on.  Then this is what you want to do on a UP machine.

Set kjournald to SCHED_FIFO (any realtime priority), and then from a
non-RT task just do a "make clean; make" on the kernel.  It may take a
few minutes, but your system will lock up.  That's because kjournald will
spin on the bit_spin_lock but will never be preempted by the one holding
the lock, because kjournald is FIFO and the one holding the lock (the
kernel compile) is not RT.  Even if it were RT, at the same priority as
kjournald, it would still lock up, since kjournald is FIFO and will only
yield to higher priority threads.

Now this lockup has uncovered other problems with ext3.  Mainly that it
uses bit spinlocks, which in and of itself is bad.  You don't want a busy
wait unless you really need it.  A normal spinlock is such a case on
vanilla SMP systems, since scheduling away would take longer than the
holder takes to release the lock.  Ingo's RT kernel removes most of these
and turns them into mutexes.  This may slow down overall performance, but
it shortens latencies for RT tasks, which is what RT tries to do.
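For reference, the bit-spinlock pattern under discussion can be sketched in userspace with C11 atomics.  This is only an illustrative approximation (the kernel's bit_spin_lock also disables preemption, which this sketch omits); the point is that the loser busy-waits and never sleeps:

```c
#include <stdatomic.h>
#include <assert.h>

/* Userspace approximation of a bit spinlock: one bit of a word is the
 * lock.  The spinner busy-waits, which is exactly why a SCHED_FIFO task
 * spinning here can starve a non-RT lock holder on a UP machine. */

static void demo_bit_spin_lock(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	/* atomic test-and-set: loop until we flipped the bit from 0 to 1 */
	while (atomic_fetch_or(word, mask) & mask)
		;	/* busy wait: never sleeps, never yields */
}

static int demo_bit_spin_trylock(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;

	/* succeed only if the bit was clear before we set it */
	return !(atomic_fetch_or(word, mask) & mask);
}

static void demo_bit_spin_unlock(atomic_ulong *word, unsigned int bit)
{
	atomic_fetch_and(word, ~(1UL << bit));
}
```

All names here are made up for the sketch; the real kernel interfaces are bit_spin_lock/bit_spin_trylock/bit_spin_unlock on &bh->b_state.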

Now the latest problem is also bad, since you should never just call
schedule as a "yield" to let someone else release a lock.  Since the
ranking order of the locks prevents just grabbing the lock outright (that
would risk a deadlock), ext3 tries to get the lock, and if it fails, it
releases the other lock it holds, calls schedule, then tries again.  This
is usually bad, since the task will most likely be rescheduled right
away, so it is actually worse than a spinlock: it spins, but goes through
the schedule logic on every iteration!  With Ingo's RT patch, this also
becomes a deadlock the same way the bit_spin_locks can.
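The trylock-then-"yield" shape being criticized can be mimicked in userspace pthreads.  This is an illustrative sketch with made-up lock names, not the actual jbd code: a failed trylock just drops the other lock and yields, hoping the holder gets to run, which a SCHED_FIFO caller never allows.

```c
#include <pthread.h>
#include <sched.h>
#include <assert.h>

/* Illustrative stand-ins for j_list_lock and the bh state bit lock. */
static pthread_mutex_t demo_list_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shape of inverted_lock(): called with demo_list_lock held.  On failure
 * it drops demo_list_lock and yields; the caller is expected to retake
 * the list lock and retry.  Under SCHED_FIFO the yield never lets a
 * lower priority lock holder run, so the retry loop spins forever. */
static int demo_inverted_lock(void)
{
	if (pthread_mutex_trylock(&demo_state_lock) != 0) {
		pthread_mutex_unlock(&demo_list_lock);
		sched_yield();	/* the "schedule()" that just busy-retries */
		return 0;
	}
	return 1;
}
```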

Hope this helps,

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Lee Revell
On Thu, 2005-03-17 at 02:15 -0500, Steven Rostedt wrote:
> 
> On Wed, 16 Mar 2005, Lee Revell wrote:
> 
> > I am a bit confused, big surprise.  Does this thread still have anything
> > to do with this trace from my "Latency regressions" bug report?
> 
> Don't worry, I've been in a state of confusion for a long time now ;-)
> 
> >
> > http://www.alsa-project.org/~rlrevell/2912us
> >
> > The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".
> >
> > PREEMPT_RT has always worked perfectly.
> >
> 
> I'm surprised that PREEMPT_RT does work.  I'm no longer sure that this does
> affect your latency anymore.  It probably does indirectly somehow.  I
> still think it has to do with the bitspinlocks.  But I'm not sure. Just
> let me know if you want to be taken off this thread and I'll remove you
> from my CC list.  Until then, I'll keep you on.

Sorry, it's hard to follow this thread.  Just to make sure we're all on
the same page, what exactly is the symptom of this ext3 issue you are
working on?  Is it a performance regression, or a latency issue, or a
lockup - ?

Whatever your problem is, I am not seeing it.

Lee

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Steven Rostedt


On Wed, 16 Mar 2005, Steven Rostedt wrote:
> [...]  There's a couple of places that
> jbd_trylock_bh_state is used in checkpoint.c, but this is the one place
> that it definitely deadlocks the system.  I believe that the
> code in checkpoint.c also has this problem.
>

I've examined the code in checkpoint.c, and I now believe that it doesn't
have this problem.  When it fails a lock, it just falls out of the while
loops.

-- Steve


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Steven Rostedt


On Wed, 16 Mar 2005, Andrew Morton wrote:

> >  I guess one way to solve this is to add a wait queue here (before
> >  schedule()), and have the one holding the lock to wake up all on the
> >  waitqueue when they release it.
>
> yup.  A patch against mainline would be appropriate, please.
>

Hi Andrew,

Here's the patch against 2.6.11.  I tested it by adding (after applying
the patch) global spinlocks for jbd_lock_bh_state and
jbd_lock_bh_journalhead.  That way I have the same scenario as with
Ingo's kernel, and I turned on NEED_JOURNAL_STATE_WAIT.  I'm still
running that kernel, so it looks like it works.  Making those two locks
global triggers this deadlock on kjournald much more quickly, and I don't
need to run on an SMP machine (since my SMP machines are currently being
used for other tasks).

Some comments on my patch.  I only implement the wait queue when
bit_spin_trylock is an actual lock (thus creating the problem).  I didn't
want to add this code if it wasn't needed (ie. !(CONFIG_SMP ||
CONFIG_DEBUG_SPINLOCK)).  So in bit_spin_trylock, I define
NEED_JOURNAL_STATE_WAIT if bit_spin_trylock is really a lock.  When
NEED_JOURNAL_STATE_WAIT is set, the wait queue is set up in the
journal code.

Now the question is, should we make those two locks global? It would help
Ingo's cause (and mine as well). But I don't know the impact on a large
SMP configuration.  Andrew, since you have a 16xSMP machine, could you (if
you have time) try out the effect of that. If you do have time, then I'll
send you a patch that goes on top of this one to change the two locks into
global spin locks.

Ingo, where do you want to go from here? I guess we need to wait on what
Andrew decides.

-- Steve


diff -ur linux-2.6.11.orig/fs/jbd/commit.c linux-2.6.11/fs/jbd/commit.c
--- linux-2.6.11.orig/fs/jbd/commit.c	2005-03-02 02:38:25.0 -0500
+++ linux-2.6.11/fs/jbd/commit.c	2005-03-17 03:40:06.0 -0500
@@ -80,15 +80,33 @@
 
 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
- * held.  For ranking reasons we must trylock.  If we lose, schedule away and
- * return 0.  j_list_lock is dropped in this case.
+ * held.  For ranking reasons we must trylock.  If we lose, put ourselves on a
+ * state wait queue and we'll be woken up when it is unlocked. Then we return
+ * 0 to try this again.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
+		/*
+		 * jbd_trylock_bh_state always returns true unless CONFIG_SMP or
+		 * CONFIG_DEBUG_SPINLOCK, so the wait queue is not needed there.
+		 * The bit_spin_locks in jbd_lock_bh_state need to be removed anyway.
+		 */
+#ifdef NEED_JOURNAL_STATE_WAIT
+		DECLARE_WAITQUEUE(wait, current);
 		spin_unlock(&journal->j_list_lock);
-		schedule();
+		add_wait_queue_exclusive(&journal_state_wait, &wait);
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		/* Check to see if the lock has been unlocked in this short time */
+		if (jbd_is_locked_bh_state(bh))
+			schedule();
+		set_current_state(TASK_RUNNING);
+		remove_wait_queue(&journal_state_wait, &wait);
 		return 0;
+#else
+		/* This should never be hit */
+		BUG();
+#endif
 	}
 	return 1;
 }
diff -ur linux-2.6.11.orig/fs/jbd/journal.c linux-2.6.11/fs/jbd/journal.c
--- linux-2.6.11.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11/fs/jbd/journal.c	2005-03-17 03:47:40.0 -0500
@@ -80,6 +80,11 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#ifdef NEED_JOURNAL_STATE_WAIT
+EXPORT_SYMBOL(journal_state_wait);
+DECLARE_WAIT_QUEUE_HEAD(journal_state_wait);
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*
diff -ur linux-2.6.11.orig/include/linux/jbd.h linux-2.6.11/include/linux/jbd.h
--- linux-2.6.11.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11/include/linux/jbd.h	2005-03-17 03:48:18.0 -0500
@@ -324,6 +324,20 @@
 	return bh->b_private;
 }
 
+#ifdef NEED_JOURNAL_STATE_WAIT
+/*
+ * The journal_state_wait is a wait queue that tasks will wait on
+ * if they fail to get the jbd_lock_bh_state while holding the j_list_lock.
+ * Instead of spinning on schedule, the task now adds itself to this wait queue
+ * and will be woken up when the jbd_lock_bh_state is released.
+ *
+ * Since the bit_spin_locks are only locks under CONFIG_SMP and
+ * CONFIG_DEBUG_SPINLOCK, this wait queue is only needed in those
+ * cases.
+ */
+extern wait_queue_head_t journal_state_wait;
+#endif
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
 	bit_spin_lock(BH_State, &bh->b_state);
@@ -342,6 +356,13 @@
 static inline void 

Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Lee Revell wrote:

> I am a bit confused, big surprise.  Does this thread still have anything
> to do with this trace from my "Latency regressions" bug report?

Don't worry, I've been in a state of confusion for a long time now ;-)

>
> http://www.alsa-project.org/~rlrevell/2912us
>
> The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".
>
> PREEMPT_RT has always worked perfectly.
>

I'm surprised that PREEMPT_RT does work.  I'm no longer sure that this does
affect your latency anymore.  It probably does indirectly somehow.  I
still think it has to do with the bitspinlocks.  But I'm not sure. Just
let me know if you want to be taken off this thread and I'll remove you
from my CC list.  Until then, I'll keep you on.

-- Steve




Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> /*
>  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
>  * held.  For ranking reasons we must trylock.  If we lose, schedule away and
>  * return 0.  j_list_lock is dropped in this case.
>  */
> static int inverted_lock(journal_t *journal, struct buffer_head *bh)
> {
> 	if (!jbd_trylock_bh_state(bh)) {
> 		spin_unlock(&journal->j_list_lock);
> 		schedule();
> 		return 0;
> 	}
> 	return 1;
> }
> 

That's very lame code, that.  The old "I don't know what the heck to do now
so I'll schedule" trick.  Sorry.

>  I guess one way to solve this is to add a wait queue here (before
>  schedule()), and have the one holding the lock to wake up all on the
>  waitqueue when they release it.

yup.  A patch against mainline would be appropriate, please.


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Lee Revell
On Wed, 2005-03-16 at 12:47 -0500, Steven Rostedt wrote:
> 
> On Wed, 16 Mar 2005, Steven Rostedt wrote:
> 
> >
> > Hi Ingo,
> >
> > I just ran this with PREEMPT_RT and it works fine.
> 
> Not quite, and I will assume that some of the other patches I sent have
> this same problem.  The jbd_trylock_bh_state really scares me. It seems
> that in fs/jbd/commit.c in journal_commit_transaction we have the
> following code:

I am a bit confused, big surprise.  Does this thread still have anything
to do with this trace from my "Latency regressions" bug report?

http://www.alsa-project.org/~rlrevell/2912us

The problem only is apparent with PREEMPT_DESKTOP and "data=ordered".

PREEMPT_RT has always worked perfectly.

Lee



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Steven Rostedt wrote:

>
> Hi Ingo,
>
> I just ran this with PREEMPT_RT and it works fine.

Not quite, and I will assume that some of the other patches I sent have
this same problem.  The jbd_trylock_bh_state really scares me. It seems
that in fs/jbd/commit.c in journal_commit_transaction we have the
following code:


write_out_data:
cond_resched();
        spin_lock(&journal->j_list_lock);

while (commit_transaction->t_sync_datalist) {
struct buffer_head *bh;

jh = commit_transaction->t_sync_datalist;
commit_transaction->t_sync_datalist = jh->b_tnext;
bh = jh2bh(jh);
if (buffer_locked(bh)) {
BUFFER_TRACE(bh, "locked");
if (!inverted_lock(journal, bh))
goto write_out_data;


where inverted_lock simply is:


/*
 * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
 * held.  For ranking reasons we must trylock.  If we lose, schedule away and
 * return 0.  j_list_lock is dropped in this case.
 */
static int inverted_lock(journal_t *journal, struct buffer_head *bh)
{
        if (!jbd_trylock_bh_state(bh)) {
                spin_unlock(&journal->j_list_lock);
                schedule();
                return 0;
        }
        return 1;
}


So, with kjournald running as FIFO, it may hit this (as it did with my
last test) and not get the lock.  All it does is release another lock
(for ranking reasons), call schedule, and try again.  With kjournald the
highest priority runnable process on the system (UP), it deadlocks, since
whoever holds the lock will never get a chance to run.  There's a couple
of places where jbd_trylock_bh_state is used in checkpoint.c, but this is
the one place where it definitely deadlocks the system.  I believe that
the code in checkpoint.c also has this problem.

I guess one way to solve this is to add a wait queue here (before
schedule()), and have the one holding the lock to wake up all on the
waitqueue when they release it.
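The wait-queue idea can be sketched in userspace with a condition variable standing in for the kernel wait queue.  All names here are illustrative (the real fix uses add_wait_queue()/wake_up() on a wait_queue_head_t, not pthreads); the point is that a loser sleeps instead of spinning, and the unlocker wakes everyone waiting.

```c
#include <pthread.h>
#include <assert.h>

/* Userspace sketch of the proposal: demo_state_locked stands in for the
 * BH_State bit, and the condition variable plays the role of the kernel
 * wait queue.  A contender sleeps in pthread_cond_wait instead of
 * spinning through schedule(); the unlocker wakes all waiters. */
static pthread_mutex_t demo_wq_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_state_wait = PTHREAD_COND_INITIALIZER;
static int demo_state_locked;
static int demo_counter;

static void demo_lock_state(void)
{
	pthread_mutex_lock(&demo_wq_lock);
	while (demo_state_locked)
		pthread_cond_wait(&demo_state_wait, &demo_wq_lock);
	demo_state_locked = 1;
	pthread_mutex_unlock(&demo_wq_lock);
}

static void demo_unlock_state(void)
{
	pthread_mutex_lock(&demo_wq_lock);
	demo_state_locked = 0;
	pthread_cond_broadcast(&demo_state_wait);	/* wake all waiters */
	pthread_mutex_unlock(&demo_wq_lock);
}

/* A contender that blocks (rather than spins) until the lock is free. */
static void *demo_waiter(void *arg)
{
	(void)arg;
	demo_lock_state();
	demo_counter++;
	demo_unlock_state();
	return (void *)0;
}
```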

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Steven Rostedt wrote:
>
> Ingo, I still get the following bug because of the added BUFFER_FNS and
> PREEMPT_DESKTOP.  I haven't tried this with RT yet. I'll see if this shows
> a deadlock there.
>
>

Hi Ingo,

I just ran this with PREEMPT_RT and it works fine.  Now is this the best
solution, or should we add a lock to the buffer head?  This works, but I
don't have anything more than a 2x CPU to test this on.  If either you or
Andrew can try this on the 8x or 16x, that would be great.

Also, I only get the BUG with PREEMPT_DESKTOP.  I really don't understand
why this happens. I sent you a test patch earlier with just adding
BUFFER_FNS(JournalHead,journalhead) in jbd.h, and under PREEMPT_DESKTOP
that causes this bug as well. No other changes, just adding the BUFFER_FNS
call causes this. I can't find any other reference to buffer_journal
(besides reiser_fs).  What do you think, and are you getting the same bug?

-- Steve


> BUG: Unable to handle kernel NULL pointer dereference at virtual address
> 
>  printing eip:
> c0214888
> *pde = 
> Oops:  [#1]
> Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
> pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
> intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
> EIP is at vt_ioctl+0x18/0x1ab0
> eax:    ebx: 5603   ecx: 5603   edx: cec18c80
> esi: c0214870   edi: cb49e000   ebp: cb479f18   esp: cb479e48
> ds: 007b   es: 007b   ss: 0068   preempt: 
> Process XFree86 (pid: 4744, threadinfo=cb478000 task=cb403530)
> Stack: cb403680 cb478000 cb403530 c034594c cb403530 0246 cb479e7c
> c0117217
>c0345954 0006 0001   cb479ebc cefa1c04
> c13e1000
>ced6b9b8   cb479ed4 c01707f1 ced6b9b8 0007
> 
> Call Trace:
>  [] show_stack+0x7f/0xa0 (28)
>  [] show_registers+0x165/0x1d0 (56)
>  [] die+0xc8/0x150 (64)
>  [] do_page_fault+0x356/0x6c4 (216)
>  [] error_code+0x2b/0x30 (268)
>  [] tty_ioctl+0x34b/0x490 (52)
>  [] do_ioctl+0x4f/0x70 (32)
>  [] vfs_ioctl+0x62/0x1d0 (40)
>  [] sys_ioctl+0x61/0x90 (40)
>  [] syscall_call+0x7/0xb (-8124)
> Code: ff ff 8d 05 88 5d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
> 56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34
> 24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85
>
>
>
>
> Here's the patch (on Ingo's -40 kernel).
>
> diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
> --- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
> +++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-16 07:47:50.0 -0500
> @@ -82,6 +82,9 @@
>
>  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t 
> *);
>
> +spinlock_t jbd_state_lock = SPIN_LOCK_UNLOCKED;
> +spinlock_t jbd_journal_lock = SPIN_LOCK_UNLOCKED;
> +
>  /*
>   * Helper function used to manage commit timeouts
>   */
> diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
> --- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
> +++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 08:51:27.292105187 -0500
> @@ -313,6 +313,8 @@
>  BUFFER_FNS(RevokeValid, revokevalid)
>  TAS_BUFFER_FNS(RevokeValid, revokevalid)
>  BUFFER_FNS(Freed, freed)
> +BUFFER_FNS(State,state)
> +BUFFER_FNS(JournalHead,journal)
>
>  static inline struct buffer_head *jh2bh(struct journal_head *jh)
>  {
> @@ -324,34 +326,50 @@
>   return bh->b_private;
>  }
>
> +extern spinlock_t jbd_state_lock;
> +extern spinlock_t jbd_journal_lock;
> +
>  static inline void jbd_lock_bh_state(struct buffer_head *bh)
>  {
> -	bit_spin_lock(BH_State, &bh->b_state);
> +	spin_lock(&jbd_state_lock);
> + BUG_ON(buffer_state(bh));
> + set_buffer_state(bh);
>  }
>
>  static inline int jbd_trylock_bh_state(struct buffer_head *bh)
>  {
> -	return bit_spin_trylock(BH_State, &bh->b_state);
> +	if (spin_trylock(&jbd_state_lock)) {
> + BUG_ON(buffer_state(bh));
> + set_buffer_state(bh);
> + return 1;
> + }
> + return 0;
>  }
>
>  static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
>  {
> -	return bit_spin_is_locked(BH_State, &bh->b_state);
> +	return buffer_state(bh); //spin_is_locked(&jbd_state_lock);
>  }
>
>  static inline void jbd_unlock_bh_state(struct buffer_head *bh)
>  {
> -	bit_spin_unlock(BH_State, &bh->b_state);
> + BUG_ON(!buffer_state(bh));
> + clear_buffer_state(bh);
> +	spin_unlock(&jbd_state_lock);
>  }
>
>  static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
>  {
> -	bit_spin_lock(BH_JournalHead, &bh->b_state);
> + 

Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> > But
> > when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> > me.
>
> That's a worry.
>

OK, I'm wrong here.  I just tried it again and it didn't deadlock (that
must have been another lock I was dealing with).  But it does test
whether the buffer head is locked or not, and asserts if it is.  I'm
running the following patch with no problems so far.  I still use the
lock bits to determine if the bh state is locked.

Do you and Ingo think that this would have too much contention?

Ingo, I still get the following bug because of the added BUFFER_FNS and
PREEMPT_DESKTOP.  I haven't tried this with RT yet.  I'll see if this
shows a deadlock there.


BUG: Unable to handle kernel NULL pointer dereference at virtual address

 printing eip:
c0214888
*pde = 
Oops:  [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax:    ebx: 5603   ecx: 5603   edx: cec18c80
esi: c0214870   edi: cb49e000   ebp: cb479f18   esp: cb479e48
ds: 007b   es: 007b   ss: 0068   preempt: 
Process XFree86 (pid: 4744, threadinfo=cb478000 task=cb403530)
Stack: cb403680 cb478000 cb403530 c034594c cb403530 0246 cb479e7c
c0117217
   c0345954 0006 0001   cb479ebc cefa1c04
c13e1000
   ced6b9b8   cb479ed4 c01707f1 ced6b9b8 0007

Call Trace:
 [] show_stack+0x7f/0xa0 (28)
 [] show_registers+0x165/0x1d0 (56)
 [] die+0xc8/0x150 (64)
 [] do_page_fault+0x356/0x6c4 (216)
 [] error_code+0x2b/0x30 (268)
 [] tty_ioctl+0x34b/0x490 (52)
 [] do_ioctl+0x4f/0x70 (32)
 [] vfs_ioctl+0x62/0x1d0 (40)
 [] sys_ioctl+0x61/0x90 (40)
 [] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 5d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 <8b> 30 89 34
24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85




Here's the patch (on Ingo's -40 kernel).

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-16 07:47:50.0 -0500
@@ -82,6 +82,9 @@

 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);

+spinlock_t jbd_state_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t jbd_journal_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 08:51:27.292105187 -0500
@@ -313,6 +313,8 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(State,state)
+BUFFER_FNS(JournalHead,journal)

 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
@@ -324,34 +326,50 @@
return bh->b_private;
 }

+extern spinlock_t jbd_state_lock;
+extern spinlock_t jbd_journal_lock;
+
static inline void jbd_lock_bh_state(struct buffer_head *bh)
{
-   bit_spin_lock(BH_State, &bh->b_state);
+   spin_lock(&jbd_state_lock);
+   BUG_ON(buffer_state(bh));
+   set_buffer_state(bh);
}

static inline int jbd_trylock_bh_state(struct buffer_head *bh)
{
-   return bit_spin_trylock(BH_State, &bh->b_state);
+   if (spin_trylock(&jbd_state_lock)) {
+   BUG_ON(buffer_state(bh));
+   set_buffer_state(bh);
+   return 1;
+   }
+   return 0;
}

static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
{
-   return bit_spin_is_locked(BH_State, &bh->b_state);
+   return buffer_state(bh); //spin_is_locked(&jbd_state_lock);
}

static inline void jbd_unlock_bh_state(struct buffer_head *bh)
{
-   bit_spin_unlock(BH_State, &bh->b_state);
+   BUG_ON(!buffer_state(bh));
+   clear_buffer_state(bh);
+   spin_unlock(&jbd_state_lock);
}

static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
{
-   bit_spin_lock(BH_JournalHead, &bh->b_state);
+   spin_lock(&jbd_journal_lock);
+   BUG_ON(buffer_journal(bh));
+   set_buffer_journal(bh);
}

static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
{
-   bit_spin_unlock(BH_JournalHead, &bh->b_state);
+   BUG_ON(!buffer_journal(bh));
+   clear_buffer_journal(bh);
+   spin_unlock(&jbd_journal_lock);
}

 struct jbd_revoke_table_s;
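The scheme in the patch above — one global spinlock paired with a per-buffer flag, so that `jbd_is_locked_bh_state()` asks about the buffer itself rather than the shared lock — can be sketched as a user-space analogue. This is a hypothetical pthread model of the idea, not kernel code; `state_lock` and `struct buffer` here stand in for `jbd_state_lock` and `buffer_head`:

```c
#include <assert.h>
#include <pthread.h>

/* One process-wide lock standing in for the global jbd_state_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

struct buffer {
    int state_locked;               /* per-buffer flag, analogous to BH_State */
};

static void lock_bh_state(struct buffer *bh)
{
    pthread_mutex_lock(&state_lock);
    assert(!bh->state_locked);      /* mirrors BUG_ON(buffer_state(bh)) */
    bh->state_locked = 1;
}

static void unlock_bh_state(struct buffer *bh)
{
    assert(bh->state_locked);       /* mirrors BUG_ON(!buffer_state(bh)) */
    bh->state_locked = 0;
    pthread_mutex_unlock(&state_lock);
}

/* Asks about this particular buffer, not about the global lock, so an
 * unrelated buffer holding state_lock does not make this one look locked. */
static int is_locked_bh_state(struct buffer *bh)
{
    return bh->state_locked;
}
```

The per-buffer flag is what keeps the `is_locked` query meaningful even though the lock itself is now shared by every buffer.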

Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Steven Rostedt <[EMAIL PROTECTED]> wrote:
>
> 
> 
> On Wed, 16 Mar 2005, Andrew Morton wrote:
> 
> >
> > Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
> > and jbd_lock_bh_journal_head().  I think they both need to be in the
> > buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
> > caller's jbd_lock_bh_state()).
> >
> > Or make them global, or put them in the journal.
> 
> The jbd_lock_bh_journal_head can be one global lock without a problem.

As I say, we can probably eliminate it.

> But
> when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
> me.

That's a worry.

>  So this one has to go into the buffer head.  What do you mean with
> "put them in the journal", do you mean the journal_s structure?

Yes.

> Is there a
> safe way to get to that structure from the buffer head?

No convenient way, iirc.  But there's usually a fairly straightforward way
to get at the journal from within JBD code.

>  The state lock is
> used quite a bit and it gets tricky trying to figure out how to use other
> structures wrt buffer_heads at all the locations that use
> jbd_lock_bh_state.

That one should go into the buffer_head, I guess.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Andrew Morton wrote:

>
> Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
> and jbd_lock_bh_journal_head().  I think they both need to be in the
> buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
> caller's jbd_lock_bh_state()).
>
> Or make them global, or put them in the journal.

The jbd_lock_bh_journal_head can be one global lock without a problem. But
when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
me.  So this one has to go into the buffer head.  What do you mean with
"put them in the journal", do you mean the journal_s structure? Is there a
safe way to get to that structure from the buffer head?  The state lock is
used quite a bit and it gets tricky trying to figure out how to use other
structures wrt buffer_heads at all the locations that use
jbd_lock_bh_state.

-- Steve


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > How much would the +4/+8 bytes size increase in
> > buffer_head [on SMP] be frowned upon? 
> 
> It wouldn't be the end of the world.  I'm not clear on what bits of
> the rt-super-low-latency stuff is intended for mainline though?

in the long run, most of it. There are no conceptual barriers so far,
the -RT tree consists of lots of small details and the PREEMPT_RT
framework itself. We are trying to solve (and merge) the small details
first (in upstream), so that PREEMPT_RT itself becomes uncontroversial.

(and it's not really the low latency that matters mainly - more valuable
is the fact that under PREEMPT_RT high latencies are statistically much
more unlikely [you need to do some really intentional and easy to see
things to introduce high latencies], while in the current upstream
kernel, high latencies are often side-effects of pretty normal kernel
coding activities, so low latencies are always a catch-up game that can
never be truly won for sure. So yes, while a 10 usec worst-case latency
under arbitrary Linux workloads [on the right hardware] is indeed sexy,
more important is that things are much more deterministic and hence much
more trustable from a hard-RT POV.)

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > I forget how much of the 1000% came from that, but it was quite a lot.
> > 
> > Removing the BKL was the first step.  That took the context switch
> > rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
> > first thing a CPU hit on entry to the fs was then a semaphore. 
> > Performance rather took a dive.
> > 
> > Of course the locks also became much finer-grained, so the contention
> > opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
> > scope, so I'd expect the context switch rate to go up quite a lot
> > again.
> > 
> The hold times are short, and a context switch hurts rather more than a
> > quick spin.
> 
> which particular workload was this - dbench? (I can try PREEMPT_RT on an
> 8-way, such effects will show up tenfold.)
> 

Oh gee, that was back in the days when Martin was being useful.  SDET, I
think.



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > >  > There's a little lock ranking diagram in jbd.h which tells us that
> > >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> > >  > you'll need to turn those into semaphores.
> > > 
> > >  indeed. I did this (see the three followup patches, against BK-curr),
> > >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> > >  this in -mm?
> > 
> > ooh, I'd rather not.  I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result). 
> > Ended up gaining 1000% on 16-way.
> > 
> > Putting them back in will really hurt the SMP performance.
> 
> seems like turning the bitlocks into spinlocks is the best option then. 
> We'd need one lock in buffer_head (j_state_lock, renamed to something
> more sensible like b_private_lock), and one lock in journal_head
> (j_list_lock) i guess.

Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
and jbd_lock_bh_journal_head().  I think they both need to be in the
buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
caller's jbd_lock_bh_state()).

Or make them global, or put them in the journal.

> How much would the +4/+8 bytes size increase in
> buffer_head [on SMP] be frowned upon? 

It wouldn't be the end of the world.  I'm not clear on what bits of the
rt-super-low-latency stuff is intended for mainline though?


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Arjan van de Ven
On Wed, 2005-03-16 at 02:26 -0800, Andrew Morton wrote:
> 
> The hold times are short, and a context switch hurts rather more than a
> quick
> spin.

so we need a spinaphore ;)
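Arjan's "spinaphore" quip names a real construct: an adaptive lock that spins for a bounded number of tries on the assumption that hold times are short, and only then falls back to blocking. A hypothetical user-space sketch with pthreads — the constant `SPIN_TRIES` and the `sched_yield()` backoff are illustrative choices, not tuned values:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define SPIN_TRIES 100

struct spinaphore {
    pthread_mutex_t m;
};

static void spinaphore_init(struct spinaphore *s)
{
    pthread_mutex_init(&s->m, NULL);
}

/* Spin a bounded number of times hoping the short hold time expires,
 * then give up and block like an ordinary sleeping lock would. */
static void spinaphore_lock(struct spinaphore *s)
{
    for (int i = 0; i < SPIN_TRIES; i++) {
        if (pthread_mutex_trylock(&s->m) == 0)
            return;                 /* acquired while spinning */
        sched_yield();              /* user-space stand-in for cpu_relax() */
    }
    pthread_mutex_lock(&s->m);      /* heavily contended: sleep */
}

static void spinaphore_unlock(struct spinaphore *s)
{
    pthread_mutex_unlock(&s->m);
}
```

Under light contention this behaves like a spinlock; under heavy contention it degrades to an ordinary sleeping lock instead of burning CPU, which is exactly the trade-off the thread is debating.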




Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> I forget how much of the 1000% came from that, but it was quite a lot.
> 
> Removing the BKL was the first step.  That took the context switch
> rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
> first thing a CPU hit on entry to the fs was then a semaphore. 
> Performance rather took a dive.
> 
> Of course the locks also became much finer-grained, so the contention
> opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
> scope, so I'd expect the context switch rate to go up quite a lot
> again.
> 
> The hold times are short, and a context switch hurts rather more than a
> quick spin.

which particular workload was this - dbench? (I can try PREEMPT_RT on an
8-way, such effects will show up tenfold.)

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > ooh, I'd rather not.  I spent an intense three days removing all the
>  > sleeping locks from ext3 (and three months debugging the result). 
>  > Ended up gaining 1000% on 16-way.
>  > 
>  > Putting them back in will really hurt the SMP performance.
> 
>  ah. Yeah. Sniff.
> 
>  if we gain 1000% on a 16-way then there's something really wrong about
>  semaphores (or scheduling) though. A semaphore is almost a spinlock, in
>  the uncontended case - and even under contention we really (should) just
>  spend the cycles that we'd spend spinning. There will be some
>  intermediate contention level where semaphores hurt, but 1000% sounds
>  truly excessive.

I forget how much of the 1000% came from that, but it was quite a lot.

Removing the BKL was the first step.  That took the context switch rate
under high load from ~10,000/sec up to ~300,000/sec.  Because the first
thing a CPU hit on entry to the fs was then a semaphore.  Performance rather
took a dive.

Of course the locks also became much finer-grained, so the contention
opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
scope, so I'd expect the context switch rate to go up quite a lot again.

The hold times are short, and a context switch hurts rather more than a quick
spin.



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Steven Rostedt <[EMAIL PROTECTED]> wrote:

> > > ooh, I'd rather not.  I spent an intense three days removing all the
> > > sleeping locks from ext3 (and three months debugging the result).
> > > Ended up gaining 1000% on 16-way.
> > >
> > > Putting them back in will really hurt the SMP performance.
> >
> > ah. Yeah. Sniff.
> >
> > if we gain 1000% on a 16-way then there's something really wrong about
> > semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> > the uncontended case - and even under contention we really (should) just
> > spend the cycles that we'd spend spinning. There will be some
> > intermediate contention level where semaphores hurt, but 1000% sounds
> > truly excessive.
> >
> 
> Could it possibly be that in the process of removing all the sleeping
> locks from ext3, that Andrew also removed a flaw in ext3 itself that
> is responsible for the 1000% improvement?

i think the chances for that are really remote. I think it must have
been a workload ending up scheduling itself to death, while spinlocks
force atomicity of execution and affinity.

we should be able to see the same scenario with PREEMPT_RT on a 16-way
:-)

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Ingo Molnar wrote:

>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > ooh, I'd rather not.  I spent an intense three days removing all the
> > sleeping locks from ext3 (and three months debugging the result).
> > Ended up gaining 1000% on 16-way.
> >
> > Putting them back in will really hurt the SMP performance.
>
> ah. Yeah. Sniff.
>
> if we gain 1000% on a 16-way then there's something really wrong about
> semaphores (or scheduling) though. A semaphore is almost a spinlock, in
> the uncontended case - and even under contention we really (should) just
> spend the cycles that we'd spend spinning. There will be some
> intermediate contention level where semaphores hurt, but 1000% sounds
> truly excessive.
>

Could it possibly be that in the process of removing all the sleeping
locks from ext3, that Andrew also removed a flaw in ext3 itself that is
responsible for the 1000% improvement?

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> >  > There's a little lock ranking diagram in jbd.h which tells us that
> >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> >  > you'll need to turn those into semaphores.
> > 
> >  indeed. I did this (see the three followup patches, against BK-curr),
> >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> >  this in -mm?
> 
> ooh, I'd rather not.  I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result). 
> Ended up gaining 1000% on 16-way.
> 
> Putting them back in will really hurt the SMP performance.

seems like turning the bitlocks into spinlocks is the best option then. 
We'd need one lock in buffer_head (j_state_lock, renamed to something
more sensible like b_private_lock), and one lock in journal_head
(j_list_lock) i guess. How much would the +4/+8 bytes size increase in
buffer_head [on SMP] be frowned upon? 

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >  > There's a little lock ranking diagram in jbd.h which tells us that
> >  > these locks nest inside j_list_lock and j_state_lock.  So I guess
> >  > you'll need to turn those into semaphores.
> > 
> >  indeed. I did this (see the three followup patches, against BK-curr),
> >  and it builds/boots/works just fine on an ext3 box. Do we want to try
> >  this in -mm?
> 
> ooh, I'd rather not.  I spent an intense three days removing all the
> sleeping locks from ext3 (and three months debugging the result). 
> Ended up gaining 1000% on 16-way.
> 
> Putting them back in will really hurt the SMP performance.

ah. Yeah. Sniff.

if we gain 1000% on a 16-way then there's something really wrong about
semaphores (or scheduling) though. A semaphore is almost a spinlock, in
the uncontended case - and even under contention we really (should) just
spend the cycles that we'd spend spinning. There will be some
intermediate contention level where semaphores hurt, but 1000% sounds
truly excessive.

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>  > There's a little lock ranking diagram in jbd.h which tells us that
>  > these locks nest inside j_list_lock and j_state_lock.  So I guess
>  > you'll need to turn those into semaphores.
> 
>  indeed. I did this (see the three followup patches, against BK-curr),
>  and it builds/boots/works just fine on an ext3 box. Do we want to try
>  this in -mm?

ooh, I'd rather not.  I spent an intense three days removing all the
sleeping locks from ext3 (and three months debugging the result).  Ended up
gaining 1000% on 16-way.

Putting them back in will really hurt the SMP performance.


[patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > > Damn! The answer was right there in front of my eyes! Here's the
> > > cleanest solution. I forgot about wait_on_bit_lock.  I've converted
> > > all the locks to use this instead. [...]
> > 
> > ah, indeed, this looks really nifty. Andrew?
> > 
> 
> There's a little lock ranking diagram in jbd.h which tells us that
> these locks nest inside j_list_lock and j_state_lock.  So I guess
> you'll need to turn those into semaphores.

indeed. I did this (see the three followup patches, against BK-curr),
and it builds/boots/works just fine on an ext3 box. Do we want to try
this in -mm?

one worry would be that while spinlocks are NOP on UP, semaphores are
not. OTOH, this could relax some of the preemptability constraints
within ext3 and could make it more hackable. These patches enabled the
removal of some of the lock-break code for example and could likely
solve some of the remaining ext3 latencies.

Ingo
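The nesting constraint behind this exchange can be made concrete: a lock that may sleep must never be acquired while a spinning lock is held, and every code path must take related locks in one documented rank order. A minimal, hypothetical illustration of such a two-level ordering discipline — the names echo the JBD locks, but the ranking shown is for demonstration only, not the actual diagram in jbd.h:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical ranking, outermost first:
 *   state_rank -> list_rank
 * Every path must acquire in this order and release in reverse;
 * taking list_rank first while another thread holds it and waits
 * for state_rank is the classic ABBA deadlock. */
static pthread_mutex_t state_rank = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t list_rank  = PTHREAD_MUTEX_INITIALIZER;

static int moved;                        /* toy state guarded by both locks */

static void move_buffer_between_lists(void)
{
    pthread_mutex_lock(&state_rank);     /* outer rank first */
    pthread_mutex_lock(&list_rank);      /* then inner rank */
    moved++;
    pthread_mutex_unlock(&list_rank);    /* release in reverse order */
    pthread_mutex_unlock(&state_rank);
}
```

This is why converting the inner bit locks to sleeping locks forces the outer `j_state_lock`/`j_list_lock` to become sleeping locks too: the outer rank is still held when the inner one is taken.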
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

   Damn! The answer was right there in front of my eyes! Here's the
   cleanest solution. I forgot about wait_on_bit_lock.  I've converted
   all the locks to use this instead. [...]
  
  ah, indeed, this looks really nifty. Andrew?
  
 
 There's a little lock ranking diagram in jbd.h which tells us that
 these locks nest inside j_list_lock and j_state_lock.  So I guess
 you'll need to turn those into semaphores.

indeed. I did this (see the three followup patches, against BK-curr),
and it builds/boots/works just fine on an ext3 box. Do we want to try
this in -mm?

one worry would be that while spinlocks are NOP on UP, semaphores are
not. OTOH, this could relax some of the preemptability constraints
within ext3 and could make it more hackable. These patches enabled the
removal of some of the lock-break code for example and could likely
solve some of the remaining ext3 latencies.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar [EMAIL PROTECTED] wrote:

   There's a little lock ranking diagram in jbd.h which tells us that
   these locks nest inside j_list_lock and j_state_lock.  So I guess
   you'll need to turn those into semaphores.
 
  indeed. I did this (see the three followup patches, against BK-curr),
  and it builds/boots/works just fine on an ext3 box. Do we want to try
  this in -mm?

ooh, I'd rather not.  I spent an intense three days removing all the
sleeping locks from ext3 (and three months debugging the result).  Ended up
gaining 1000% on 16-way.

Putting them back in will really hurt the SMP performance.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

 Ingo Molnar [EMAIL PROTECTED] wrote:
 
There's a little lock ranking diagram in jbd.h which tells us that
these locks nest inside j_list_lock and j_state_lock.  So I guess
you'll need to turn those into semaphores.
  
   indeed. I did this (see the three followup patches, against BK-curr),
   and it builds/boots/works just fine on an ext3 box. Do we want to try
   this in -mm?
 
 ooh, I'd rather not.  I spent an intense three days removing all the
 sleeping locks from ext3 (and three months debugging the result). 
 Ended up gaining 1000% on 16-way.
 
 Putting them back in will really hurt the SMP performance.

ah. Yeah. Sniff.

if we gain 1000% on a 16-way then there's something really wrong about
semaphores (or scheduling) though. A semaphore is almost a spinlock, in
the uncontended case - and even under contention we really (should) just
spend the cycles that we'd spend spinning. There will be some
intermediate contention level where semaphores hurt, but 1000% sounds
truly excessive.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

There's a little lock ranking diagram in jbd.h which tells us that
these locks nest inside j_list_lock and j_state_lock.  So I guess
you'll need to turn those into semaphores.
  
   indeed. I did this (see the three followup patches, against BK-curr),
   and it builds/boots/works just fine on an ext3 box. Do we want to try
   this in -mm?
 
 ooh, I'd rather not.  I spent an intense three days removing all the
 sleeping locks from ext3 (and three months debugging the result). 
 Ended up gaining 1000% on 16-way.
 
 Putting them back in will really hurt the SMP performance.

seems like turning the bitlocks into spinlocks is the best option then. 
We'd need one lock in buffer_head (j_state_lock, renamed to something
more sensible like b_private_lock), and one lock in journal_head
(j_list_lock) i guess. How much would the +4/+8 bytes size increase in
buffer_head [on SMP] be frowned upon? 

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Ingo Molnar wrote:


 * Andrew Morton [EMAIL PROTECTED] wrote:
 
  ooh, I'd rather not.  I spent an intense three days removing all the
  sleeping locks from ext3 (and three months debugging the result).
  Ended up gaining 1000% on 16-way.
 
  Putting them back in will really hurt the SMP performance.

 ah. Yeah. Sniff.

 if we gain 1000% on a 16-way then there's something really wrong about
 semaphores (or scheduling) though. A semaphore is almost a spinlock, in
 the uncontended case - and even under contention we really (should) just
 spend the cycles that we'd spend spinning. There will be some
 intermediate contention level where semaphores hurt, but 1000% sounds
 truly excessive.


Could it possibly be that in the process of removing all the sleeping
locks from ext3, that Andrew also removed a flaw in ext3 itself that is
responsible for the 1000% improvement?

-- Steve

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Steven Rostedt [EMAIL PROTECTED] wrote:

   ooh, I'd rather not.  I spent an intense three days removing all the
   sleeping locks from ext3 (and three months debugging the result).
   Ended up gaining 1000% on 16-way.
  
   Putting them back in will really hurt the SMP performance.
 
  ah. Yeah. Sniff.
 
  if we gain 1000% on a 16-way then there's something really wrong about
  semaphores (or scheduling) though. A semaphore is almost a spinlock, in
  the uncontended case - and even under contention we really (should) just
  spend the cycles that we'd spend spinning. There will be some
  intermediate contention level where semaphores hurt, but 1000% sounds
  truly excessive.
 
 
 Could it possibly be that in the process of removing all the sleeping
 locks from ext3, that Andrew also removed a flaw in ext3 itself that
 is responsible for the 1000% improvement?

i think the chances for that are really remote. I think it must have
been a workload ending up scheduling itself to death, while spinlocks
force atomicity of execution and affinity.

we should be able to see the same scenario with PREEMPT_RT on a 16-way
:-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar [EMAIL PROTECTED] wrote:

  ooh, I'd rather not.  I spent an intense three days removing all the
   sleeping locks from ext3 (and three months debugging the result). 
   Ended up gaining 1000% on 16-way.
   
   Putting them back in will really hurt the SMP performance.
 
  ah. Yeah. Sniff.
 
  if we gain 1000% on a 16-way then there's something really wrong about
  semaphores (or scheduling) though. A semaphore is almost a spinlock, in
  the uncontended case - and even under contention we really (should) just
  spend the cycles that we'd spend spinning. There will be some
  intermediate contention level where semaphores hurt, but 1000% sounds
  truly excessive.

I forget how much of the 1000% came from that, but it was quite a lot.

Removing the BKL was the first step.  That took the context switch rate
under high load from ~10,000/sec up to ~300,000/sec.  Because the first
thing a CPU hit on entry to the fs was then a semaphore.  Performance rather
took a dive.

Of course the locks also became much finer-grained, so the contention
opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
scope, so I'd expect the context switch rate to go up quite a lot again.

The hold times are short, and a context switch hurts rather ore than a quick
spin.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

 I forget how much of the 1000% came from that, but it was quite a lot.
 
 Removing the BKL was the first step.  That took the context switch
 rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
 first thing a CPU hit on entry to the fs was then a semaphore. 
 Performance rather took a dive.
 
 Of course the locks also became much finer-grained, so the contention
 opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
 scope, so I'd expect the context switch rate to go up quite a lot
 again.
 
 The hold times are short, and a context switch hurts rather ore than a
 quick spin.

which particular workload was this - dbench? (I can try PREEMPT_RT on an
8-way, such effects will show up tenfold.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Arjan van de Ven
On Wed, 2005-03-16 at 02:26 -0800, Andrew Morton wrote:
 
 The hold times are short, and a context switch hurts rather ore than a
 quick
 spin.

so we need a spinaphore ;)


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar [EMAIL PROTECTED] wrote:

 
 * Andrew Morton [EMAIL PROTECTED] wrote:
 
 There's a little lock ranking diagram in jbd.h which tells us that
 these locks nest inside j_list_lock and j_state_lock.  So I guess
 you'll need to turn those into semaphores.
   
indeed. I did this (see the three followup patches, against BK-curr),
and it builds/boots/works just fine on an ext3 box. Do we want to try
this in -mm?
  
  ooh, I'd rather not.  I spent an intense three days removing all the
  sleeping locks from ext3 (and three months debugging the result). 
  Ended up gaining 1000% on 16-way.
  
  Putting them back in will really hurt the SMP performance.
 
 seems like turning the bitlocks into spinlocks is the best option then. 
 We'd need one lock in buffer_head (j_state_lock, renamed to something
 more sensible like b_private_lock), and one lock in journal_head
 (j_list_lock) i guess.

Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
and jbd_lock_bh_journal_head().  I think they both need to be in the
buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
caller's jbd_lock_bh_state()).

Or make them global, or put them in the journal.

 How much would the +4/+8 bytes size increase in
 buffer_head [on SMP] be frowned upon? 

It wouldn't be the end of the world.  I'm not clear on what bits of the
rt-super-low-latency stuff is intended for mainline though?


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Ingo Molnar [EMAIL PROTECTED] wrote:

 
 * Andrew Morton [EMAIL PROTECTED] wrote:
 
  I forget how much of the 1000% came from that, but it was quite a lot.
  
  Removing the BKL was the first step.  That took the context switch
  rate under high load from ~10,000/sec up to ~300,000/sec.  Because the
  first thing a CPU hit on entry to the fs was then a semaphore. 
  Performance rather took a dive.
  
  Of course the locks also became much finer-grained, so the contention
  opportunities lessened.  But j_list_lock and j_state_lock have fs-wide
  scope, so I'd expect the context switch rate to go up quite a lot
  again.
  
   The hold times are short, and a context switch hurts rather more than a
  quick spin.
 
 which particular workload was this - dbench? (I can try PREEMPT_RT on an
 8-way, such effects will show up tenfold.)
 

Oh gee, that was back in the days when Martin was being useful.  SDET, I
think.



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

  How much would the +4/+8 bytes size increase in
  buffer_head [on SMP] be frowned upon? 
 
 It wouldn't be the end of the world.  I'm not clear on what bits of
 the rt-super-low-latency stuff is intended for mainline though?

in the long run, most of it. There are no conceptual barriers so far,
the -RT tree consists of lots of small details and the PREEMPT_RT
framework itself. We are trying to solve (and merge) the small details
first (in upstream), so that PREEMPT_RT itself becomes uncontroversial.

(and it's not really the low latency that matters mainly - more valuable
is the fact that under PREEMPT_RT high latencies are statistically much
more unlikely [you need to do some really intentional and easy to see
things to introduce high latencies], while in the current upstream
kernel, high latencies are often side-effects of pretty normal kernel
coding activities, so low latencies are always a catch-up game that can
never be truly won for sure. So yes, while a 10 usec worst-case latency
under arbitrary Linux workloads [on the right hardware] is indeed sexy,
more important is that things are much more deterministic and hence much
more trustable from a hard-RT POV.)

Ingo


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Andrew Morton wrote:


 Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
 and jbd_lock_bh_journal_head().  I think they both need to be in the
 buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
 caller's jbd_lock_bh_state()).

 Or make them global, or put them in the journal.

The jbd_lock_bh_journal_head can be one global lock without a problem. But
when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
me.  So this one has to go into the buffer head.  What do you mean with
"put them in the journal", do you mean the journal_s structure? Is there a
safe way to get to that structure from the buffer head?  The state lock is
used quite a bit and it gets tricky trying to figure out how to use other
structures wrt buffer_heads at all the locations that use
jbd_lock_bh_state.

-- Steve


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Steven Rostedt [EMAIL PROTECTED] wrote:

 
 
 On Wed, 16 Mar 2005, Andrew Morton wrote:
 
 
  Those two are in the journal, actually.  You refer to jbd_lock_bh_state()
  and jbd_lock_bh_journal_head().  I think they both need to be in the
  buffer_head.  jbd_lock_bh_journal_head() can probably go away (just use
  caller's jbd_lock_bh_state()).
 
  Or make them global, or put them in the journal.
 
 The jbd_lock_bh_journal_head can be one global lock without a problem.

As I say, we can probably eliminate it.

 But
 when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
 me.

That's a worry.

  So this one has to go into the buffer head.  What do you mean with
 "put them in the journal", do you mean the journal_s structure?

Yes.

 Is there a
 safe way to get to that structure from the buffer head?

No convenient way, iirc.  But there's usually a fairly straightforward way
to get at the journal from within JBD code.

  The state lock is
 used quite a bit and it gets tricky trying to figure out how to use other
 structures wrt buffer_heads at all the locations that use
 jbd_lock_bh_state.

That one should go into the buffer_head, I guess.


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Andrew Morton wrote:


  But
  when I made jbd_lock_bh_state a global lock, I believe it deadlocked on
  me.

 That's a worry.


OK, I'm wrong here. I just tried it again and it didn't deadlock (that
must have been another lock I was dealing with).  But it does test whether
the buffer head is locked, and asserts if it is. I'm running the
following patch with no problems so far. I still use the lock bits to
determine if the bh state is locked.

Do you and Ingo think that this would have too much contention?

Ingo, I still get the following bug because of the added BUFFER_FNS and
PREEMPT_DESKTOP.  I haven't tried this with RT yet. I'll see if this shows
a deadlock there.


BUG: Unable to handle kernel NULL pointer dereference at virtual address

 printing eip:
c0214888
*pde = 
Oops:  [#1]
Modules linked in: ipv6 af_packet tsdev mousedev evdev floppy psmouse
pcspkr snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc shpchp pci_hotplug ehci_hcd
intel_agp agpgart uhci_hcd usbcore e100 mii ide_cd cdrom unix
CPU:0
EIP:0060:[c0214888]Not tainted VLI
EFLAGS: 00010286   (2.6.11-RT-V0.7.40-00)
EIP is at vt_ioctl+0x18/0x1ab0
eax:    ebx: 5603   ecx: 5603   edx: cec18c80
esi: c0214870   edi: cb49e000   ebp: cb479f18   esp: cb479e48
ds: 007b   es: 007b   ss: 0068   preempt: 
Process XFree86 (pid: 4744, threadinfo=cb478000 task=cb403530)
Stack: cb403680 cb478000 cb403530 c034594c cb403530 0246 cb479e7c
c0117217
   c0345954 0006 0001   cb479ebc cefa1c04
c13e1000
   ced6b9b8   cb479ed4 c01707f1 ced6b9b8 0007

Call Trace:
 [c0103cdf] show_stack+0x7f/0xa0 (28)
 [c0103e95] show_registers+0x165/0x1d0 (56)
 [c0104088] die+0xc8/0x150 (64)
 [c0115376] do_page_fault+0x356/0x6c4 (216)
 [c0103973] error_code+0x2b/0x30 (268)
 [c020fd6b] tty_ioctl+0x34b/0x490 (52)
 [c016837f] do_ioctl+0x4f/0x70 (32)
 [c0168582] vfs_ioctl+0x62/0x1d0 (40)
 [c0168751] sys_ioctl+0x61/0x90 (40)
 [c0102ec3] syscall_call+0x7/0xb (-8124)
Code: ff ff 8d 05 88 5d 34 c0 e8 f6 60 0a 00 e9 3a ff ff ff 90 55 89 e5 57
56 53 81 ec c4 00 00 00 8b 7d 08 8b 5d 10 8b 87 7c 09 00 00 8b 30 89 34
24 8b 04 b5 e0 b7 3c c0 89 45 8c e8 a4 6a 00 00 85




Here's the patch (on Ingo's -40 kernel).

diff -ur linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c
--- linux-2.6.11-final-V0.7.40-00.orig/fs/jbd/journal.c	2005-03-02 02:37:49.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/fs/jbd/journal.c	2005-03-16 07:47:50.0 -0500
@@ -82,6 +82,9 @@
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
+spinlock_t jbd_state_lock = SPIN_LOCK_UNLOCKED;
+spinlock_t jbd_journal_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Helper function used to manage commit timeouts
  */
diff -ur linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h
--- linux-2.6.11-final-V0.7.40-00.orig/include/linux/jbd.h	2005-03-02 02:38:19.0 -0500
+++ linux-2.6.11-final-V0.7.40-00/include/linux/jbd.h	2005-03-16 08:51:27.292105187 -0500
@@ -313,6 +313,8 @@
 BUFFER_FNS(RevokeValid, revokevalid)
 TAS_BUFFER_FNS(RevokeValid, revokevalid)
 BUFFER_FNS(Freed, freed)
+BUFFER_FNS(State,state)
+BUFFER_FNS(JournalHead,journal)
 
 static inline struct buffer_head *jh2bh(struct journal_head *jh)
 {
@@ -324,34 +326,50 @@
 	return bh->b_private;
 }
 
+extern spinlock_t jbd_state_lock;
+extern spinlock_t jbd_journal_lock;
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	spin_lock(&jbd_state_lock);
+	BUG_ON(buffer_state(bh));
+	set_buffer_state(bh);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	if (spin_trylock(&jbd_state_lock)) {
+		BUG_ON(buffer_state(bh));
+		set_buffer_state(bh);
+		return 1;
+	}
+	return 0;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return buffer_state(bh); /* spin_is_locked(&jbd_state_lock); */
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	BUG_ON(!buffer_state(bh));
+	clear_buffer_state(bh);
+	spin_unlock(&jbd_state_lock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	spin_lock(&jbd_journal_lock);
+	BUG_ON(buffer_journal(bh));
+	set_buffer_journal(bh);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	BUG_ON(!buffer_journal(bh));
+	clear_buffer_journal(bh);
+	

Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Steven Rostedt wrote:

 Ingo, I still get the following bug because of the added BUFFER_FNS and
 DESKTOP_PREEMPT.  I haven't tried this with RT yet. I'll see if this shows
 a deadlock there.



Hi Ingo,

I just ran this with PREEMPT_RT and it works fine.  Now is this the best
solution, or adding a lock to the buffer head?  This works but I don't
have anything more than a 2x CPU to test this on.  If either you or Andrew
can try this on the 8x or 16x, that would be great.

Also, I only get the BUG with PREEMPT_DESKTOP.  I really don't understand
why this happens. I sent you a test patch earlier with just adding
BUFFER_FNS(JournalHead,journalhead) in jbd.h, and under PREEMPT_DESKTOP
that causes this bug as well. No other changes, just adding the BUFFER_FNS
call causes this. I can't find any other reference to buffer_journal
(besides reiser_fs).  What do you think, and are you getting the same bug?

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Steven Rostedt wrote:


 Hi Ingo,

 I just ran this with PREEMPT_RT and it works fine.

Not quite, and I will assume that some of the other patches I sent have
this same problem.  The jbd_trylock_bh_state really scares me. It seems
that in fs/jbd/commit.c in journal_commit_transaction we have the
following code:


write_out_data:
	cond_resched();
	spin_lock(&journal->j_list_lock);

	while (commit_transaction->t_sync_datalist) {
		struct buffer_head *bh;

		jh = commit_transaction->t_sync_datalist;
		commit_transaction->t_sync_datalist = jh->b_tnext;
		bh = jh2bh(jh);
		if (buffer_locked(bh)) {
			BUFFER_TRACE(bh, "locked");
			if (!inverted_lock(journal, bh))
				goto write_out_data;


where inverted_lock() simply is:


/*
 * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
 * held.  For ranking reasons we must trylock.  If we lose, schedule away and
 * return 0.  j_list_lock is dropped in this case.
 */
static int inverted_lock(journal_t *journal, struct buffer_head *bh)
{
	if (!jbd_trylock_bh_state(bh)) {
		spin_unlock(&journal->j_list_lock);
		schedule();
		return 0;
	}
	return 1;
}


So, with kjournald running as a FIFO task, it may hit this (as it did in my
last test) and not get the lock. All it does is release another lock (for
ranking reasons), call schedule(), and try again.  With kjournald the
highest-priority runnable process on the system (UP), it deadlocks, since
whoever holds the lock never gets a chance to run.  There are a couple of
places where jbd_trylock_bh_state is used in checkpoint.c, but this is the
one place where it definitely deadlocks the system.  I believe the code in
checkpoint.c has this problem as well.

I guess one way to solve this is to add a wait queue here (before the
schedule()), and have the one holding the lock wake up everyone on the
wait queue when it releases it.

-- Steve



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Lee Revell
On Wed, 2005-03-16 at 12:47 -0500, Steven Rostedt wrote:
 
 On Wed, 16 Mar 2005, Steven Rostedt wrote:
 
 
  Hi Ingo,
 
  I just ran this with PREEMPT_RT and it works fine.
 
 Not quite, and I will assume that some of the other patches I sent have
 this same problem.  The jbd_trylock_bh_state really scares me. It seems
 that in fs/jbd/commit.c in journal_commit_transaction we have the
 following code:

I am a bit confused, big surprise.  Does this thread still have anything
to do with this trace from my Latency regressions bug report?

http://www.alsa-project.org/~rlrevell/2912us

The problem only is apparent with PREEMPT_DESKTOP and data=ordered.

PREEMPT_RT has always worked perfectly.

Lee



Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Andrew Morton
Steven Rostedt [EMAIL PROTECTED] wrote:

 /*
  * Try to acquire jbd_lock_bh_state() against the buffer, when j_list_lock is
  * held.  For ranking reasons we must trylock.  If we lose, schedule away and
  * return 0.  j_list_lock is dropped in this case.
  */
 static int inverted_lock(journal_t *journal, struct buffer_head *bh)
 {
 	if (!jbd_trylock_bh_state(bh)) {
 		spin_unlock(&journal->j_list_lock);
 		schedule();
 		return 0;
 	}
 	return 1;
 }
 
 

That's very lame code, that.  The old "I don't know what the heck to do now
so I'll schedule" trick.  Sorry.

  I guess one way to solve this is to add a wait queue here (before
  schedule()), and have the one holding the lock to wake up all on the
  waitqueue when they release it.

yup.  A patch against mainline would be appropriate, please.


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-16 Thread Steven Rostedt


On Wed, 16 Mar 2005, Lee Revell wrote:

 I am a bit confused, big surprise.  Does this thread still have anything
 to do with this trace from my Latency regressions bug report?

Don't worry, I've been in a state of confusion for a long time now ;-)


 http://www.alsa-project.org/~rlrevell/2912us

 The problem only is apparent with PREEMPT_DESKTOP and data=ordered.

 PREEMPT_RT has always worked perfectly.


I'm surprised that PREEMPT_RT does work.  I'm no longer sure this affects
your latency; it probably does, indirectly somehow.  I still think it has
to do with the bit spinlocks, but I'm not sure. Just let me know if you
want to be taken off this thread and I'll remove you from my CC list.
Until then, I'll keep you on.

-- Steve
