subject:"\[Patch\] Pushing the global kernel lock \(aka BKL\)"

[Patch] Pushing the global kernel lock (aka BKL)

2001-01-25 Thread Mark Hemment


Hi,

  Several places in the kernel run holding the global kernel lock when it
isn't needed.  This usually occurs when where is a function which can be
called via many different paths; some holding the lock, others not.

  Now, if a function can block (and hence drop the kernel lock) the caller
cannot rely on the kernel lock for its own integerity.  Such a functon
_may_ be a candidate for dropping the lock (if it is held) on entry, and
retaken on exit.  This improves SMP scalability.

  A good example of this is generic_make_request().  This function can be
arrived at by several different paths, some of which will be holding the
lock, and some not.  It (or, rather the functions it can call) may block
and a caller has no control over this.  generic_make_request() does not
need the kernel lock itself, and any functions it calls which do require
the lock should be taking it (as they cannot guarantee it is already
held).  This makes it a good candidate for dropping the lock early (rather
than only dropping when blocking).

  Dropping the kernel lock, even for a short period, helps
scalability.  Note, if the lock is held when an interrupt arrives, the
interrupt handler runs holding the lock and so do any bottom-half handlers
run on the back of the interrupt.  So, less time it is held, the smaller
the chance of being interrupted while holding it, the better the
scalability.
  As the current nfsd code isn't threaded, it runs holding the kernel
lock.  Any reduction in holding the lock helps nfs server scalability.

  The current macros used to drop and retake the kernel lock in
schedule() cannot be used in many cases as they do not nest.

  The attached patch (against 2.4.1-pre10) defines two new macros for x86
(save_and_drop_kernel_lock(x) and restore_kernel_lock(x)) and a new
declaration macro (DECLARE_KERNEL_LOCK_COUNTER(x)).  These allow "pushing"
and "poping" of the kernel lock.
  The patch also contains some examples of usage (although the one in
nfsd/vfs.c could be done with an unlock_kernel()/lock_kernel() pair).

  If the idea is acceptable, I'll tidy up the patch and add the necessary
macros for other archs.

  My questions are;
a) Does this make sense?
b) Is it acceptable in the kernel?
c) Any better suggestions for the macro names?

  I'd be interested in any results from those applying this patch
and running benchmarks on SMP boxes - mainly filesystem benchmarks.

  I admit this is not the cleanest of ideas.

Mark




diff -urN -X dontdiff vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c 
markhe-2.4.1-pre10/drivers/block/ll_rw_blk.c
--- vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c  Wed Jan 24 10:56:23 2001
+++ markhe-2.4.1-pre10/drivers/block/ll_rw_blk.cThu Jan 25 09:47:09 2001
@@ -907,10 +907,24 @@
 {
int major = MAJOR(bh->b_rdev);
request_queue_t *q;
+   DECLARE_KERNEL_LOCK_COUNTER(lock_depth)
 
if (!bh->b_end_io)
BUG();
 
+   /*
+* The caller cannot make any assumptions as to whether this
+* function (or, rather, the funcs called from here) will block
+* or not.
+* Also, we can be called with or without the global kernel lock.
+* As the kernel lock isn't need here, and should be taken by
+* any functions called from here which need it, it is safe to
+* drop the lock.  This helps scalability (and _really_ helps
+* scalability when there is a threaded volume manager sitting
+* below).
+*/
+   save_and_drop_kernel_lock(lock_depth);
+
if (blk_size[major]) {
unsigned long maxsector = (blk_size[major][MINOR(bh->b_rdev)] << 1) + 
1;
unsigned long sector = bh->b_rsector;
@@ -931,6 +945,7 @@
   blk_size[major][MINOR(bh->b_rdev)]);
}
bh->b_end_io(bh, 0);
+   restore_kernel_lock(lock_depth);
return;
}
}
@@ -953,6 +968,8 @@
break;
}
} while (q->make_request_fn(q, rw, bh));
+
+   restore_kernel_lock(lock_depth);
 }
 
 
diff -urN -X dontdiff vxfs-2.4.1-pre10/fs/nfsd/vfs.c markhe-2.4.1-pre10/fs/nfsd/vfs.c
--- vxfs-2.4.1-pre10/fs/nfsd/vfs.c  Fri Dec 29 22:07:23 2000
+++ markhe-2.4.1-pre10/fs/nfsd/vfs.cThu Jan 25 10:01:51 2001
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #define __NO_VERSION__
 #include 
@@ -596,6 +597,7 @@
mm_segment_toldfs;
int err;
struct file file;
+   DECLARE_KERNEL_LOCK_COUNTER(lock_depth)
 
err = nfsd_open(rqstp, fhp, S_IFREG, MAY_READ, );
if (err)
@@ -618,12 +620,25 @@
file.f_ralen = ra->p_ralen;
file.f_rawin = ra->p_rawin;
}
+
+   /*
+* The nfs daemons run holding the global

[Patch] Pushing the global kernel lock (aka BKL)

2001-01-25 Thread Mark Hemment


Hi,

  Several places in the kernel run holding the global kernel lock when it
isn't needed.  This usually occurs when where is a function which can be
called via many different paths; some holding the lock, others not.

  Now, if a function can block (and hence drop the kernel lock) the caller
cannot rely on the kernel lock for its own integerity.  Such a functon
_may_ be a candidate for dropping the lock (if it is held) on entry, and
retaken on exit.  This improves SMP scalability.

  A good example of this is generic_make_request().  This function can be
arrived at by several different paths, some of which will be holding the
lock, and some not.  It (or, rather the functions it can call) may block
and a caller has no control over this.  generic_make_request() does not
need the kernel lock itself, and any functions it calls which do require
the lock should be taking it (as they cannot guarantee it is already
held).  This makes it a good candidate for dropping the lock early (rather
than only dropping when blocking).

  Dropping the kernel lock, even for a short period, helps
scalability.  Note, if the lock is held when an interrupt arrives, the
interrupt handler runs holding the lock and so do any bottom-half handlers
run on the back of the interrupt.  So, less time it is held, the smaller
the chance of being interrupted while holding it, the better the
scalability.
  As the current nfsd code isn't threaded, it runs holding the kernel
lock.  Any reduction in holding the lock helps nfs server scalability.

  The current macros used to drop and retake the kernel lock in
schedule() cannot be used in many cases as they do not nest.

  The attached patch (against 2.4.1-pre10) defines two new macros for x86
(save_and_drop_kernel_lock(x) and restore_kernel_lock(x)) and a new
declaration macro (DECLARE_KERNEL_LOCK_COUNTER(x)).  These allow "pushing"
and "poping" of the kernel lock.
  The patch also contains some examples of usage (although the one in
nfsd/vfs.c could be done with an unlock_kernel()/lock_kernel() pair).

  If the idea is acceptable, I'll tidy up the patch and add the necessary
macros for other archs.

  My questions are;
a) Does this make sense?
b) Is it acceptable in the kernel?
c) Any better suggestions for the macro names?

  I'd be interested in any results from those applying this patch
and running benchmarks on SMP boxes - mainly filesystem benchmarks.

  I admit this is not the cleanest of ideas.

Mark




diff -urN -X dontdiff vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c 
markhe-2.4.1-pre10/drivers/block/ll_rw_blk.c
--- vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c  Wed Jan 24 10:56:23 2001
+++ markhe-2.4.1-pre10/drivers/block/ll_rw_blk.cThu Jan 25 09:47:09 2001
@@ -907,10 +907,24 @@
 {
int major = MAJOR(bh-b_rdev);
request_queue_t *q;
+   DECLARE_KERNEL_LOCK_COUNTER(lock_depth)
 
if (!bh-b_end_io)
BUG();
 
+   /*
+* The caller cannot make any assumptions as to whether this
+* function (or, rather, the funcs called from here) will block
+* or not.
+* Also, we can be called with or without the global kernel lock.
+* As the kernel lock isn't need here, and should be taken by
+* any functions called from here which need it, it is safe to
+* drop the lock.  This helps scalability (and _really_ helps
+* scalability when there is a threaded volume manager sitting
+* below).
+*/
+   save_and_drop_kernel_lock(lock_depth);
+
if (blk_size[major]) {
unsigned long maxsector = (blk_size[major][MINOR(bh-b_rdev)]  1) + 
1;
unsigned long sector = bh-b_rsector;
@@ -931,6 +945,7 @@
   blk_size[major][MINOR(bh-b_rdev)]);
}
bh-b_end_io(bh, 0);
+   restore_kernel_lock(lock_depth);
return;
}
}
@@ -953,6 +968,8 @@
break;
}
} while (q-make_request_fn(q, rw, bh));
+
+   restore_kernel_lock(lock_depth);
 }
 
 
diff -urN -X dontdiff vxfs-2.4.1-pre10/fs/nfsd/vfs.c markhe-2.4.1-pre10/fs/nfsd/vfs.c
--- vxfs-2.4.1-pre10/fs/nfsd/vfs.c  Fri Dec 29 22:07:23 2000
+++ markhe-2.4.1-pre10/fs/nfsd/vfs.cThu Jan 25 10:01:51 2001
@@ -30,6 +30,7 @@
 #include linux/net.h
 #include linux/unistd.h
 #include linux/malloc.h
+#include linux/smp_lock.h
 #include linux/in.h
 #define __NO_VERSION__
 #include linux/module.h
@@ -596,6 +597,7 @@
mm_segment_toldfs;
int err;
struct file file;
+   DECLARE_KERNEL_LOCK_COUNTER(lock_depth)
 
err = nfsd_open(rqstp, fhp, S_IFREG, MAY_READ, file);
if (err)
@@ -618,12 +620,25 @@
file.f_ralen = ra-p_ralen;
file.f_rawin = ra-p_rawin;

[Patch] Pushing the global kernel lock (aka BKL)

[Patch] Pushing the global kernel lock (aka BKL)

2 matches

Site Navigation

Mail list logo

Footer information