[Patch] Pushing the global kernel lock (aka BKL)
Hi, Several places in the kernel run holding the global kernel lock when it isn't needed. This usually occurs when where is a function which can be called via many different paths; some holding the lock, others not. Now, if a function can block (and hence drop the kernel lock) the caller cannot rely on the kernel lock for its own integerity. Such a functon _may_ be a candidate for dropping the lock (if it is held) on entry, and retaken on exit. This improves SMP scalability. A good example of this is generic_make_request(). This function can be arrived at by several different paths, some of which will be holding the lock, and some not. It (or, rather the functions it can call) may block and a caller has no control over this. generic_make_request() does not need the kernel lock itself, and any functions it calls which do require the lock should be taking it (as they cannot guarantee it is already held). This makes it a good candidate for dropping the lock early (rather than only dropping when blocking). Dropping the kernel lock, even for a short period, helps scalability. Note, if the lock is held when an interrupt arrives, the interrupt handler runs holding the lock and so do any bottom-half handlers run on the back of the interrupt. So, less time it is held, the smaller the chance of being interrupted while holding it, the better the scalability. As the current nfsd code isn't threaded, it runs holding the kernel lock. Any reduction in holding the lock helps nfs server scalability. The current macros used to drop and retake the kernel lock in schedule() cannot be used in many cases as they do not nest. The attached patch (against 2.4.1-pre10) defines two new macros for x86 (save_and_drop_kernel_lock(x) and restore_kernel_lock(x)) and a new declaration macro (DECLARE_KERNEL_LOCK_COUNTER(x)). These allow "pushing" and "poping" of the kernel lock. The patch also contains some examples of usage (although the one in nfsd/vfs.c could be done with an unlock_kernel()/lock_kernel() pair). If the idea is acceptable, I'll tidy up the patch and add the necessary macros for other archs. My questions are; a) Does this make sense? b) Is it acceptable in the kernel? c) Any better suggestions for the macro names? I'd be interested in any results from those applying this patch and running benchmarks on SMP boxes - mainly filesystem benchmarks. I admit this is not the cleanest of ideas. Mark diff -urN -X dontdiff vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c markhe-2.4.1-pre10/drivers/block/ll_rw_blk.c --- vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c Wed Jan 24 10:56:23 2001 +++ markhe-2.4.1-pre10/drivers/block/ll_rw_blk.cThu Jan 25 09:47:09 2001 @@ -907,10 +907,24 @@ { int major = MAJOR(bh->b_rdev); request_queue_t *q; + DECLARE_KERNEL_LOCK_COUNTER(lock_depth) if (!bh->b_end_io) BUG(); + /* +* The caller cannot make any assumptions as to whether this +* function (or, rather, the funcs called from here) will block +* or not. +* Also, we can be called with or without the global kernel lock. +* As the kernel lock isn't need here, and should be taken by +* any functions called from here which need it, it is safe to +* drop the lock. This helps scalability (and _really_ helps +* scalability when there is a threaded volume manager sitting +* below). +*/ + save_and_drop_kernel_lock(lock_depth); + if (blk_size[major]) { unsigned long maxsector = (blk_size[major][MINOR(bh->b_rdev)] << 1) + 1; unsigned long sector = bh->b_rsector; @@ -931,6 +945,7 @@ blk_size[major][MINOR(bh->b_rdev)]); } bh->b_end_io(bh, 0); + restore_kernel_lock(lock_depth); return; } } @@ -953,6 +968,8 @@ break; } } while (q->make_request_fn(q, rw, bh)); + + restore_kernel_lock(lock_depth); } diff -urN -X dontdiff vxfs-2.4.1-pre10/fs/nfsd/vfs.c markhe-2.4.1-pre10/fs/nfsd/vfs.c --- vxfs-2.4.1-pre10/fs/nfsd/vfs.c Fri Dec 29 22:07:23 2000 +++ markhe-2.4.1-pre10/fs/nfsd/vfs.cThu Jan 25 10:01:51 2001 @@ -30,6 +30,7 @@ #include #include #include +#include #include #define __NO_VERSION__ #include @@ -596,6 +597,7 @@ mm_segment_toldfs; int err; struct file file; + DECLARE_KERNEL_LOCK_COUNTER(lock_depth) err = nfsd_open(rqstp, fhp, S_IFREG, MAY_READ, ); if (err) @@ -618,12 +620,25 @@ file.f_ralen = ra->p_ralen; file.f_rawin = ra->p_rawin; } + + /* +* The nfs daemons run holding the global
[Patch] Pushing the global kernel lock (aka BKL)
Hi, Several places in the kernel run holding the global kernel lock when it isn't needed. This usually occurs when where is a function which can be called via many different paths; some holding the lock, others not. Now, if a function can block (and hence drop the kernel lock) the caller cannot rely on the kernel lock for its own integerity. Such a functon _may_ be a candidate for dropping the lock (if it is held) on entry, and retaken on exit. This improves SMP scalability. A good example of this is generic_make_request(). This function can be arrived at by several different paths, some of which will be holding the lock, and some not. It (or, rather the functions it can call) may block and a caller has no control over this. generic_make_request() does not need the kernel lock itself, and any functions it calls which do require the lock should be taking it (as they cannot guarantee it is already held). This makes it a good candidate for dropping the lock early (rather than only dropping when blocking). Dropping the kernel lock, even for a short period, helps scalability. Note, if the lock is held when an interrupt arrives, the interrupt handler runs holding the lock and so do any bottom-half handlers run on the back of the interrupt. So, less time it is held, the smaller the chance of being interrupted while holding it, the better the scalability. As the current nfsd code isn't threaded, it runs holding the kernel lock. Any reduction in holding the lock helps nfs server scalability. The current macros used to drop and retake the kernel lock in schedule() cannot be used in many cases as they do not nest. The attached patch (against 2.4.1-pre10) defines two new macros for x86 (save_and_drop_kernel_lock(x) and restore_kernel_lock(x)) and a new declaration macro (DECLARE_KERNEL_LOCK_COUNTER(x)). These allow "pushing" and "poping" of the kernel lock. The patch also contains some examples of usage (although the one in nfsd/vfs.c could be done with an unlock_kernel()/lock_kernel() pair). If the idea is acceptable, I'll tidy up the patch and add the necessary macros for other archs. My questions are; a) Does this make sense? b) Is it acceptable in the kernel? c) Any better suggestions for the macro names? I'd be interested in any results from those applying this patch and running benchmarks on SMP boxes - mainly filesystem benchmarks. I admit this is not the cleanest of ideas. Mark diff -urN -X dontdiff vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c markhe-2.4.1-pre10/drivers/block/ll_rw_blk.c --- vxfs-2.4.1-pre10/drivers/block/ll_rw_blk.c Wed Jan 24 10:56:23 2001 +++ markhe-2.4.1-pre10/drivers/block/ll_rw_blk.cThu Jan 25 09:47:09 2001 @@ -907,10 +907,24 @@ { int major = MAJOR(bh-b_rdev); request_queue_t *q; + DECLARE_KERNEL_LOCK_COUNTER(lock_depth) if (!bh-b_end_io) BUG(); + /* +* The caller cannot make any assumptions as to whether this +* function (or, rather, the funcs called from here) will block +* or not. +* Also, we can be called with or without the global kernel lock. +* As the kernel lock isn't need here, and should be taken by +* any functions called from here which need it, it is safe to +* drop the lock. This helps scalability (and _really_ helps +* scalability when there is a threaded volume manager sitting +* below). +*/ + save_and_drop_kernel_lock(lock_depth); + if (blk_size[major]) { unsigned long maxsector = (blk_size[major][MINOR(bh-b_rdev)] 1) + 1; unsigned long sector = bh-b_rsector; @@ -931,6 +945,7 @@ blk_size[major][MINOR(bh-b_rdev)]); } bh-b_end_io(bh, 0); + restore_kernel_lock(lock_depth); return; } } @@ -953,6 +968,8 @@ break; } } while (q-make_request_fn(q, rw, bh)); + + restore_kernel_lock(lock_depth); } diff -urN -X dontdiff vxfs-2.4.1-pre10/fs/nfsd/vfs.c markhe-2.4.1-pre10/fs/nfsd/vfs.c --- vxfs-2.4.1-pre10/fs/nfsd/vfs.c Fri Dec 29 22:07:23 2000 +++ markhe-2.4.1-pre10/fs/nfsd/vfs.cThu Jan 25 10:01:51 2001 @@ -30,6 +30,7 @@ #include linux/net.h #include linux/unistd.h #include linux/malloc.h +#include linux/smp_lock.h #include linux/in.h #define __NO_VERSION__ #include linux/module.h @@ -596,6 +597,7 @@ mm_segment_toldfs; int err; struct file file; + DECLARE_KERNEL_LOCK_COUNTER(lock_depth) err = nfsd_open(rqstp, fhp, S_IFREG, MAY_READ, file); if (err) @@ -618,12 +620,25 @@ file.f_ralen = ra-p_ralen; file.f_rawin = ra-p_rawin;