Re: Still More Secrets of Buffer Cache Enlargement.
On Sun, Jun 09, 2013 at 12:37:26PM -0600, Bob Beck wrote:
> Greetings all,
> Here's an up to date version of the buffer flipper that installs on post hackathon -current.

No issues so far! At 101% of last port (chromium) on bufferflipper crashing laptop.

Ken
Re: Still More Secrets of Buffer Cache Enlargement.
> No issues so far! At 101% of last port (chromium) on bufferflipper crashing laptop.

Such a nasty name for a laptop that just happened to run a version of my diff with a bug :)
Still More Secrets of Buffer Cache Enlargement.
Greetings all,

Here's an up to date version of the buffer flipper that installs on post-hackathon -current.

This diff (~beck/viagra.diff15) contains one important change from the previous version:

In the old cache, as buffers were never freed, we would put B_INVAL buffers in the cache at the head of the clean LRU. (B_INVAL buffers do not contain cacheable data; for example, when a remove happens and a file's link count drops to 0, all its buffers are marked B_INVAL.) I noticed after some work with tedu at the end of the hackathon that we kept a lot of data in cache for removed files, and this was why. Moving such buffers to the head of the LRU (behaviour retained since the old static buffer cache) does not make sense with the modern dynamic one, so this diff changes it to free B_INVAL buffers right away instead of caching them.

I'm running this on multiple arches and on my NFS servers feeding them.

-Bob

On Mon, Jun 03, 2013 at 09:20:08AM -0600, Bob Beck wrote:
> Here's a new version of the buffer flipper that fixes a problem found by krw@.
>
> All comments from before still apply:
>
> You too can have a GIANT buffer cache etc. etc... After much bug fighting in the midlayer and now uvm over the last 6 months in a number of places, I think it's about time to shop this around again.
>
> This will only make a difference on amd64, and only if you have 4 GB or more of RAM. What it does is allow the high (non-DMA-reachable) memory to be used for buffer cache pages. It will use your configured buffer cache percentage of both DMA-able and above-DMA-able pages for the cache, migrating the oldest cache pages into high memory. Pages are flipped back into DMA-able memory if they are needed for I/O.
>
> Notwithstanding that it only matters on amd64, it does change how the world works a bit, and therefore requires testing everywhere.
> It has survived multiple make build/make release test cycles on my machines (amd64, i386, zaurus, sparc, sparc64, hppa) with various settings of bufcachepercent, and is running on my NFS server (bufcachepercent=90) without any complaints throughout; it's been running on my laptop for a long time now.
>
> If you try it and have troubles (i.e. any new regressions), please ensure your machine's console is accessible (check that you have ddb.console=1 in /etc/sysctl.conf). If you have problems, please try to get the output of the following from ddb if at all possible:
>
>     trace
>     ps
>     show bcstats
>     show uvm
>
> Please let me know how you do with it, and most importantly what you try it on/with.
>
> -Bob

Index: sys/kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.236
diff -u -p -r1.236 kern_sysctl.c
--- sys/kern/kern_sysctl.c	9 Jun 2013 13:10:19 -0000	1.236
+++ sys/kern/kern_sysctl.c	9 Jun 2013 15:27:04 -0000
@@ -110,6 +110,7 @@ extern struct disklist_head disklist;
 extern fixpt_t ccpu;
 extern long numvnodes;
 extern u_int mcllivelocks;
+extern psize_t b_dmapages_total, b_highpages_total, b_dmamaxpages;
 
 extern void nmbclust_update(void);
 
@@ -566,8 +567,8 @@ kern_sysctl(int *name, u_int namelen, vo
 		return (sysctl_cptime2(name + 1, namelen -1, oldp, oldlenp,
 		    newp, newlen));
 	case KERN_CACHEPCT: {
-		u_int64_t dmapages;
-		int opct, pgs;
+		psize_t pgs;
+		int opct;
 		opct = bufcachepercent;
 		error = sysctl_int(oldp, oldlenp, newp, newlen,
 		    &bufcachepercent);
@@ -577,9 +578,11 @@ kern_sysctl(int *name, u_int namelen, vo
 			bufcachepercent = opct;
 			return (EINVAL);
 		}
-		dmapages = uvm_pagecount(&dma_constraint);
 		if (bufcachepercent != opct) {
-			pgs = bufcachepercent * dmapages / 100;
+			pgs = (b_highpages_total + b_dmapages_total)
+			    * bufcachepercent / 100;
+			b_dmamaxpages = b_dmapages_total * bufcachepercent
+			    / 100;
 			bufadjust(pgs); /* adjust bufpages */
 			bufhighpages = bufpages; /* set high water mark */
 		}
Index: sys/kern/spec_vnops.c
===================================================================
RCS file: /cvs/src/sys/kern/spec_vnops.c,v
retrieving revision 1.71
diff -u -p -r1.71 spec_vnops.c
--- sys/kern/spec_vnops.c	28 Mar 2013 03:29:44 -0000	1.71
+++ sys/kern/spec_vnops.c	3 Jun 2013 14:51:14 -0000
@@ -457,7 +457,9 @@ spec_strategy(void *v)
 	struct vop_strategy_args *ap = v;
 	struct buf *bp = ap->a_bp;
 	int maj = major(bp->b_dev);
-
+
+	if (!ISSET(bp->b_flags, B_DMA) && ISSET(bp->b_flags, B_BC))
+		panic("bogus buf %p
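The B_INVAL change described in this message can be illustrated with a small user-space model. This is a sketch, not the vfs_bio.c code: the reduced `struct buf`, the hand-rolled list, the flag value, and the `release_old`/`release_new`/`bufs_freed` names are all hypothetical. Only the B_INVAL flag name and the "head of the clean LRU" placement come from the message; the model assumes, as in a classic BSD buffer cache, that buffers are recycled from the head of the clean LRU.

```c
#include <assert.h>
#include <stdlib.h>

/* Flag name from the message; the value is made up for this model. */
#define B_INVAL	0x01

/* Hypothetical, heavily reduced buffer header. */
struct buf {
	int         b_flags;
	struct buf *prev, *next;
};

/*
 * Clean LRU as a doubly linked list. In this model, reuse would take
 * buffers from the head, so the head holds the least valuable entries
 * and the tail the most recently released ones.
 */
struct lru {
	struct buf *head, *tail;
	int         count;
};

static int bufs_freed;	/* buffers handed back to the allocator */

static void
lru_insert_head(struct lru *q, struct buf *bp)
{
	bp->prev = NULL;
	bp->next = q->head;
	if (q->head != NULL)
		q->head->prev = bp;
	else
		q->tail = bp;
	q->head = bp;
	q->count++;
}

static void
lru_insert_tail(struct lru *q, struct buf *bp)
{
	bp->next = NULL;
	bp->prev = q->tail;
	if (q->tail != NULL)
		q->tail->next = bp;
	else
		q->head = bp;
	q->tail = bp;
	q->count++;
}

/* Old policy: keep invalid buffers, but put them first in line for reuse. */
static void
release_old(struct lru *clean, struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_flags & B_INVAL)
		lru_insert_head(clean, bp);
	else
		lru_insert_tail(clean, bp);
}

/* New policy: invalid buffers hold nothing worth caching, so free them now. */
static void
release_new(struct lru *clean, struct buf *bp)
{
	assert(bp != NULL);
	if (bp->b_flags & B_INVAL) {
		free(bp);
		bufs_freed++;
	} else
		lru_insert_tail(clean, bp);
}
```

Under the old policy an invalid buffer still occupies cache memory until something recycles it; under the new one the memory is released at once, which is why removed files stop lingering in the cache.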
More Secrets of Buffer Cache Enlargement.
Here's a new version of the buffer flipper that fixes a problem found by krw@.

All comments from before still apply:

You too can have a GIANT buffer cache etc. etc... After much bug fighting in the midlayer and now uvm over the last 6 months in a number of places, I think it's about time to shop this around again.

This will only make a difference on amd64, and only if you have 4 GB or more of RAM. What it does is allow the high (non-DMA-reachable) memory to be used for buffer cache pages. It will use your configured buffer cache percentage of both DMA-able and above-DMA-able pages for the cache, migrating the oldest cache pages into high memory. Pages are flipped back into DMA-able memory if they are needed for I/O.

Notwithstanding that it only matters on amd64, it does change how the world works a bit, and therefore requires testing everywhere. It has survived multiple make build/make release test cycles on my machines (amd64, i386, zaurus, sparc, sparc64, hppa) with various settings of bufcachepercent, and is running on my NFS server (bufcachepercent=90) without any complaints throughout; it's been running on my laptop for a long time now.

If you try it and have troubles (i.e. any new regressions), please ensure your machine's console is accessible (check that you have ddb.console=1 in /etc/sysctl.conf). If you have problems, please try to get the output of the following from ddb if at all possible:

    trace
    ps
    show bcstats
    show uvm

Please let me know how you do with it, and most importantly what you try it on/with.
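The "percentage of both DMA-able and above-DMA-able pages" is what the KERN_CACHEPCT hunk in the diff below computes: one total for the whole cache and one cap for the DMA-reachable part. The sketch below restates just that arithmetic in user space. The `psize_t` typedef (assumed to be an unsigned long, as on amd64), the `cache_target` struct, the `compute_cache_target` name, and the 0..100 range check are assumptions for illustration, not kernel code.

```c
#include <assert.h>

/* Stand-in for the kernel's psize_t, assumed to be an unsigned long. */
typedef unsigned long psize_t;

struct cache_target {
	psize_t total_pages;	/* "pgs" in the diff: the overall cache size */
	psize_t dma_max_pages;	/* "b_dmamaxpages": the DMA-reachable share */
};

/*
 * Mirror the two assignments from the KERN_CACHEPCT hunk: the percentage
 * is applied to all pages for the total, and to DMA-reachable pages only
 * for the low-memory cap.
 */
static struct cache_target
compute_cache_target(psize_t dmapages, psize_t highpages, int pct)
{
	struct cache_target t;

	assert(pct >= 0 && pct <= 100);	/* sanity bound assumed here */
	/* Multiply before dividing, as the diff does, to limit rounding loss. */
	t.total_pages = (highpages + dmapages) * (psize_t)pct / 100;
	t.dma_max_pages = dmapages * (psize_t)pct / 100;
	return t;
}
```

For a hypothetical machine with 1,048,576 DMA-reachable and 1,048,576 high pages at bufcachepercent=90, this gives a total of 1,887,436 pages, of which at most 943,718 may sit in DMA-reachable memory. The old formula (`bufcachepercent * dmapages / 100`) would have capped the whole cache at 943,718 pages.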
-Bob

(diff also in ~beck/viagra.diff14 on cvs)

Index: sys/kern/kern_sysctl.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.234
diff -u -p -r1.234 kern_sysctl.c
--- sys/kern/kern_sysctl.c	6 Apr 2013 03:44:34 -0000	1.234
+++ sys/kern/kern_sysctl.c	3 Jun 2013 14:51:14 -0000
@@ -110,6 +110,7 @@ extern struct disklist_head disklist;
 extern fixpt_t ccpu;
 extern long numvnodes;
 extern u_int mcllivelocks;
+extern psize_t b_dmapages_total, b_highpages_total, b_dmamaxpages;
 
 extern void nmbclust_update(void);
 
@@ -564,8 +565,8 @@ kern_sysctl(int *name, u_int namelen, vo
 		return (sysctl_cptime2(name + 1, namelen -1, oldp, oldlenp,
 		    newp, newlen));
 	case KERN_CACHEPCT: {
-		u_int64_t dmapages;
-		int opct, pgs;
+		psize_t pgs;
+		int opct;
 		opct = bufcachepercent;
 		error = sysctl_int(oldp, oldlenp, newp, newlen,
 		    &bufcachepercent);
@@ -575,9 +576,11 @@ kern_sysctl(int *name, u_int namelen, vo
 			bufcachepercent = opct;
 			return (EINVAL);
 		}
-		dmapages = uvm_pagecount(&dma_constraint);
 		if (bufcachepercent != opct) {
-			pgs = bufcachepercent * dmapages / 100;
+			pgs = (b_highpages_total + b_dmapages_total)
+			    * bufcachepercent / 100;
+			b_dmamaxpages = b_dmapages_total * bufcachepercent
+			    / 100;
 			bufadjust(pgs); /* adjust bufpages */
 			bufhighpages = bufpages; /* set high water mark */
 		}
Index: sys/kern/spec_vnops.c
===================================================================
RCS file: /cvs/src/sys/kern/spec_vnops.c,v
retrieving revision 1.71
diff -u -p -r1.71 spec_vnops.c
--- sys/kern/spec_vnops.c	28 Mar 2013 03:29:44 -0000	1.71
+++ sys/kern/spec_vnops.c	3 Jun 2013 14:51:14 -0000
@@ -457,7 +457,9 @@ spec_strategy(void *v)
 	struct vop_strategy_args *ap = v;
 	struct buf *bp = ap->a_bp;
 	int maj = major(bp->b_dev);
-
+
+	if (!ISSET(bp->b_flags, B_DMA) && ISSET(bp->b_flags, B_BC))
+		panic("bogus buf %p passed to spec_strategy", bp);
 	if (LIST_FIRST(&bp->b_dep) != NULL)
 		buf_start(bp);
 
Index: sys/kern/vfs_bio.c
===================================================================
RCS file: /cvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.146
diff -u -p -r1.146 vfs_bio.c
--- sys/kern/vfs_bio.c	17 Feb 2013 17:39:29 -0000	1.146
+++ sys/kern/vfs_bio.c	3 Jun 2013 14:59:18 -0000
@@ -63,12 +63,17 @@
 /*
  * Definitions for the buffer free lists.
  */
-#define	BQUEUES		2	/* number of free buffer queues */
+#define	BQUEUES		3	/* number of free buffer queues */
 #define	BQ_DIRTY	0	/* LRU queue with dirty buffers */
-#define	BQ_CLEAN	1	/* LRU queue with clean buffers */
+#define	BQ_CLEANL	1	/* LRU queue with clean low buffers */
+#define	BQ_CLEANH	2	/* LRU queue with clean high buffers */
 TAILQ_HEAD(bqueues, buf)
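The flipper's queue discipline, as the message describes it (oldest clean pages migrate to high memory, pages are flipped back when needed for I/O), can be modelled with two clean queues like BQ_CLEANL and BQ_CLEANH. This is a rough sketch under stated assumptions, not how vfs_bio.c does it (the diff is cut off before that code): one buffer counts as one page, "flipping" is reduced to toggling B_DMA where the kernel would actually move pages between memory ranges, and `release_clean`, `flip_to_dma`, `io_ok`, and the queue helpers are hypothetical names. `io_ok` expresses the same condition the new spec_strategy() check panics on, treating every buffer here as a buffer cache (B_BC) buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name from the diff; the value is made up for this model. */
#define B_DMA	0x01

/* Indices standing in for BQ_CLEANL and BQ_CLEANH in the vfs_bio.c hunk. */
enum { Q_LOW, Q_HIGH, NQ };

/* Hypothetical, heavily reduced buffer header. */
struct buf {
	int         b_flags;
	struct buf *next;
};

/* FIFO queue: head is the oldest buffer, tail the newest. */
struct queue {
	struct buf *head, *tail;
	size_t      count;
};

static struct queue cleanq[NQ];

static void
enqueue(struct queue *q, struct buf *bp)
{
	bp->next = NULL;
	if (q->tail != NULL)
		q->tail->next = bp;
	else
		q->head = bp;
	q->tail = bp;
	q->count++;
}

static struct buf *
dequeue(struct queue *q)
{
	struct buf *bp = q->head;

	if (bp != NULL) {
		q->head = bp->next;
		if (q->head == NULL)
			q->tail = NULL;
		q->count--;
	}
	return bp;
}

/*
 * Release a clean buffer into low memory; while the low queue exceeds its
 * cap (b_dmamaxpages in the diff), move its oldest buffer to high memory.
 */
static void
release_clean(struct buf *bp, size_t dma_max)
{
	assert(bp->b_flags & B_DMA);
	enqueue(&cleanq[Q_LOW], bp);
	while (cleanq[Q_LOW].count > dma_max) {
		struct buf *old = dequeue(&cleanq[Q_LOW]);

		old->b_flags &= ~B_DMA;	/* the kernel would move pages here */
		enqueue(&cleanq[Q_HIGH], old);
	}
}

/* Flip a buffer back into DMA-reachable memory before it is used for I/O. */
static void
flip_to_dma(struct buf *bp)
{
	bp->b_flags |= B_DMA;		/* the kernel would move pages here */
}

/* Same condition as the spec_strategy() check: I/O needs a DMA buffer. */
static int
io_ok(const struct buf *bp)
{
	return (bp->b_flags & B_DMA) != 0;
}
```

Releasing four buffers with a cap of two leaves the two newest in the low queue and pushes the two oldest to the high queue; a high buffer fails `io_ok` until `flip_to_dma` is called on it, which is the situation the spec_strategy() panic is meant to catch.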