Greetings all,
Here's an up-to-date version of the buffer flipper that applies to
post-hackathon -current.
This diff (~beck/viagra.diff15) contains one important change from
the previous version. In the old cache, since buffers were never freed,
we would put B_INVAL buffers in the cache at the head of the clean LRU.
(B_INVAL buffers do not contain cacheable data; for example, when a
remove happens and a file's link count drops to 0, all its buffers
are marked B_INVAL.)
I noticed after some work with tedu at the end of the hackathon that
we kept a lot of data in cache for removed files, and this was the
cause. Moving B_INVAL buffers to the head of the LRU (behaviour that
has been retained since the old static buffer cache) does not make
sense with the modern dynamic one, so this diff changes it to free
B_INVAL buffers right away instead of caching them.
I'm running this on multiple arches and on my NFS servers feeding them.
-Bob
On Mon, Jun 03, 2013 at 09:20:08AM -0600, Bob Beck wrote:
Here's a new version of the buffer flipper that fixes
a problem found by krw@. All comments from before still apply:
You too can have a GIANT buffer cache etc. etc...
After much bug fighting over the last 6 months in a number of places,
first in the midlayer and now in uvm, I think it's about time to shop
this around again.
This will only make a difference on amd64, and only if you have 4 GB
or more of RAM. What it does is allow the high (non-DMA-reachable)
memory to be used for buffer cache pages. It will use your configured
buffer cache percentage of both DMA'able and above-DMA'able pages for
the cache, migrating the oldest cache pages into high memory. Pages
are flipped back into DMA'able memory if they are needed for IO.
Notwithstanding that it only matters on amd64, it does change how
the world works a bit, and therefore requires testing everywhere. It
has survived multiple make build/make release test cycles now on my
machines (amd64, i386, zaurus, sparc, sparc64, hppa) with various
settings of bufcachepercent, and is running on my NFS server
(bufcachepercent=90) without any complaints throughout that. It's
been running on my laptop for a long time now.
If you try it and have troubles (i.e. any new regressions), please
ensure your machine's console is accessible (check whether you have
ddb.console=1 in /etc/sysctl.conf), and if you have problems please
try to get the output of

    trace
    ps
    show bcstats
    show uvm

from ddb if at all possible.
Please let me know how you do with it, and most importantly what
you try it on/with.
-Bob
Index: sys/kern/kern_sysctl.c
===
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.236
diff -u -p -r1.236 kern_sysctl.c
--- sys/kern/kern_sysctl.c 9 Jun 2013 13:10:19 - 1.236
+++ sys/kern/kern_sysctl.c 9 Jun 2013 15:27:04 -
@@ -110,6 +110,7 @@ extern struct disklist_head disklist;
extern fixpt_t ccpu;
extern long numvnodes;
extern u_int mcllivelocks;
+extern psize_t b_dmapages_total, b_highpages_total, b_dmamaxpages;
extern void nmbclust_update(void);
@@ -566,8 +567,8 @@ kern_sysctl(int *name, u_int namelen, vo
return (sysctl_cptime2(name + 1, namelen -1, oldp, oldlenp,
newp, newlen));
case KERN_CACHEPCT: {
- u_int64_t dmapages;
- int opct, pgs;
+ psize_t pgs;
+ int opct;
opct = bufcachepercent;
error = sysctl_int(oldp, oldlenp, newp, newlen,
&bufcachepercent);
@@ -577,9 +578,11 @@ kern_sysctl(int *name, u_int namelen, vo
bufcachepercent = opct;
return (EINVAL);
}
- dmapages = uvm_pagecount(dma_constraint);
if (bufcachepercent != opct) {
- pgs = bufcachepercent * dmapages / 100;
+ pgs = (b_highpages_total + b_dmapages_total)
+ * bufcachepercent / 100;
+ b_dmamaxpages = b_dmapages_total * bufcachepercent
+ / 100;
bufadjust(pgs); /* adjust bufpages */
bufhighpages = bufpages; /* set high water mark */
}
Index: sys/kern/spec_vnops.c
===
RCS file: /cvs/src/sys/kern/spec_vnops.c,v
retrieving revision 1.71
diff -u -p -r1.71 spec_vnops.c
--- sys/kern/spec_vnops.c 28 Mar 2013 03:29:44 - 1.71
+++ sys/kern/spec_vnops.c 3 Jun 2013 14:51:14 -
@@ -457,7 +457,9 @@ spec_strategy(void *v)
struct vop_strategy_args *ap = v;
struct buf *bp = ap->a_bp;
int maj = major(bp->b_dev);
-
+
+ if (!ISSET(bp->b_flags, B_DMA) && ISSET(bp->b_flags, B_BC))
+ panic("bogus buf %p