I spent yesterday trying to really grok early pmap initialization on
amd64, and I noticed what I believe to be wasted physical pages of
memory:

1. In locore.S, we set up both an "identity" and an "actual" mapping
for the kernel, to help us bootstrap to executing in high memory.
Currently we reserve pages as though these mappings are independent;
but in fact the mappings cleverly (accidentally??) reuse the same L2
and L3 tables, so we can safely reserve just one set of pages.  (Just
the same, I've added checks to make sure this is safe, and also to
sanity-check that we don't overflow the page tables; the first sketch
below illustrates the idea.)

2. In pmap_bootstrap(), we set up the rest of the direct physical
memory mapping.  locore.S has already mapped the first 4GB of memory,
and we don't remove or replace those mappings, so we can save another
couple of pages by only allocating page directories for the rest of
physical memory; the second sketch below shows the arithmetic.
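
To make the table-sharing point in (1) concrete, here's a small
standalone C sketch of the collision and overflow checks.  The constant
values are made-up examples rather than what the kernel's pmap headers
define, and it only spells out the L2 case; the L3 case is the same
with NKL3_KIMG_ENTRIES and L3_SLOT_KERNBASE.

/*
 * Illustrative sketch only: these constants stand in for the real
 * NPDPG, NKL2_KIMG_ENTRIES, and L2_SLOT_KERNBASE; the values are
 * examples, not what any particular kernel build computes.
 */
#include <stdio.h>

#define NPDPG                   512     /* page-table entries per page */
#define NKL2_KIMG_ENTRIES       16      /* example: L2 entries covering the kernel image */
#define L2_SLOT_KERNBASE        0       /* example: first L2 slot of the high-VA mapping */

int
main(void)
{
        /* locore.S reserves one extra L2 entry (see the XXX in the diff). */
        int nents = NKL2_KIMG_ENTRIES + 1;

        /*
         * The identity map fills entries [0, nents); the actual map fills
         * [L2_SLOT_KERNBASE, L2_SLOT_KERNBASE + nents).  A single L2 page
         * can serve both only if those ranges coincide or don't overlap,
         * and only if the second range still fits within one page.
         */
        if (L2_SLOT_KERNBASE > 0 && L2_SLOT_KERNBASE < nents)
                printf("collision: the maps partially overlap\n");
        else if (L2_SLOT_KERNBASE + nents > NPDPG)
                printf("overflow: entries run past the end of the L2 page\n");
        else
                printf("safe to share a single L2 page\n");
        return 0;
}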
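
And a similar sketch of the bookkeeping change in (2).  The names echo
dmpd, ndmpdp, and NDML2_ENTRIES from pmap_bootstrap(), but the values
are invented and the real loop walks individual PTEs rather than whole
pages; the only point here is the offset trick and the savings.

/*
 * Illustrative sketch only: the constants and starting address are
 * examples, not values computed on real hardware.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE       4096
#define NDML2_ENTRIES   4       /* L2 pages locore.S already built (first 4GB) */

int
main(void)
{
        uint64_t first_avail = 0x1234000;       /* example: next free physical page */
        int ndmpdp = 9;         /* example: L2 pages needed to cover all of RAM */

        /* Before: allocate all ndmpdp pages, duplicating locore.S's work. */
        int before = ndmpdp;

        /*
         * After: allocate only the pages beyond the first 4GB, but bias
         * dmpd backwards by NDML2_ENTRIES pages so dmpd + i * PAGE_SIZE
         * still names page directory i for every i >= NDML2_ENTRIES.
         */
        uint64_t dmpd = first_avail - NDML2_ENTRIES * PAGE_SIZE;
        int after = ndmpdp - NDML2_ENTRIES;

        printf("pages saved: %d\n", before - after);
        printf("page directory %d sits at 0x%llx (the old first_avail)\n",
            NDML2_ENTRIES,
            (unsigned long long)(dmpd + NDML2_ENTRIES * PAGE_SIZE));
        return 0;
}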


In total, this saves a whopping 5 pages (20kB) of physical memory on
my amd64 laptop, as measured by dmesg's "avail mem".

So not terribly important, but I have a patch to enable >512GB of
physical address space that somewhat builds on top of this, and I
think it will be easier to review if these changes are already in
place.

This could probably use some testing under memory-intensive workloads,
especially on systems with >4GB of RAM.


Index: amd64/locore.S
===================================================================
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/amd64/locore.S,v
retrieving revision 1.54
diff -u -p -r1.54 locore.S
--- amd64/locore.S      10 Nov 2012 09:45:05 -0000      1.54
+++ amd64/locore.S      3 Jul 2014 23:33:25 -0000
@@ -358,29 +358,48 @@ bi_size_ok:
  *                           0          1       2      3
  */
 
-#if L2_SLOT_KERNBASE > 0
-#define TABLE_L2_ENTRIES (2 * (NKL2_KIMG_ENTRIES + 1))
-#else
-#define TABLE_L2_ENTRIES (NKL2_KIMG_ENTRIES + 1)
-#endif
+/*
+ * We now want to enable paging, move the kernel to high virtual memory, and
+ * jump there; but unfortunately, we can't do that atomically.  Instead, we
+ * setup a page table that maps the kernel pages to their eventual virtual
+ * address ranges, but also includes an "identity map" that keeps those pages
+ * available via their physical addresses even once paging is enabled.  Later,
+ * once we're executing out of the eventual VA range, we remove the
+ * identity map.
+ *
+ * As an additional trick to conserve a few pages, the actual map and the
+ * identity map share L2 and L3 tables.  This works as long as their
+ * entries either coincide exactly or don't overlap at all, as checked below.
+ * (It also means the kernel may end up mapped multiple times in virtual
+ * memory, but only until we tear down the identity map.)
+ */
+/* XXX(matthew): Why do we always add 1 to NKL2_KIMG_ENTRIES? */
 
-#if L3_SLOT_KERNBASE > 0
-#define TABLE_L3_ENTRIES (2 * NKL3_KIMG_ENTRIES)
-#else
-#define TABLE_L3_ENTRIES NKL3_KIMG_ENTRIES
+#if L2_SLOT_KERNBASE > 0 && L2_SLOT_KERNBASE < NKL2_KIMG_ENTRIES + 1
+#error L2 table collision between identity and actual map entries
+#endif
+#if L3_SLOT_KERNBASE > 0 && L3_SLOT_KERNBASE < NKL3_KIMG_ENTRIES
+#error L3 table collision between identity and actual map entries
 #endif
 
+/* Sanity check. */
+#if L2_SLOT_KERNBASE + NKL2_KIMG_ENTRIES + 1 > NPDPG
+#error L2 table overflow
+#endif
+#if L3_SLOT_KERNBASE + NKL3_KIMG_ENTRIES > NPDPG
+#error L3 table overflow
+#endif
 
 #define PROC0_PML4_OFF 0
 #define PROC0_STK_OFF  (PROC0_PML4_OFF + NBPG)
 #define PROC0_PTP3_OFF (PROC0_STK_OFF + UPAGES * NBPG)
 #define PROC0_PTP2_OFF (PROC0_PTP3_OFF + NKL4_KIMG_ENTRIES * NBPG)
-#define PROC0_PTP1_OFF (PROC0_PTP2_OFF + TABLE_L3_ENTRIES * NBPG)
-#define        PROC0_DMP3_OFF  (PROC0_PTP1_OFF + TABLE_L2_ENTRIES * NBPG)
+#define PROC0_PTP1_OFF (PROC0_PTP2_OFF + NKL3_KIMG_ENTRIES * NBPG)
+#define        PROC0_DMP3_OFF  (PROC0_PTP1_OFF + (NKL2_KIMG_ENTRIES + 1) * NBPG)
 #define PROC0_DMP2_OFF (PROC0_DMP3_OFF + NDML3_ENTRIES * NBPG)
 #define TABLESIZE \
-    ((NKL4_KIMG_ENTRIES + TABLE_L3_ENTRIES + TABLE_L2_ENTRIES + 1 + UPAGES + \
-       NDML3_ENTRIES + NDML2_ENTRIES) * NBPG)
+    ((NKL4_KIMG_ENTRIES + NKL3_KIMG_ENTRIES + (NKL2_KIMG_ENTRIES + 1) + 1 + \
+       UPAGES + NDML3_ENTRIES + NDML2_ENTRIES) * NBPG)
 
 #define fillkpt        \
 1:     movl    %eax,(%ebx)     ;       /* store phys addr */ \
Index: amd64/pmap.c
===================================================================
RCS file: /home/matthew/anoncvs/cvs/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.70
diff -u -p -r1.70 pmap.c
--- amd64/pmap.c        15 Jun 2014 11:43:24 -0000      1.70
+++ amd64/pmap.c        3 Jul 2014 23:41:34 -0000
@@ -510,10 +510,9 @@ pmap_bootstrap(paddr_t first_avail, padd
 {
        vaddr_t kva, kva_end, kva_start = VM_MIN_KERNEL_ADDRESS;
        struct pmap *kpm;
-       int i;
+       size_t i, ndmpdp;
        unsigned long p1i;
        pt_entry_t pg_nx = (cpu_feature & CPUID_NXE? PG_NX : 0);
-       long ndmpdp;
        paddr_t dmpd, dmpdp;
 
        /*
@@ -596,9 +595,17 @@ pmap_bootstrap(paddr_t first_avail, padd
 
        dmpdp = kpm->pm_pdir[PDIR_SLOT_DIRECT] & PG_FRAME;
 
-       dmpd = first_avail; first_avail += ndmpdp * PAGE_SIZE;
+       /*
+        * locore.S has already allocated and configured NDML2_ENTRIES page
+        * directories to cover the first 4GB, so here we need only to allocate
+        * and configure any *additional* page directories.  To make address
+        * calculations below easier, we offset dmpd as if we'd allocated
+        * ndmpdp pages, but then avoid accessing the first NDML2_ENTRIES.
+        */
+       dmpd = first_avail - (NDML2_ENTRIES * PAGE_SIZE);
+       first_avail += (ndmpdp - NDML2_ENTRIES) * PAGE_SIZE;
 
-       for (i = NDML2_ENTRIES; i < NPDPG * ndmpdp; i++) {
+       for (i = NPDPG * NDML2_ENTRIES; i < NPDPG * ndmpdp; i++) {
                paddr_t pdp;
                vaddr_t va;
 
