Re: arm64: allocate an empty user map for kernel mode.
> Date: Fri, 17 Mar 2017 16:12:13 -0400
> From: Dale Rahn
>
> When the kernel switches to the kernel pmap, the pmap code had previously
> left ttbr0 unmodified, as the kernel pmap space is all accessed through
> ttbr1.  At one point ttbr0 was updated with the ttbr1 content, but that
> caused massive failures.
>
> This diff adds an allocation of a zeroed/empty L0/L1 page table to put
> into ttbr0 when switching to the kernel pmap.  A zeroed top-level table
> indicates that there are no valid L1/L2 entries, so any access to low
> addresses causes a data abort.
>
> This table is also used when a core is about to switch to a new ASID: if
> the pmap determines that it needs to reuse an ASID, this prevents
> speculative accesses through the old ttbr0 after TLB invalidation.
>
> Another minor change: an invalid ASID in a pmap is now always flagged as
> 'pm_asid == -1'; checking for 'pm_asid < 0' could cause confusion, so the
> check was changed to match the invalidation value.
>
> A possible improvement in the code below is to allocate and zero the
> buffer under a size name different from Lx_TABLE_ALIGN, but there appears
> to be no appropriate define present.
>
> diff --git a/sys/arch/arm64/arm64/pmap.c b/sys/arch/arm64/arm64/pmap.c
> index d1fa358b85f..18aff664beb 100644
> --- a/sys/arch/arm64/arm64/pmap.c
> +++ b/sys/arch/arm64/arm64/pmap.c
> @@ -1112,6 +1112,7 @@ CTASSERT(sizeof(struct pmapvp0) == 8192);
>
>  int mappings_allocated = 0;
>  int pted_allocated = 0;
> +paddr_t pted_kernel_ttbr0;
>
>  vaddr_t
>  pmap_bootstrap(long kvo, paddr_t lpt1, long kernelstart, long kernelend,
> @@ -1214,7 +1215,12 @@ pmap_bootstrap(long kvo, paddr_t lpt1, long kernelstart, long kernelend,
>  		}
>  	}
>
> -	// XXX should this extend the l2 bootstrap mappings for kernel entries?
> +	/* allocate an empty ttbr0 for when we are switched to kernel pmap */
> +	pted_kernel_ttbr0 = pmap_steal_avail(Lx_TABLE_ALIGN, Lx_TABLE_ALIGN,
> +	    );
> +	/* zero by pa, va may not be mapped */
> +	bzero((void *)pa, Lx_TABLE_ALIGN);
> +

You're zeroing the wrong bit of memory here.  And I think it is best to
use memset these days.

I wonder if it makes more sense to set pmap_kernel()->pm_pt0pa here and
use a global variable for the pagetables pointed at by ttbr1.  Then you
could initialize pmap_kernel()->pm_asid to 0 and we wouldn't need to
handle the kernel pmap specially in pmap_setttb().

>  	/* now that we have mapping space for everything, lets map it */
>  	/* all of these mappings are ram -> kernel va */
> @@ -2311,7 +2317,12 @@ pmap_setttb(struct proc *p, paddr_t pagedir, struct pcb *pcb)
>  	//int oasid = pm->pm_asid;
>
>  	if (pm != pmap_kernel()) {
> -		if (pm->pm_asid < 0) {
> +		if (pm->pm_asid == -1) {
> +			/*
> +			 * We are running in kernel, about to switch to an
> +			 * unknown asid, move to an unused ttbr0 entry
> +			 */
> +			cpu_setttb(pted_kernel_ttbr0);

I don't see why this is necessary.  The old ASID is perfectly valid until
we switch, and the delayed cpu_tlb_flush() will make sure we get rid of
any cached mappings of the old ASID.  This works fine even if we had to
roll over the ASIDs and happened to pick the exact same ASID as we used
before.

>  			pmap_allocate_asid(pm);
>  			pcb->pcb_pagedir = ((uint64_t)pm->pm_asid << 48) |
>  			    pm->pm_pt0pa;
> @@ -2326,6 +2337,7 @@ pmap_setttb(struct proc *p, paddr_t pagedir, struct pcb *pcb)
>  			cpu_tlb_flush();
>  		}
>  	} else {
> -		// XXX what to do if switching to kernel pmap !?!?
> +		/* switch userland to empty mapping page */
> +		cpu_setttb(pted_kernel_ttbr0);
>  	}
>  }
>
> Dale Rahn	dr...@dalerahn.com
Re: arm64: Remove early ASID allocation
> Date: Fri, 17 Mar 2017 16:00:53 -0400
> From: Dale Rahn
>
> Do not preallocate the ASID; wait until it is allocated in pmap_setttb().
> This simplifies the code by having only one location that does the ASID
> allocation.
>
> There is no need to allocate the ASID directly in pmap_activate() as the
> code in pmap_setttb() will do that allocation, so this code is just
> duplication that could potentially cause a race in the future.

I wondered if this was supposed to be an optimization, to prevent running
a potentially expensive operation with interrupts disabled.  But that
doesn't really make a lot of sense.

ok kettenis@

P.S. I'm looking into using the full 16 bits to make ASID allocation
simpler.  First test shows that the full 16 bits work on the rpi3.

> diff --git a/sys/arch/arm64/arm64/pmap.c b/sys/arch/arm64/arm64/pmap.c
> index 4d0c58c3663..d1fa358b85f 100644
> --- a/sys/arch/arm64/arm64/pmap.c
> +++ b/sys/arch/arm64/arm64/pmap.c
> @@ -1403,14 +1403,6 @@ pmap_activate(struct proc *p)
>  	pcb = &p->p_addr->u_pcb;
>
>  	// printf("%s: called on proc %p %p\n", __func__, p, pcb->pcb_pagedir);
> -	if (pm != pmap_kernel() && pm->pm_asid == -1) {
> -		// this userland process has no asid, allocate one.
> -		pmap_allocate_asid(pm);
> -	}
> -
> -	if (pm != pmap_kernel())
> -		pcb->pcb_pagedir = ((uint64_t)pm->pm_asid << 48) | pm->pm_pt0pa;
> -
>  	psw = disable_interrupts();
>  	if (p == curproc && pm != pmap_kernel() && pm != curcpu()->ci_curpm) {
>  		// clean up old process
>
> Dale Rahn	dr...@dalerahn.com
Re: arm64 kernel W^X
> Date: Fri, 17 Mar 2017 19:07:16 +0100
> From: Patrick Wildt
>
> On Fri, Mar 17, 2017 at 06:42:38PM +0100, Mark Kettenis wrote:
> > The diff below introduces kernel W^X on arm64.  It changes the linker
> > script to align the RX, R and RW segments on a 2MB boundary such that
> > we can continue to use block mappings for the kernel.  Then it adds
> > some additional linker generated symbols that allow us to determine
> > whether a block is .text, .rodata or .data/.bss.
> >
> > ok?
> >
> >
> > Index: arch/arm64/arm64/pmap.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> > retrieving revision 1.26
> > diff -u -p -r1.26 pmap.c
> > --- arch/arm64/arm64/pmap.c	16 Mar 2017 20:15:07 -0000	1.26
> > +++ arch/arm64/arm64/pmap.c	17 Mar 2017 17:39:05 -0000
> > @@ -255,6 +255,27 @@ const struct kmem_pa_mode kp_lN = {
> >  	.kp_zero = 1,
> >  };
> >
> > +const uint64_t ap_bits_user[8] = {
> > +	[PROT_NONE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AP(2),
> > +	[PROT_READ]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(3),
> > +	[PROT_WRITE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> > +	[PROT_WRITE|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> > +	[PROT_EXEC]			= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(2),
> > +	[PROT_EXEC|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(3),
> > +	[PROT_EXEC|PROT_WRITE]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> > +	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> > +};
> > +
> > +const uint64_t ap_bits_kern[8] = {
> > +	[PROT_NONE]			= ATTR_PXN|ATTR_UXN|ATTR_AP(2),
> > +	[PROT_READ]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(2),
> > +	[PROT_WRITE]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
> > +	[PROT_WRITE|PROT_READ]		= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
> > +	[PROT_EXEC]			= ATTR_UXN|ATTR_AF|ATTR_AP(2),
> > +	[PROT_EXEC|PROT_READ]		= ATTR_UXN|ATTR_AF|ATTR_AP(2),
> > +	[PROT_EXEC|PROT_WRITE]		= ATTR_UXN|ATTR_AF|ATTR_AP(0),
> > +	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_UXN|ATTR_AF|ATTR_AP(0),
> > +};
>
> Is this only a move of the lists or is there also a change involved?
> It's a bit hard to compare.

Only a move, but I did make it const and dropped the STATIC.

> >  /*
> >   * This is used for pmap_kernel() mappings, they are not to be removed
> > @@ -1222,7 +1243,9 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
> >  	// enable mappings for existing 'allocated' mapping in the bootstrap
> >  	// page tables
> >  	extern uint64_t *pagetable;
> > -	extern int *_end;
> > +	extern char __text_start[], _etext[];
> > +	extern char __rodata_start[], _erodata[];
> > +	extern char _end[];
> >  	vp2 = (void *)((long)&pagetable + kvo);
> >  	struct mem_region *mp;
> >  	ssize_t size;
> > @@ -1234,11 +1257,21 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
> >  		{
> >  			paddr_t mappa = pa & ~(L2_SIZE-1);
> >  			vaddr_t mapva = mappa - kvo;
> > -			if (mapva < (vaddr_t)&_end)
> > +			int prot = PROT_READ | PROT_WRITE;
> > +
> > +			if (mapva < (vaddr_t)_end)
> >  				continue;
> > +
> > +			if (mapva >= (vaddr_t)__text_start &&
> > +			    mapva < (vaddr_t)_etext)
> > +				prot = PROT_READ | PROT_EXEC;
> > +			else if (mapva >= (vaddr_t)__rodata_start &&
> > +				 mapva < (vaddr_t)_erodata)
>
> The spacing looks off (but at least aligned to the line above).
> Reads fine to me otherwise.

Good catch.
arm64: allocate an empty user map for kernel mode.
When the kernel switches to the kernel pmap, the pmap code had previously
left ttbr0 unmodified, as the kernel pmap space is all accessed through
ttbr1.  At one point ttbr0 was updated with the ttbr1 content, but that
caused massive failures.

This diff adds an allocation of a zeroed/empty L0/L1 page table to put
into ttbr0 when switching to the kernel pmap.  A zeroed top-level table
indicates that there are no valid L1/L2 entries, so any access to low
addresses causes a data abort.

This table is also used when a core is about to switch to a new ASID: if
the pmap determines that it needs to reuse an ASID, this prevents
speculative accesses through the old ttbr0 after TLB invalidation.

Another minor change: an invalid ASID in a pmap is now always flagged as
'pm_asid == -1'; checking for 'pm_asid < 0' could cause confusion, so the
check was changed to match the invalidation value.

A possible improvement in the code below is to allocate and zero the
buffer under a size name different from Lx_TABLE_ALIGN, but there appears
to be no appropriate define present.

diff --git a/sys/arch/arm64/arm64/pmap.c b/sys/arch/arm64/arm64/pmap.c
index d1fa358b85f..18aff664beb 100644
--- a/sys/arch/arm64/arm64/pmap.c
+++ b/sys/arch/arm64/arm64/pmap.c
@@ -1112,6 +1112,7 @@ CTASSERT(sizeof(struct pmapvp0) == 8192);

 int mappings_allocated = 0;
 int pted_allocated = 0;
+paddr_t pted_kernel_ttbr0;

 vaddr_t
 pmap_bootstrap(long kvo, paddr_t lpt1, long kernelstart, long kernelend,
@@ -1214,7 +1215,12 @@ pmap_bootstrap(long kvo, paddr_t lpt1, long kernelstart, long kernelend,
 		}
 	}

-	// XXX should this extend the l2 bootstrap mappings for kernel entries?
+	/* allocate an empty ttbr0 for when we are switched to kernel pmap */
+	pted_kernel_ttbr0 = pmap_steal_avail(Lx_TABLE_ALIGN, Lx_TABLE_ALIGN,
+	    );
+	/* zero by pa, va may not be mapped */
+	bzero((void *)pa, Lx_TABLE_ALIGN);
+
 	/* now that we have mapping space for everything, lets map it */
 	/* all of these mappings are ram -> kernel va */
@@ -2311,7 +2317,12 @@ pmap_setttb(struct proc *p, paddr_t pagedir, struct pcb *pcb)
 	//int oasid = pm->pm_asid;

 	if (pm != pmap_kernel()) {
-		if (pm->pm_asid < 0) {
+		if (pm->pm_asid == -1) {
+			/*
+			 * We are running in kernel, about to switch to an
+			 * unknown asid, move to an unused ttbr0 entry
+			 */
+			cpu_setttb(pted_kernel_ttbr0);
 			pmap_allocate_asid(pm);
 			pcb->pcb_pagedir = ((uint64_t)pm->pm_asid << 48) |
 			    pm->pm_pt0pa;
@@ -2326,6 +2337,7 @@ pmap_setttb(struct proc *p, paddr_t pagedir, struct pcb *pcb)
 			cpu_tlb_flush();
 		}
 	} else {
-		// XXX what to do if switching to kernel pmap !?!?
+		/* switch userland to empty mapping page */
+		cpu_setttb(pted_kernel_ttbr0);
 	}
 }

Dale Rahn	dr...@dalerahn.com
arm64: Remove early ASID allocation
Do not preallocate the ASID; wait until it is allocated in pmap_setttb().
This simplifies the code by having only one location that does the ASID
allocation.

There is no need to allocate the ASID directly in pmap_activate() as the
code in pmap_setttb() will do that allocation, so this code is just
duplication that could potentially cause a race in the future.

diff --git a/sys/arch/arm64/arm64/pmap.c b/sys/arch/arm64/arm64/pmap.c
index 4d0c58c3663..d1fa358b85f 100644
--- a/sys/arch/arm64/arm64/pmap.c
+++ b/sys/arch/arm64/arm64/pmap.c
@@ -1403,14 +1403,6 @@ pmap_activate(struct proc *p)
 	pcb = &p->p_addr->u_pcb;

 	// printf("%s: called on proc %p %p\n", __func__, p, pcb->pcb_pagedir);
-	if (pm != pmap_kernel() && pm->pm_asid == -1) {
-		// this userland process has no asid, allocate one.
-		pmap_allocate_asid(pm);
-	}
-
-	if (pm != pmap_kernel())
-		pcb->pcb_pagedir = ((uint64_t)pm->pm_asid << 48) | pm->pm_pt0pa;
-
 	psw = disable_interrupts();
 	if (p == curproc && pm != pmap_kernel() && pm != curcpu()->ci_curpm) {
 		// clean up old process

Dale Rahn	dr...@dalerahn.com
Re: arm64 kernel W^X
On Fri, Mar 17, 2017 at 06:42:38PM +0100, Mark Kettenis wrote:
> The diff below introduces kernel W^X on arm64.  It changes the linker
> script to align the RX, R and RW segments on a 2MB boundary such that
> we can continue to use block mappings for the kernel.  Then it adds
> some additional linker generated symbols that allow us to determine
> whether a block is .text, .rodata or .data/.bss.
>
> ok?
>
>
> Index: arch/arm64/arm64/pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 pmap.c
> --- arch/arm64/arm64/pmap.c	16 Mar 2017 20:15:07 -0000	1.26
> +++ arch/arm64/arm64/pmap.c	17 Mar 2017 17:39:05 -0000
> @@ -255,6 +255,27 @@ const struct kmem_pa_mode kp_lN = {
>  	.kp_zero = 1,
>  };
>
> +const uint64_t ap_bits_user[8] = {
> +	[PROT_NONE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AP(2),
> +	[PROT_READ]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(3),
> +	[PROT_WRITE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> +	[PROT_WRITE|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> +	[PROT_EXEC]			= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(2),
> +	[PROT_EXEC|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(3),
> +	[PROT_EXEC|PROT_WRITE]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> +	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> +};
> +
> +const uint64_t ap_bits_kern[8] = {
> +	[PROT_NONE]			= ATTR_PXN|ATTR_UXN|ATTR_AP(2),
> +	[PROT_READ]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(2),
> +	[PROT_WRITE]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
> +	[PROT_WRITE|PROT_READ]		= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
> +	[PROT_EXEC]			= ATTR_UXN|ATTR_AF|ATTR_AP(2),
> +	[PROT_EXEC|PROT_READ]		= ATTR_UXN|ATTR_AF|ATTR_AP(2),
> +	[PROT_EXEC|PROT_WRITE]		= ATTR_UXN|ATTR_AF|ATTR_AP(0),
> +	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_UXN|ATTR_AF|ATTR_AP(0),
> +};

Is this only a move of the lists or is there also a change involved?
It's a bit hard to compare.

>  /*
>   * This is used for pmap_kernel() mappings, they are not to be removed
> @@ -1222,7 +1243,9 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
>  	// enable mappings for existing 'allocated' mapping in the bootstrap
>  	// page tables
>  	extern uint64_t *pagetable;
> -	extern int *_end;
> +	extern char __text_start[], _etext[];
> +	extern char __rodata_start[], _erodata[];
> +	extern char _end[];
>  	vp2 = (void *)((long)&pagetable + kvo);
>  	struct mem_region *mp;
>  	ssize_t size;
> @@ -1234,11 +1257,21 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
>  		{
>  			paddr_t mappa = pa & ~(L2_SIZE-1);
>  			vaddr_t mapva = mappa - kvo;
> -			if (mapva < (vaddr_t)&_end)
> +			int prot = PROT_READ | PROT_WRITE;
> +
> +			if (mapva < (vaddr_t)_end)
>  				continue;
> +
> +			if (mapva >= (vaddr_t)__text_start &&
> +			    mapva < (vaddr_t)_etext)
> +				prot = PROT_READ | PROT_EXEC;
> +			else if (mapva >= (vaddr_t)__rodata_start &&
> +				 mapva < (vaddr_t)_erodata)

The spacing looks off (but at least aligned to the line above).
Reads fine to me otherwise.

> +				prot = PROT_READ;
> +
>  			vp2->l2[VP_IDX2(mapva)] = mappa | L2_BLOCK |
> -			    ATTR_AF | ATTR_IDX(PTE_ATTR_WB) |
> -			    ATTR_SH(SH_INNER);
> +			    ATTR_IDX(PTE_ATTR_WB) | ATTR_SH(SH_INNER) |
> +			    ap_bits_kern[prot];
>  		}
>  	}
>
> @@ -1572,28 +1605,6 @@ pmap_proc_iflush(struct process *pr, vad
>  	if (pr == curproc->p_p)
>  		cpu_icache_sync_range(va, len);
>  }
> -
> -STATIC uint64_t ap_bits_user[8] = {
> -	[PROT_NONE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AP(2),
> -	[PROT_READ]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(3),
> -	[PROT_WRITE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> -	[PROT_WRITE|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
> -	[PROT_EXEC]			= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(2),
> -	[PROT_EXEC|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(3),
> -	[PROT_EXEC|PROT_WRITE]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> -	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
> -};
> -
> -STATIC uint64_t ap_bits_kern[8] =
arm64 kernel W^X
The diff below introduces kernel W^X on arm64.  It changes the linker
script to align the RX, R and RW segments on a 2MB boundary such that
we can continue to use block mappings for the kernel.  Then it adds
some additional linker generated symbols that allow us to determine
whether a block is .text, .rodata or .data/.bss.

ok?


Index: arch/arm64/arm64/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
retrieving revision 1.26
diff -u -p -r1.26 pmap.c
--- arch/arm64/arm64/pmap.c	16 Mar 2017 20:15:07 -0000	1.26
+++ arch/arm64/arm64/pmap.c	17 Mar 2017 17:39:05 -0000
@@ -255,6 +255,27 @@ const struct kmem_pa_mode kp_lN = {
 	.kp_zero = 1,
 };

+const uint64_t ap_bits_user[8] = {
+	[PROT_NONE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AP(2),
+	[PROT_READ]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(3),
+	[PROT_WRITE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
+	[PROT_WRITE|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
+	[PROT_EXEC]			= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(2),
+	[PROT_EXEC|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(3),
+	[PROT_EXEC|PROT_WRITE]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
+	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
+};
+
+const uint64_t ap_bits_kern[8] = {
+	[PROT_NONE]			= ATTR_PXN|ATTR_UXN|ATTR_AP(2),
+	[PROT_READ]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(2),
+	[PROT_WRITE]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
+	[PROT_WRITE|PROT_READ]		= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
+	[PROT_EXEC]			= ATTR_UXN|ATTR_AF|ATTR_AP(2),
+	[PROT_EXEC|PROT_READ]		= ATTR_UXN|ATTR_AF|ATTR_AP(2),
+	[PROT_EXEC|PROT_WRITE]		= ATTR_UXN|ATTR_AF|ATTR_AP(0),
+	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_UXN|ATTR_AF|ATTR_AP(0),
+};

 /*
  * This is used for pmap_kernel() mappings, they are not to be removed
@@ -1222,7 +1243,9 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
 	// enable mappings for existing 'allocated' mapping in the bootstrap
 	// page tables
 	extern uint64_t *pagetable;
-	extern int *_end;
+	extern char __text_start[], _etext[];
+	extern char __rodata_start[], _erodata[];
+	extern char _end[];
 	vp2 = (void *)((long)&pagetable + kvo);
 	struct mem_region *mp;
 	ssize_t size;
@@ -1234,11 +1257,21 @@ pmap_bootstrap(long kvo, paddr_t lpt1,
 		{
 			paddr_t mappa = pa & ~(L2_SIZE-1);
 			vaddr_t mapva = mappa - kvo;
-			if (mapva < (vaddr_t)&_end)
+			int prot = PROT_READ | PROT_WRITE;
+
+			if (mapva < (vaddr_t)_end)
 				continue;
+
+			if (mapva >= (vaddr_t)__text_start &&
+			    mapva < (vaddr_t)_etext)
+				prot = PROT_READ | PROT_EXEC;
+			else if (mapva >= (vaddr_t)__rodata_start &&
+				 mapva < (vaddr_t)_erodata)
+				prot = PROT_READ;
+
 			vp2->l2[VP_IDX2(mapva)] = mappa | L2_BLOCK |
-			    ATTR_AF | ATTR_IDX(PTE_ATTR_WB) |
-			    ATTR_SH(SH_INNER);
+			    ATTR_IDX(PTE_ATTR_WB) | ATTR_SH(SH_INNER) |
+			    ap_bits_kern[prot];
 		}
 	}

@@ -1572,28 +1605,6 @@ pmap_proc_iflush(struct process *pr, vad
 	if (pr == curproc->p_p)
 		cpu_icache_sync_range(va, len);
 }
-
-STATIC uint64_t ap_bits_user[8] = {
-	[PROT_NONE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AP(2),
-	[PROT_READ]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(3),
-	[PROT_WRITE]			= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
-	[PROT_WRITE|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(1),
-	[PROT_EXEC]			= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(2),
-	[PROT_EXEC|PROT_READ]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(3),
-	[PROT_EXEC|PROT_WRITE]		= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
-	[PROT_EXEC|PROT_WRITE|PROT_READ]= ATTR_nG|ATTR_PXN|ATTR_AF|ATTR_AP(1),
-};
-
-STATIC uint64_t ap_bits_kern[8] = {
-	[PROT_NONE]			= ATTR_PXN|ATTR_UXN|ATTR_AP(2),
-	[PROT_READ]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(2),
-	[PROT_WRITE]			= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
-	[PROT_WRITE|PROT_READ]		= ATTR_PXN|ATTR_UXN|ATTR_AF|ATTR_AP(0),
-
Re: [PATCH] socppc discontinued in 61.html
On Fri, Mar 17, 2017 at 05:28:59PM +0100, Ingo Schwarze wrote:
> Hi Bryan,
>
> since nobody else answered, and even though i don't know that
> much about hardware support:  I don't think your patch should be
> committed.
>
>   https://www.openbsd.org/socppc.html
>
> says:
>   The OpenBSD/socppc port was discontinued after the 5.8 release.
>
> In that sense, the 6.1 release does not retire socppc.
> It was already gone before.

I appreciate your response to my message.  I also received a private
reply from tj@ which said basically the same thing, and I agree.
Although 59.html and 60.html never stated that OpenBSD/socppc was
discontinued, there is not much value in adding that to 61.html, which
would not really be accurate, as you said.

Thanks again.

Bryan
Re: [PATCH] socppc discontinued in 61.html
Hi Bryan,

since nobody else answered, and even though i don't know that
much about hardware support:  I don't think your patch should be
committed.

  https://www.openbsd.org/socppc.html

says:

  The OpenBSD/socppc port was discontinued after the 5.8 release.

In that sense, the 6.1 release does not retire socppc.
It was already gone before.

Yours,
  Ingo

Bryan Vyhmeister wrote on Mon, Mar 13, 2017 at 07:04:22PM -0700:

> I am not sure what happened to the patch tj@ committed but it is not
> there for 61.html.  Perhaps it was clobbered by another commit, or was
> this not left in because socppc was essentially retired after 5.8,
> since there was no 5.9 or 6.0 release?
>
> For reference:
>
> http://marc.info/?l=openbsd-cvs&m=148890947117155&w=2
>
> http://marc.info/?l=openbsd-cvs&m=148899205909012&w=2
>
> Bryan
>
>
> Index: 61.html
> ===================================================================
> RCS file: /cvs/www/61.html,v
> retrieving revision 1.37
> diff -u -p -r1.37 61.html
> --- 61.html	11 Mar 2017 16:33:30 -0000	1.37
> +++ 61.html	14 Mar 2017 01:54:24 -0000
> @@ -69,6 +69,7 @@ to 6.1.
>  ...
>  The following platforms were retired:
>  <a href="https://www.openbsd.org/armish.html">armish</a>,
> +<a href="https://www.openbsd.org/socppc.html">socppc</a>,
>  <a href="https://www.openbsd.org/sparc.html">sparc</a>,
>  <a href="https://www.openbsd.org/zaurus.html">zaurus</a>
>  ...
Re: Unneeded splnet()/splx() in carp(4)
On Tue, Mar 07, 2017 at 11:08:43AM +0100, Martin Pieuchot wrote:
> carp(4), as a pseudo-interface, is always executed in the 'softnet'
> thread.  Using splnet()/splx() might have been relevant when link-state
> handlers were directly executed from hardware interrupt handlers.  But
> nowadays everything is run under the NET_LOCK() in a thread context, so
> let's get rid of these superfluous splnet()/splx() dances.
>
> ok?

OK bluhm@

> Index: netinet/ip_carp.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/ip_carp.c,v
> retrieving revision 1.302
> diff -u -p -r1.302 ip_carp.c
> --- netinet/ip_carp.c	20 Feb 2017 06:29:42 -0000	1.302
> +++ netinet/ip_carp.c	7 Mar 2017 10:05:08 -0000
> @@ -898,7 +898,6 @@ carpdetach(struct carp_softc *sc)
>  {
>  	struct ifnet *ifp0;
>  	struct carp_if *cif;
> -	int s;
>
>  	carp_del_all_timeouts(sc);
>
> @@ -926,7 +925,6 @@ carpdetach(struct carp_softc *sc)
>  	/* Restore previous input handler. */
>  	if_ih_remove(ifp0, carp_input, cif);
>
> -	s = splnet();
>  	if (sc->lh_cookie != NULL)
>  		hook_disestablish(ifp0->if_linkstatehooks, sc->lh_cookie);
>
> @@ -938,7 +936,6 @@ carpdetach(struct carp_softc *sc)
>  		free(cif, M_IFADDR, sizeof(*cif));
>  	}
>  	sc->sc_carpdev = NULL;
> -	splx(s);
>  }
>
>  /* Detach an interface from the carp. */
> @@ -1680,7 +1677,6 @@ carp_set_ifp(struct carp_softc *sc, stru
>  	struct carp_if *cif, *ncif = NULL;
>  	struct carp_softc *vr, *last = NULL, *after = NULL;
>  	int myself = 0, error = 0;
> -	int s;
>
>  	KASSERT(ifp0 != sc->sc_carpdev);
>  	KERNEL_ASSERT_LOCKED(); /* touching vhif_vrs */
> @@ -1754,9 +1750,7 @@ carp_set_ifp(struct carp_softc *sc, stru
>  	/* Change input handler of the physical interface. */
>  	if_ih_insert(ifp0, carp_input, cif);
>
> -	s = splnet();
>  	carp_carpdev_state(ifp0);
> -	splx(s);
>
>  	return (0);
>  }
Re: Unneeded splnet()/splx() in carp(4)
On 7 March 2017 at 11:08, Martin Pieuchot wrote:
> carp(4), as a pseudo-interface, is always executed in the 'softnet'
> thread.  Using splnet()/splx() might have been relevant when link-state
> handlers were directly executed from hardware interrupt handlers.  But
> nowadays everything is run under the NET_LOCK() in a thread context, so
> let's get rid of these superfluous splnet()/splx() dances.
>
> ok?

Looks good to me.  Hooray for the last pair of splnets in netinet going
the way of the Dodo.
Re: Sync nfs_connect w/ sys_connect
On Tue, Mar 07, 2017 at 10:57:52AM +0100, Martin Pieuchot wrote:
> This code is mostly a copy of what's done in sys_connect(), so sync it
> to use solock()/sosleep()/sounlock() instead of splsoftnet()/splx().
>
> ok?

OK bluhm@

> Index: nfs/nfs_socket.c
> ===================================================================
> RCS file: /cvs/src/sys/nfs/nfs_socket.c,v
> retrieving revision 1.114
> diff -u -p -r1.114 nfs_socket.c
> --- nfs/nfs_socket.c	3 Mar 2017 09:41:20 -0000	1.114
> +++ nfs/nfs_socket.c	7 Mar 2017 09:53:02 -0000
> @@ -306,25 +306,24 @@ nfs_connect(struct nfsmount *nmp, struct
>  		 * connect system call but with the wait timing out so
>  		 * that interruptible mounts don't hang here for a long time.
>  		 */
> -		s = splsoftnet();
> +		s = solock(so);
>  		while ((so->so_state & SS_ISCONNECTING) && so->so_error == 0) {
> -			(void) tsleep((caddr_t)&so->so_timeo, PSOCK,
> -			    "nfscon", 2 * hz);
> +			sosleep(so, &so->so_timeo, PSOCK, "nfscon", 2 * hz);
>  			if ((so->so_state & SS_ISCONNECTING) &&
>  			    so->so_error == 0 && rep &&
>  			    (error = nfs_sigintr(nmp, rep, rep->r_procp)) != 0){
>  				so->so_state &= ~SS_ISCONNECTING;
> -				splx(s);
> +				sounlock(s);
>  				goto bad;
>  			}
>  		}
>  		if (so->so_error) {
>  			error = so->so_error;
>  			so->so_error = 0;
> -			splx(s);
> +			sounlock(s);
>  			goto bad;
>  		}
> -		splx(s);
> +		sounlock(s);
>  	}
>  	/*
>  	 * Always set receive timeout to detect server crash and reconnect.
Re: roff(7) man page not rendering properly in its entirety on man.openbsd.org
Hi,

Raf Czlonka wrote on Thu, Mar 16, 2017 at 10:56:23PM +0000:

> While looking at several manual pages on man.openbsd.org,

Thanks for testing, ...

> I've noticed that roff(7) man page does not render properly
> in its entirety,

... and for reporting.

> No idea what might be causing the issue, I'm afraid.

And thanks to Martin Natano for telling me what caused this.
Fixed by the commit below.

While the print_otag "i" (id) request intentionally accepts NULL such
that the same call can write the tag with or without an id, the selflink
tag must not be written at all if we don't have an id.

Yours,
  Ingo

Log Message:
-----------
Fix regression in mdoc_html.c 1.275, man_html.c 1.134:
For .Sh, .Ss, .SH, .SS, only write the selflink if an id could be
constructed.
Crash reported by Raf Czlonka, analysis of root cause by natano@.

Modified Files:
--------------
    mdocml:
        man_html.c
        mdoc_html.c

Revision Data
-------------
Index: man_html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/man_html.c,v
retrieving revision 1.134
retrieving revision 1.135
diff -Lman_html.c -Lman_html.c -u -p -r1.134 -r1.135
--- man_html.c
+++ man_html.c
@@ -440,7 +440,8 @@ man_SH_pre(MAN_ARGS)
 	if (n->type == ROFFT_HEAD) {
 		id = html_make_id(n);
 		print_otag(h, TAG_H1, "cTi", "Sh", id);
-		print_otag(h, TAG_A, "chR", "selflink", id);
+		if (id != NULL)
+			print_otag(h, TAG_A, "chR", "selflink", id);
 		free(id);
 	}
 	return 1;
@@ -509,7 +510,8 @@ man_SS_pre(MAN_ARGS)
 	if (n->type == ROFFT_HEAD) {
 		id = html_make_id(n);
 		print_otag(h, TAG_H2, "cTi", "Ss", id);
-		print_otag(h, TAG_A, "chR", "selflink", id);
+		if (id != NULL)
+			print_otag(h, TAG_A, "chR", "selflink", id);
 		free(id);
 	}
 	return 1;
Index: mdoc_html.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mdoc_html.c,v
retrieving revision 1.277
retrieving revision 1.278
diff -Lmdoc_html.c -Lmdoc_html.c -u -p -r1.277 -r1.278
--- mdoc_html.c
+++ mdoc_html.c
@@ -501,7 +501,8 @@ mdoc_sh_pre(MDOC_ARGS)
 	case ROFFT_HEAD:
 		id = html_make_id(n);
 		print_otag(h, TAG_H1, "cTi", "Sh", id);
-		print_otag(h, TAG_A, "chR", "selflink", id);
+		if (id != NULL)
+			print_otag(h, TAG_A, "chR", "selflink", id);
 		free(id);
 		break;
 	case ROFFT_BODY:
@@ -524,7 +525,8 @@ mdoc_ss_pre(MDOC_ARGS)

 	id = html_make_id(n);
 	print_otag(h, TAG_H2, "cTi", "Ss", id);
-	print_otag(h, TAG_A, "chR", "selflink", id);
+	if (id != NULL)
+		print_otag(h, TAG_A, "chR", "selflink", id);
 	free(id);
 	return 1;
 }
pf: percpu anchor stacks
Hi,

to step into and out of pf(4) anchors, pf(4) uses a temporary stack that
is a global variable.  Now once we run pf_test_rule() in parallel and an
anchor is evaluated in parallel, the global stack would be used
concurrently, which can lead to panics.

To solve this issue this diff allocates a per-CPU stack using the
cpumem API.

Opinions?

Patrick

diff --git a/sys/net/pf.c b/sys/net/pf.c
index 3ddcd7726d2..78316ae0009 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -119,12 +119,7 @@ u_char pf_tcp_secret[16];
 int pf_tcp_secret_init;
 int pf_tcp_iss_off;

-struct pf_anchor_stackframe {
-	struct pf_ruleset	*rs;
-	struct pf_rule		*r;
-	struct pf_anchor_node	*parent;
-	struct pf_anchor	*child;
-} pf_anchor_stack[64];
+struct cpumem *pf_anchor_stacks;

 struct pool		 pf_src_tree_pl, pf_rule_pl, pf_queue_pl;
 struct pool		 pf_state_pl, pf_state_key_pl, pf_state_item_pl;
@@ -3003,22 +2998,26 @@ void
 pf_step_into_anchor(int *depth, struct pf_ruleset **rs,
     struct pf_rule **r, struct pf_rule **a)
 {
+	struct pf_anchor_stack *s;
 	struct pf_anchor_stackframe *f;

-	if (*depth >= sizeof(pf_anchor_stack) /
-	    sizeof(pf_anchor_stack[0])) {
+	s = cpumem_enter(pf_anchor_stacks);
+
+	if (*depth >= sizeof(*s) / sizeof(s->frame[0])) {
 		log(LOG_ERR, "pf: anchor stack overflow\n");
 		*r = TAILQ_NEXT(*r, entries);
+		cpumem_leave(pf_anchor_stacks, s);
 		return;
 	} else if (a != NULL)
 		*a = *r;
-	f = pf_anchor_stack + (*depth)++;
+	f = s->frame + (*depth)++;
 	f->rs = *rs;
 	f->r = *r;
 	if ((*r)->anchor_wildcard) {
 		f->parent = &(*r)->anchor->children;
 		if ((f->child = RB_MIN(pf_anchor_node, f->parent)) == NULL) {
 			*r = NULL;
+			cpumem_leave(pf_anchor_stacks, s);
 			return;
 		}
 		*rs = &f->child->ruleset;
@@ -3028,19 +3027,23 @@ pf_step_into_anchor(int *depth, struct pf_ruleset **rs,
 		*rs = &(*r)->anchor->ruleset;
 	}
 	*r = TAILQ_FIRST((*rs)->rules.active.ptr);
+	cpumem_leave(pf_anchor_stacks, s);
 }

 int
 pf_step_out_of_anchor(int *depth, struct pf_ruleset **rs,
     struct pf_rule **r, struct pf_rule **a, int *match)
 {
+	struct pf_anchor_stack *s;
 	struct pf_anchor_stackframe *f;
 	int quick = 0;

+	s = cpumem_enter(pf_anchor_stacks);
+
 	do {
 		if (*depth <= 0)
 			break;
-		f = pf_anchor_stack + *depth - 1;
+		f = s->frame + *depth - 1;
 		if (f->parent != NULL && f->child != NULL) {
 			f->child = RB_NEXT(pf_anchor_node, f->parent, f->child);
 			if (f->child != NULL) {
@@ -3066,6 +3069,7 @@ pf_step_out_of_anchor(int *depth, struct pf_ruleset **rs,
 		*r = TAILQ_NEXT(f->r, entries);
 	} while (*r == NULL);

+	cpumem_leave(pf_anchor_stacks, s);
 	return (quick);
 }

diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index 56a43a55ab8..869ca3eaa1d 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -161,6 +161,10 @@ pfattach(int num)
 	    IPL_SOFTNET, 0, "pfruleitem", NULL);
 	pool_init(&pf_queue_pl, sizeof(struct pf_queuespec), 0,
 	    IPL_SOFTNET, 0, "pfqueue", NULL);
+
+	pf_anchor_stacks = cpumem_malloc(sizeof(struct pf_anchor_stack),
+	    M_TEMP);
+
 	hfsc_initialize();
 	pfr_initialize();
 	pfi_initialize();
diff --git a/sys/net/pfvar.h b/sys/net/pfvar.h
index 00e9b790a91..62a183727b3 100644
--- a/sys/net/pfvar.h
+++ b/sys/net/pfvar.h
@@ -1872,6 +1872,19 @@ void			 pf_send_tcp(const struct pf_rule *, sa_family_t,
 			    u_int8_t, u_int16_t, u_int16_t, u_int8_t, int,
 			    u_int16_t, u_int);

+struct pf_anchor_stackframe {
+	struct pf_ruleset	*rs;
+	struct pf_rule		*r;
+	struct pf_anchor_node	*parent;
+	struct pf_anchor	*child;
+};
+
+struct pf_anchor_stack {
+	struct pf_anchor_stackframe frame[64];
+};
+
+struct cpumem;
+extern struct cpumem *pf_anchor_stacks;
 #endif /* _KERNEL */

 #endif /* _NET_PFVAR_H_ */
Re: syslogd log_debug
On Fri, Mar 17, 2017 at 08:07:38AM +0100, Martin Pieuchot wrote:
> On 17/03/17(Fri) 02:09, Alexander Bluhm wrote:
> > Replace logdebug() with generic log_debug() from log.c.  Implement
> > log_debugadd() to construct debug messages incrementally.
>
> FWIW the idea was to keep log.c identical in all daemons and put
> per-daemon log functions into logmsg.c.  I don't know how much this
> is applicable to syslogd(8) :)

syslogd is special; it needs a different implementation:

- It cannot use openlog(3) and syslog(3).  log.c must use syslogd(8)
  internal functions as its backend.
- The backend has to be switched from stderr logging to internal logging
  as soon as the latter has been set up.  This adds functions to the API.
- I want to avoid malloc(3) in the logging path to make global error
  reporting reliable.
- syslogd uses more severity levels than other daemons, as it has been
  written with that in mind.  This API change could be adopted by the
  others: log_info(LOG_NOTICE, ...)
- syslogd uses incremental debug messages because it can.  I don't want
  to change that now.  Perhaps this could be useful for others.

I will delay the syslogd logging refactoring until after the 6.1 tagging.
It touches too much code.

bluhm
Re: SVM instructions
> Date: Fri, 17 Mar 2017 07:00:08 +0100
> From: Joerg Sonnenberger
> 
> On Wed, Mar 15, 2017 at 09:05:56PM +0100, Mark Kettenis wrote:
> > Clang only accepts SVM instructions with explicit operands, for
> > example:
> > 
> >	vmload %rax
> 
> You might want to look at the aliases e.g. for monitor/mwait.  Should be
> pretty easy to extend as long as we are talking about pure register
> operands.  Memory operands would be more tricky.

To be honest, I think forcing people to use the explicit form as clang
does right now is a good idea.  It makes reading the code much easier.
Re: Sync nfs_connect w/ sys_connect
On 07/03/17(Tue) 10:57, Martin Pieuchot wrote:
> This code is mostly a copy of what's done in sys_connect(), so sync it
> to use solock()/sosleep()/sounlock() instead of splsoftnet()/splx().
> 
> ok?

Anyone?

> 
> Index: nfs/nfs_socket.c
> ===
> RCS file: /cvs/src/sys/nfs/nfs_socket.c,v
> retrieving revision 1.114
> diff -u -p -r1.114 nfs_socket.c
> --- nfs/nfs_socket.c	3 Mar 2017 09:41:20 -	1.114
> +++ nfs/nfs_socket.c	7 Mar 2017 09:53:02 -
> @@ -306,25 +306,24 @@ nfs_connect(struct nfsmount *nmp, struct
>  		 * connect system call but with the wait timing out so
>  		 * that interruptible mounts don't hang here for a long time.
>  		 */
> -		s = splsoftnet();
> +		s = solock(so);
>  		while ((so->so_state & SS_ISCONNECTING) && so->so_error == 0) {
> -			(void) tsleep((caddr_t)&so->so_timeo, PSOCK,
> -			    "nfscon", 2 * hz);
> +			sosleep(so, &so->so_timeo, PSOCK, "nfscon", 2 * hz);
>  			if ((so->so_state & SS_ISCONNECTING) &&
>  			    so->so_error == 0 && rep &&
>  			    (error = nfs_sigintr(nmp, rep, rep->r_procp)) != 0){
>  				so->so_state &= ~SS_ISCONNECTING;
> -				splx(s);
> +				sounlock(s);
>  				goto bad;
>  			}
>  		}
>  		if (so->so_error) {
>  			error = so->so_error;
>  			so->so_error = 0;
> -			splx(s);
> +			sounlock(s);
>  			goto bad;
>  		}
> -		splx(s);
> +		sounlock(s);
>  	}
>  	/*
>  	 * Always set receive timeout to detect server crash and reconnect.
Re: Unneeded splnet()/splx() in carp(4)
On 07/03/17(Tue) 11:08, Martin Pieuchot wrote:
> carp(4), as a pseudo-interface, is always executed in the 'softnet'
> thread.  Using splnet()/splx() might have been relevant when link-state
> handlers were directly executed from hardware interrupt handlers.  But
> nowadays everything is run under the NET_LOCK() in a thread context, so
> let's get rid of these superfluous splnet()/splx() dances.
> 
> ok?

Anyone?

> 
> Index: netinet/ip_carp.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_carp.c,v
> retrieving revision 1.302
> diff -u -p -r1.302 ip_carp.c
> --- netinet/ip_carp.c	20 Feb 2017 06:29:42 -	1.302
> +++ netinet/ip_carp.c	7 Mar 2017 10:05:08 -
> @@ -898,7 +898,6 @@ carpdetach(struct carp_softc *sc)
>  {
>  	struct ifnet *ifp0;
>  	struct carp_if *cif;
> -	int s;
> 
>  	carp_del_all_timeouts(sc);
> 
> @@ -926,7 +925,6 @@ carpdetach(struct carp_softc *sc)
>  	/* Restore previous input handler. */
>  	if_ih_remove(ifp0, carp_input, cif);
> 
> -	s = splnet();
>  	if (sc->lh_cookie != NULL)
>  		hook_disestablish(ifp0->if_linkstatehooks, sc->lh_cookie);
> 
> @@ -938,7 +936,6 @@ carpdetach(struct carp_softc *sc)
>  		free(cif, M_IFADDR, sizeof(*cif));
>  	}
>  	sc->sc_carpdev = NULL;
> -	splx(s);
>  }
> 
>  /* Detach an interface from the carp. */
> @@ -1680,7 +1677,6 @@ carp_set_ifp(struct carp_softc *sc, stru
>  	struct carp_if *cif, *ncif = NULL;
>  	struct carp_softc *vr, *last = NULL, *after = NULL;
>  	int myself = 0, error = 0;
> -	int s;
> 
>  	KASSERT(ifp0 != sc->sc_carpdev);
>  	KERNEL_ASSERT_LOCKED(); /* touching vhif_vrs */
> @@ -1754,9 +1750,7 @@ carp_set_ifp(struct carp_softc *sc, stru
>  	/* Change input handler of the physical interface. */
>  	if_ih_insert(ifp0, carp_input, cif);
> 
> -	s = splnet();
>  	carp_carpdev_state(ifp0);
> -	splx(s);
> 
>  	return (0);
>  }
Re: SVM instructions
On Wed, Mar 15, 2017 at 09:05:56PM +0100, Mark Kettenis wrote:
> Clang only accepts SVM instructions with explicit operands, for
> example:
> 
>	vmload %rax

You might want to look at the aliases e.g. for monitor/mwait.  Should be
pretty easy to extend as long as we are talking about pure register
operands.  Memory operands would be more tricky.

Joerg