Re: Infinite loop in uvm protection mapping

2020-10-26 Thread Tom Rollet

 The freeze can be explained by the fact that uvm_fault doesn't
 find the cause of the fault.
 This results in a loop of faults on the same instruction while holding,
 most of the time, the KERNEL_LOCK.

 One problem is that in the kernel map (vm_map), all the page-table
 entries used in paging have the no-execute (NX) bit set.  So clearing
 the NX bit only in the PTE is useless, and apparently this case is also
 not caught by the fault handler. The NX bit must also be cleared in all
 the upper-level entries.

 After writing to the newly allocated page, I now clear the NX bit at
 all 4 page-table levels, and then flush the page from the TLB.
 It is probably not safe to have W pages like this,
 but it's good enough for a local POC.

 It is done with this code:

 struct pmap *pmap= kernel_map->pmap;

 pt_entry_t l1; l1 = PTE_BASE[pl1_i(addr & PG_FRAME)];
 x86_atomic_clearbits_u64(, PG_NX);

 pd_entry_t l2; l2 = L2_BASE[pl2_i(addr & PG_FRAME)];
 x86_atomic_clearbits_u64(, PG_NX);
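
 Why the upper levels matter can be shown with a toy model. On x86-64 a
 translation is executable only if the NX bit is clear in every
 paging-structure entry used to reach the page, so a single remaining NX
 bit at any level is enough to fault. The sketch below is not kernel
 code: the entries array, walk_allows_exec() and clear_nx_all_levels()
 are hypothetical names, and PG_V / PG_NX are assumed to be bit 0 and
 bit 63 as in the amd64 PTE layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the amd64 PTE layout. */
#define PG_V	0x0000000000000001ULL	/* entry is valid (present) */
#define PG_NX	0x8000000000000000ULL	/* no-execute, bit 63 */
#define NLEVELS	4			/* L1 (PTE) up to L4 */

/*
 * Toy model of a page walk: entries[0] stands for the L1 PTE and
 * entries[3] for the L4 entry.  Execution is allowed only if every
 * level is valid and has NX clear.
 */
static bool
walk_allows_exec(const uint64_t entries[NLEVELS])
{
	int i;

	for (i = 0; i < NLEVELS; i++) {
		if ((entries[i] & PG_V) == 0)
			return false;
		if ((entries[i] & PG_NX) != 0)
			return false;
	}
	return true;
}

/* Clear NX at every level of the modelled walk. */
static void
clear_nx_all_levels(uint64_t entries[NLEVELS])
{
	int i;

	for (i = 0; i < NLEVELS; i++)
		entries[i] &= ~PG_NX;
}
```

 Starting from four valid entries that all carry PG_NX, clearing the bit
 in entries[0] alone still leaves walk_allows_exec() false; it only
 becomes true after clear_nx_all_levels().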

 

Re: Infinite loop in uvm protection mapping

2020-10-25 Thread Tom Rollet

On 20/10/2020 06:16, Philip Guenther wrote:
> On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet wrote:
>
> > Hi,
> >
> > I'm starting to help in the development of the dt device.
> >
> > I'm stuck on permission handling of memory. I'm trying to allocate a
> > page in kernel with read/write protections, fill the allocated page
> > with data then change the permissions to  read/exec.
> >
> > Snippet of my code:
> >
> >   addr = uvm_km_alloc(kernel_map, PAGE_SIZE);
> >
> >  [...] (memcpy data in allocated page)
> >
> >   uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE, PROT_READ
> >                                          | PROT_EXEC, FALSE)))
>
> This is same usage as seen in the 'sti' driver...which is on hppa only, so
> while it's presumably the correct usage of uvm_km_alloc() and
> uvm_map_protect(), I don't think uvm_map_protect() has been used on
> kernel-space on amd64 (or possibly all non-hppa archs) before in OpenBSD.  Whee?

At least for my case (amd64), this function is never called from kernel space.

> > It triggers the following error at boot time when executing
> > the uvm_map_protect function.
> >
> > uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
> > trap, code=0 Stopped at    pmap_write_protect+0x1f5:  lock andq
> > $-0x3,0(%rdi)
> >
> > Trace:
> >
> > pmap_write_protect(82187b28,80002255b000,80002255c000,
> >  5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
> > uvm_map_protect(82129ae0,80002255b000,80002255c000
> >  ,5,0,82129ae0) at uvm_map_protect+0x501
> > dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
> >  82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
> > dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
> >  824d9008) at dt_prov_kprobe_init+0x1d9
> > dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
> > main(0,0,0,0,0,1) at main+0x419
> >
> > The problem comes from the loop in pmap_write_protect
> > (sys/arch/amd64/amd64/pmap.c:2108) that is executed
> > infinity in my case.
> >
> > Entry of function pmap_write_protect:
> >      sva:  80002250A000
> >      eva:  80002250B000
> >
> > After &= PG_FRAME (line 2098-2099)
> >      sva= F80002250A000
> >      eva= F80002250B000
> >
> >   loop:  (ligne 2108)
> >
> >       first iteration:
> >          va       = F80002250A000
> >          eva     = F80002250B000
> >          blockend = 080012240
> >
> ...
>
> > Does anyone have an idea how to fix this issue?
>
> So, blockend is clearly wrong for va and eva.  I suspect the use of L2_FRAME
> here:
>                blockend = (va & L2_FRAME) + NBPD_L2;
>
> is wrong here and it should be
>                blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;
>
> or some equivalent expression to keep all the bits above the frame.


It fixes the problem more cleanly, so thank you! But it doesn't solve the
issue with the OS freezing when jumping to this area.
The jump is done at the end of the amd64 breakpoint handler, by
replacing the initial address on the stack with the address
of the allocated area.

I did put a KASSERT in the page fault handler that triggers for the
address of the allocated area (0x80002255b000).
Resulting trace:

panic(81df1079) at panic+0x12a
__assert(81e59b6b,81e990a2,4f0,81e841a3) at __assert+0x2b
uvm_fault(82185078,80002255b000,0,4) at uvm_fault+0x150d
kpageflttrap(800035f52a30,80002255b000) at kpageflttrap+0x13a
kerntrap(800035f52a30) at kerntrap+0x91
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
80002255b000(800035f52bc0,800035f52bc0,2faba22f47fde3a6,0,8000fffef220,0) at 0x80002255b000
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7cc3f0, count: -9

Could someone explain to me in which cases
alltraps_kern_meltdown is called?
That would help me find out why this address traps
even with EXEC protection.

Tom Rollet


Re: Infinite loop in uvm protection mapping

2020-10-19 Thread Philip Guenther
On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet wrote:

> Hi,
>
> I'm starting to help in the development of the dt device.
>
> I'm stuck on permission handling of memory. I'm trying to allocate a
> page in kernel with read/write protections, fill the allocated page
> with data then change the permissions to  read/exec.
>
> Snippet of my code:
>
>   addr = uvm_km_alloc(kernel_map, PAGE_SIZE);
>
>  [...] (memcpy data in allocated page)
>
>   uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE, PROT_READ
>  | PROT_EXEC, FALSE)))
>

This is the same usage as seen in the 'sti' driver...which is on hppa only, so
while it's presumably the correct usage of uvm_km_alloc() and
uvm_map_protect(), I don't think uvm_map_protect() has been used on
kernel-space on amd64 (or possibly all non-hppa archs) before in OpenBSD.
Whee?


> It triggers the following error at boot time when executing
> the uvm_map_protect function.
>
> uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at    pmap_write_protect+0x1f5:  lock andq $-0x3,0(%rdi)
>
> Trace:
>
> pmap_write_protect(82187b28,80002255b000,80002255c000,
>  5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
> uvm_map_protect(82129ae0,80002255b000,80002255c000
>  ,5,0,82129ae0) at uvm_map_protect+0x501
> dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
>  82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
> dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
>  824d9008) at dt_prov_kprobe_init+0x1d9
> dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
> main(0,0,0,0,0,1) at main+0x419
>
> The problem comes from the loop in pmap_write_protect
> (sys/arch/amd64/amd64/pmap.c:2108) that is executed
> infinity in my case.
>
> Entry of function pmap_write_protect:
>  sva:  80002250A000
>  eva:  80002250B000
>
> After &= PG_FRAME (line 2098-2099)
>  sva= F80002250A000
>  eva= F80002250B000
>
>   loop:  (ligne 2108)
>
>   first iteration:
>  va   = F80002250A000
>  eva = F80002250B000
>  blockend = 080012240
>
...

> Does anyone have an idea how to fix this issue?


So, blockend is clearly wrong for va and eva.  I suspect the use of
L2_FRAME here:
   blockend = (va & L2_FRAME) + NBPD_L2;

is wrong and it should be
   blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;

or some equivalent expression to keep all the bits above the frame.
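
A user-space sketch of why keeping the upper bits matters, hedged
accordingly: the constants are assumed to match the amd64 headers
(VA_SIGN_NEG is taken to OR in the top 16 bits), count_blocks() and its
iteration cap are hypothetical, and the range is assumed to start at
kernel address 0xffff80002250a000.

```c
#include <stdint.h>

/* Assumed to mirror the amd64 definitions; check your tree. */
#define PG_FRAME	0x000ffffffffff000ULL
#define L2_FRAME	0x0000ffffffe00000ULL
#define NBPD_L2		(1ULL << 21)
#define VA_SIGN_MASK	0xffff000000000000ULL
#define VA_SIGN_NEG(va)	((va) | VA_SIGN_MASK)

/*
 * Hypothetical model of the block loop in pmap_write_protect().
 * "frame" is the mask applied to va when computing blockend; "cap"
 * bounds the number of iterations so the buggy case stops too.
 * Returns the number of iterations executed.
 */
static unsigned int
count_blocks(uint64_t sva, uint64_t eva, uint64_t frame, unsigned int cap)
{
	uint64_t va, blockend;
	unsigned int n = 0;

	sva &= PG_FRAME;
	eva &= PG_FRAME;
	for (va = sva; va < eva && n < cap; va = blockend) {
		blockend = (va & frame) + NBPD_L2;
		if (blockend > eva)
			blockend = eva;
		n++;
	}
	return n;
}
```

For the one-page range starting at 0xffff80002250a000, passing
VA_SIGN_NEG(L2_FRAME) as the mask finishes after a single iteration,
while passing plain L2_FRAME runs until the cap is reached.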


Philip Guenther


Infinite loop in uvm protection mapping

2020-10-19 Thread Tom Rollet

Hi,

I'm starting to help in the development of the dt device.

I'm stuck on permission handling of memory. I'm trying to allocate a
page in the kernel with read/write protections, fill the allocated page
with data, then change the permissions to read/exec.

Snippet of my code:

 addr = uvm_km_alloc(kernel_map, PAGE_SIZE);

 /* [...] memcpy data into the allocated page */

 uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE,
     PROT_READ | PROT_EXEC, FALSE);

It triggers the following error at boot time when executing
the uvm_map_protect function.

uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at    pmap_write_protect+0x1f5:  lock andq $-0x3,0(%rdi)

Trace:

pmap_write_protect(82187b28,80002255b000,80002255c000,
    5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
uvm_map_protect(82129ae0,80002255b000,80002255c000
    ,5,0,82129ae0) at uvm_map_protect+0x501
dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
    82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
    824d9008) at dt_prov_kprobe_init+0x1d9
dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
main(0,0,0,0,0,1) at main+0x419

The problem comes from the loop in pmap_write_protect
(sys/arch/amd64/amd64/pmap.c:2108), which never
terminates in my case.

Entry of function pmap_write_protect:
    sva:  80002250A000
    eva:  80002250B000

After &= PG_FRAME (lines 2098-2099)
    sva= F80002250A000
    eva= F80002250B000

 loop:  (line 2108)

     first iteration:
        va       = F80002250A000
        eva      = F80002250B000
        blockend = 080012240

    second iteration:
        va       = 080012240
        eva      = F80002250B000
        blockend = 080022240

We can see that the problem is that the "va" variable has lost
its 4 upper set bits (48 to 51).
So now the loop condition "va < eva" is always true and it
loops indefinitely because the 4 upper bits of "va" are cleared at each
iteration.
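
The arithmetic of one iteration can be reproduced in user space. The
sketch below is only an illustration: the mask values are assumed to
mirror PG_FRAME, L2_FRAME and NBPD_L2 from the amd64 headers, the
start address is assumed to be kernel address 0xffff80002250a000, and
next_blockend() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed to mirror the amd64 definitions; check your tree. */
#define PG_FRAME	0x000ffffffffff000ULL	/* keeps bits 12..51 */
#define L2_FRAME	0x0000ffffffe00000ULL	/* keeps bits 21..47 */
#define NBPD_L2		(1ULL << 21)		/* 2MB per L2 entry */

/*
 * Hypothetical helper: one step of the loop body as in the unpatched
 * code.  Masking with L2_FRAME drops bits 48..51 that PG_FRAME kept.
 */
static uint64_t
next_blockend(uint64_t va, uint64_t eva)
{
	uint64_t blockend = (va & L2_FRAME) + NBPD_L2;

	if (blockend > eva)
		blockend = eva;
	return blockend;
}
```

With va = 0xffff80002250a000 & PG_FRAME and eva one page higher, the
helper returns 0x0000800022600000: bits 48 to 51 are gone, so the
result is below va and the clamp against eva never applies.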

The fix that I found is to clear the same 4 bits in "eva".
It can be done with this mask:
      new_mask =  0x0000FFFFFFFFF000
      PG_FRAME =  0x000FFFFFFFFFF000


Diff with a quick and dirty fix:

diff --git a/sys/arch/amd64/amd64/pmap.c b/sys/arch/amd64/amd64/pmap.c
index 10ab3da2949..2783c9d26a5 100644
--- a/sys/arch/amd64/amd64/pmap.c
+++ b/sys/arch/amd64/amd64/pmap.c
@@ -2105,10 +2105,12 @@ pmap_write_protect(struct pmap *pmap, vaddr_t sva, vaddr_t eva, vm_prot_t prot)

    if ((eva - sva > 32 * PAGE_SIZE) && sva < VM_MIN_KERNEL_ADDRESS)
        shootall = 1;

-   for (va = sva; va < eva ; va = blockend) {
-       blockend = (va & L2_FRAME) + NBPD_L2;
-       if (blockend > eva)
-           blockend = eva;
+    vaddr_t tmp_eva = eva & 0x0000FFFFFFFFF000UL;
+    vaddr_t tmp_sva = sva & 0x0000FFFFFFFFF000UL;
+    for (va = tmp_sva; va < tmp_eva ; va = blockend) {
+        blockend = (va & L2_FRAME) + NBPD_L2;
+        if (blockend > tmp_eva)
+            blockend = tmp_eva;

    /*
 * XXXCDC: our PTE mappings should never be write-protected!


The problem is solved at boot time (I am able to boot without crashing).
But at run time, when I jump to the page
(with read and exec permissions), the OS freezes completely without any
error message. I can't launch the kernel debugger and I don't know how
to debug it further.

Does anyone have an idea how to fix this issue?

Thanks

--
Tom Rollet