Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-18 Thread Andrey Ryabinin



On 10/16/19 4:22 PM, Mark Rutland wrote:
> Hi Andrey,
> 
> On Wed, Oct 16, 2019 at 03:19:50PM +0300, Andrey Ryabinin wrote:
>> On 10/14/19 4:57 PM, Daniel Axtens wrote:
> + /*
> +  * Ensure poisoning is visible before the shadow is made visible
> +  * to other CPUs.
> +  */
> + smp_wmb();

 I don't quite understand what this barrier does and why it is needed.
 And if it's really needed, there should be a pairing barrier
 on the other side, which I don't see.
>>>
>>> Mark might be better able to answer this, but my understanding is that
>>> we want to make sure that we never have a situation where the writes are
>>> reordered so that the PTE is installed before all the poisoning is written
>>> out. I think it follows the logic in __pte_alloc() in mm/memory.c:
>>>
>>> /*
>>>  * Ensure all pte setup (eg. pte page lock and page clearing) are
>>>  * visible before the pte is made visible to other CPUs by being
>>>  * put into page tables.
>>>  *
>>>  * The other side of the story is the pointer chasing in the page
>>>  * table walking code (when walking the page table without locking;
>>>  * ie. most of the time). Fortunately, these data accesses consist
>>>  * of a chain of data-dependent loads, meaning most CPUs (alpha
>>>  * being the notable exception) will already guarantee loads are
>>>  * seen in-order. See the alpha page table accessors for the
>>>  * smp_read_barrier_depends() barriers in page table walking code.
>>>  */
>>> smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>>>
>>> I can clarify the comment.
>>
>> I don't see how this is relevant here.
> 
> The problem isn't quite the same, but it's a similar shape. See below
> for more details.
> 
>> The barrier in __pte_alloc() is there for the following case:
>>
>> CPU 0                                 CPU 1
>> __pte_alloc():                        pte_offset_kernel(pmd_t *dir, unsigned long address):
>>   pgtable_t new = pte_alloc_one(mm);    pte_t *new = (pte_t *)pmd_page_vaddr(*dir) +
>>                                                      ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1));
>>   smp_wmb();                            smp_read_barrier_depends();
>>   pmd_populate(mm, pmd, new);           /* do something with pte, e.g. check if (pte_none(*new)) */
>>
>>
>> It's needed to ensure that if CPU1 sees pmd_populate() it also sees 
>> initialized contents of the 'new'.
>>
>> In our case the barrier would have been needed if we had the other side like 
>> this:
>>
>> if (!pte_none(*vmalloc_shadow_pte)) {
>>  shadow_addr = (unsigned long)__va(pte_pfn(*vmalloc_shadow_pte) << 
>> PAGE_SHIFT);
>>  smp_read_barrier_depends();
>>  *shadow_addr; /* read the shadow, barrier ensures that if we see 
>> installed pte, we will see initialized shadow memory. */
>> }
>>
>>
>> Without such other side the barrier is pointless.
> 
> The barrier isn't pointless, but we are relying on a subtlety that is
> not captured in LKMM, as one of the observers involved is the TLB (and
> associated page table walkers) of the CPU.
> 
> Until the PTE written by CPU 0 has been observed by the TLB of CPU 1, it
> is not possible for CPU 1 to satisfy loads from the memory that PTE
> maps, as it doesn't yet know which memory that is.
> 
> Once the PTE written by CPU 0 has been observed by the TLB of CPU 1, it is
> possible for CPU 1 to satisfy those loads. At this instant, CPU 1 must
> respect the smp_wmb() before the PTE was written, and hence sees zeroes
 
s/zeroes/poison values

> written before this. Note that if this were not true, we could not
> safely swap userspace memory.
> 
> There is the risk (as laid out in [1]) that CPU 1 attempts to hoist the
> loads of the shadow memory above the load of the PTE, samples a stale
> (faulting) status from the TLB, then performs the load of the PTE and
> sees a valid value. In this case (on arm64) a spurious fault could be
> taken when the access is architecturally performed.
> 
> It is possible on arm64 to use a barrier here to prevent the spurious
> fault, but this is not smp_read_barrier_depends(), as that does nothing
> for anyone but alpha. On arm64 we have a spurious fault handler to fix
> this up.
>  

None of that really explains what the race looks like.
Please describe a concrete race condition diagram, starting with something
like

CPU0   CPU1
p0 = vmalloc() p1 = vmalloc()
...




Or let me put it this way. Let's assume that CPU0 accesses the shadow and CPU1
did the memset() and installed the pte.
CPU0 may fail to observe the memset() only if it dereferences completely random
vmalloc addresses, or if it performs an out-of-bounds access which crosses a
KASAN_SHADOW_SCALE*PAGE_SIZE boundary, i.e. the access to shadow crosses a page
boundary.
In both cases it 

Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-16 Thread Mark Rutland
Hi Andrey,

On Wed, Oct 16, 2019 at 03:19:50PM +0300, Andrey Ryabinin wrote:
> On 10/14/19 4:57 PM, Daniel Axtens wrote:
> >>> + /*
> >>> +  * Ensure poisoning is visible before the shadow is made visible
> >>> +  * to other CPUs.
> >>> +  */
> >>> + smp_wmb();
> >>
> >> I don't quite understand what this barrier does and why it is needed.
> >> And if it's really needed, there should be a pairing barrier
> >> on the other side, which I don't see.
> > 
> > Mark might be better able to answer this, but my understanding is that
> > we want to make sure that we never have a situation where the writes are
> > reordered so that the PTE is installed before all the poisoning is written
> > out. I think it follows the logic in __pte_alloc() in mm/memory.c:
> > 
> > /*
> >  * Ensure all pte setup (eg. pte page lock and page clearing) are
> >  * visible before the pte is made visible to other CPUs by being
> >  * put into page tables.
> >  *
> >  * The other side of the story is the pointer chasing in the page
> >  * table walking code (when walking the page table without locking;
> >  * ie. most of the time). Fortunately, these data accesses consist
> >  * of a chain of data-dependent loads, meaning most CPUs (alpha
> >  * being the notable exception) will already guarantee loads are
> >  * seen in-order. See the alpha page table accessors for the
> >  * smp_read_barrier_depends() barriers in page table walking code.
> >  */
> > smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> > 
> > I can clarify the comment.
> 
> I don't see how this is relevant here.

The problem isn't quite the same, but it's a similar shape. See below
for more details.

> The barrier in __pte_alloc() is there for the following case:
> 
> CPU 0                                 CPU 1
> __pte_alloc():                        pte_offset_kernel(pmd_t *dir, unsigned long address):
>   pgtable_t new = pte_alloc_one(mm);    pte_t *new = (pte_t *)pmd_page_vaddr(*dir) +
>                                                      ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1));
>   smp_wmb();                            smp_read_barrier_depends();
>   pmd_populate(mm, pmd, new);           /* do something with pte, e.g. check if (pte_none(*new)) */
> 
> 
> It's needed to ensure that if CPU1 sees pmd_populate() it also sees 
> initialized contents of the 'new'.
> 
> In our case the barrier would have been needed if we had the other side like 
> this:
> 
> if (!pte_none(*vmalloc_shadow_pte)) {
>   shadow_addr = (unsigned long)__va(pte_pfn(*vmalloc_shadow_pte) << 
> PAGE_SHIFT);
>   smp_read_barrier_depends();
>   *shadow_addr; /* read the shadow, barrier ensures that if we see 
> installed pte, we will see initialized shadow memory. */
> }
> 
> 
> Without such other side the barrier is pointless.

The barrier isn't pointless, but we are relying on a subtlety that is
not captured in LKMM, as one of the observers involved is the TLB (and
associated page table walkers) of the CPU.

Until the PTE written by CPU 0 has been observed by the TLB of CPU 1, it
is not possible for CPU 1 to satisfy loads from the memory that PTE
maps, as it doesn't yet know which memory that is.

Once the PTE written by CPU 0 has been observed by the TLB of CPU 1, it is
possible for CPU 1 to satisfy those loads. At this instant, CPU 1 must
respect the smp_wmb() before the PTE was written, and hence sees zeroes
written before this. Note that if this were not true, we could not
safely swap userspace memory.

There is the risk (as laid out in [1]) that CPU 1 attempts to hoist the
loads of the shadow memory above the load of the PTE, samples a stale
(faulting) status from the TLB, then performs the load of the PTE and
sees a valid value. In this case (on arm64) a spurious fault could be
taken when the access is architecturally performed.

It is possible on arm64 to use a barrier here to prevent the spurious
fault, but this is not smp_read_barrier_depends(), as that does nothing
for anyone but alpha. On arm64 we have a spurious fault handler to fix
this up.
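
A rough sketch of the interleaving being described, using illustrative names
only (this is not kernel code, just the shape of the hazard):

    CPU 0 (populates shadow)            CPU 1 (instrumented access)
    memset(shadow_page, ...);
    smp_wmb();
    set_pte_at(&init_mm, ...);          x = *shadow_addr;
                                        /* 1. translation attempted early: no
                                         *    valid PTE yet, faulting status
                                         *    sampled from the TLB
                                         * 2. the PTE is then loaded and seen
                                         *    as valid
                                         * 3. the stale status can still surface
                                         *    as a spurious fault when the access
                                         *    is architecturally performed */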

Thanks,
Mark.

[1] 
https://lore.kernel.org/linux-arm-kernel/20190827131818.14724-1-w...@kernel.org/
[2] 
https://lore.kernel.org/linux-mm/20191014152717.ga20...@lakrids.cambridge.arm.com/


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-16 Thread Andrey Ryabinin


On 10/14/19 4:57 PM, Daniel Axtens wrote:
> Hi Andrey,
> 
> 
>>> +   /*
>>> +* Ensure poisoning is visible before the shadow is made visible
>>> +* to other CPUs.
>>> +*/
>>> +   smp_wmb();
>>
>> I don't quite understand what this barrier does and why it is needed.
>> And if it's really needed, there should be a pairing barrier
>> on the other side, which I don't see.
> 
> Mark might be better able to answer this, but my understanding is that
> we want to make sure that we never have a situation where the writes are
> reordered so that the PTE is installed before all the poisoning is written
> out. I think it follows the logic in __pte_alloc() in mm/memory.c:
> 
>   /*
>* Ensure all pte setup (eg. pte page lock and page clearing) are
>* visible before the pte is made visible to other CPUs by being
>* put into page tables.
>*
>* The other side of the story is the pointer chasing in the page
>* table walking code (when walking the page table without locking;
>* ie. most of the time). Fortunately, these data accesses consist
>* of a chain of data-dependent loads, meaning most CPUs (alpha
>* being the notable exception) will already guarantee loads are
>* seen in-order. See the alpha page table accessors for the
>* smp_read_barrier_depends() barriers in page table walking code.
>*/
>   smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> 
> I can clarify the comment.
> 

I don't see how this is relevant here.

The barrier in __pte_alloc() is there for the following case:

CPU 0                                 CPU 1
__pte_alloc():                        pte_offset_kernel(pmd_t *dir, unsigned long address):
  pgtable_t new = pte_alloc_one(mm);    pte_t *new = (pte_t *)pmd_page_vaddr(*dir) +
                                                     ((address >> PAGE_SHIFT) & (PTRS_PER_PAGE - 1));
  smp_wmb();                            smp_read_barrier_depends();
  pmd_populate(mm, pmd, new);           /* do something with pte, e.g. check if (pte_none(*new)) */


It's needed to ensure that if CPU1 sees pmd_populate() it also sees initialized 
contents of the 'new'.

In our case the barrier would have been needed if we had the other side like 
this:

if (!pte_none(*vmalloc_shadow_pte)) {
shadow_addr = (unsigned long)__va(pte_pfn(*vmalloc_shadow_pte) << 
PAGE_SHIFT);
smp_read_barrier_depends();
*shadow_addr; /* read the shadow, barrier ensures that if we see 
installed pte, we will see initialized shadow memory. */
}


Without such other side the barrier is pointless.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens


> There is a potential problem here, as Will Deacon wrote up at:
>
>   
> https://lore.kernel.org/linux-arm-kernel/20190827131818.14724-1-w...@kernel.org/
>
> ... in the section starting:
>
> | *** Other architecture maintainers -- start here! ***
>
> ... whereby the CPU can spuriously fault on an access after observing a
> valid PTE.
>
> For arm64 we handle the spurious fault, and it looks like x86 would need
> something like its vmalloc_fault() applying to the shadow region to
> cater for this.

I'm not really up on x86 - my first thought would be that their stronger
memory ordering might be sufficient but I really don't know. Reading the
thread I see arm and powerpc discussions but nothing from anyone else,
so I'm none the wiser there...

Andy, do you have any thoughts?

Regards,
Daniel

>
> Thanks,
> Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens
>>> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, 
>>> unsigned long align,
>>> if (!addr)
>>> return NULL;
>>>  
>>> +   if (kasan_populate_vmalloc(real_size, area))
>>> +   return NULL;
>>> +
>>
>> KASAN itself uses __vmalloc_node_range() to allocate and map shadow in the
>> memory online callback.
>> So we should either skip non-vmalloc and non-module addresses here, or teach
>> KASAN's memory online/offline callbacks not to use __vmalloc_node_range()
>> (do something similar to kasan_populate_vmalloc() perhaps?).
>
> Ah, right you are. I haven't been testing that.
>
> I am a bit nervous about further restricting kasan_populate_vmalloc: I
> seem to remember having problems with code using the vmalloc family of
> functions to map memory that doesn't lie within vmalloc space but which
> still has instrumented accesses.

I was wrong, or was remembering early implementation bugs.

If the memory we're allocating in __vmalloc_node_range falls outside of
vmalloc and module space, kasan_populate_vmalloc shouldn't be mapping
shadow for it. For v9, I've guarded the call with
is_vmalloc_or_module. It seems to work fine when tested with hotplugged
memory.
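
As an illustration, a minimal sketch of what such a guard might look like at
the __vmalloc_node_range() call site, assuming the helper used is
is_vmalloc_or_module_addr(); the actual v9 change may differ:

	if (is_vmalloc_or_module_addr(area->addr)) {
		/*
		 * Only populate shadow for allocations that actually live in
		 * vmalloc or module space; this skips callers such as KASAN's
		 * own memory online callback, which allocates shadow via
		 * __vmalloc_node_range() at non-vmalloc addresses.
		 */
		if (kasan_populate_vmalloc(real_size, area))
			return NULL;
	}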

Thanks again.

Regards,
Daniel

> On the other hand, I'm not keen on rewriting any of the memory
> on/offline code if I can avoid it!
>
> I'll have a look and get back you as soon as I can.
>
> Thanks for catching this.
>
> Kind regards,
> Daniel
>
>>


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens
Mark Rutland  writes:

> On Tue, Oct 01, 2019 at 04:58:30PM +1000, Daniel Axtens wrote:
>> Hook into vmalloc and vmap, and dynamically allocate real shadow
>> memory to back the mappings.
>> 
>> Most mappings in vmalloc space are small, requiring less than a full
>> page of shadow space. Allocating a full shadow page per mapping would
>> therefore be wasteful. Furthermore, to ensure that different mappings
>> use different shadow pages, mappings would have to be aligned to
>> KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
>> 
>> Instead, share backing space across multiple mappings. Allocate a
>> backing page when a mapping in vmalloc space uses a particular page of
>> the shadow region. This page can be shared by other vmalloc mappings
>> later on.
>> 
>> We hook in to the vmap infrastructure to lazily clean up unused shadow
>> memory.
>> 
>> To avoid the difficulties around swapping mappings around, this code
>> expects that the part of the shadow region that covers the vmalloc
>> space will not be covered by the early shadow page, but will be left
>> unmapped. This will require changes in arch-specific code.
>> 
>> This allows KASAN with VMAP_STACK, and may be helpful for architectures
>> that do not have a separate module space (e.g. powerpc64, which I am
>> currently working on). It also allows relaxing the module alignment
>> back to PAGE_SIZE.
>> 
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
>> Acked-by: Vasily Gorbik 
>> Signed-off-by: Daniel Axtens 
>> [Mark: rework shadow allocation]
>> Signed-off-by: Mark Rutland 
>
> Sorry to point this out so late, but your S-o-B should come last in the
> chain per Documentation/process/submitting-patches.rst. Judging by the
> rest of that, I think you want something like:
>
> Co-developed-by: Mark Rutland 
> Signed-off-by: Mark Rutland  [shadow rework]
> Signed-off-by: Daniel Axtens 
>
> ... leaving yourself as the Author in the headers.

no worries, I wasn't really sure how best to arrange them, so thanks for
clarifying!

>
> Sorry to have made that more complicated!
>
> [...]
>
>> +static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
>> +void *unused)
>> +{
>> +unsigned long page;
>> +
>> +page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT);
>> +
>> +spin_lock(&init_mm.page_table_lock);
>> +
>> +if (likely(!pte_none(*ptep))) {
>> +pte_clear(&init_mm, addr, ptep);
>> +free_page(page);
>> +}
>
> There should be TLB maintenance between clearing the PTE and freeing the
> page here.

Fixed for v9.
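
For reference, a rough sketch of the shape of that fix, assuming
flush_tlb_kernel_range() is the appropriate interface here; the actual v9
change may differ:

	if (likely(!pte_none(*ptep))) {
		pte_clear(&init_mm, addr, ptep);

		/*
		 * Flush any stale translation of this shadow page before its
		 * backing page is returned to the allocator.
		 */
		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
		free_page(page);
	}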

Regards,
Daniel

>
> Thanks,
> Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-14 Thread Mark Rutland
On Tue, Oct 01, 2019 at 04:58:30PM +1000, Daniel Axtens wrote:
> Hook into vmalloc and vmap, and dynamically allocate real shadow
> memory to back the mappings.
> 
> Most mappings in vmalloc space are small, requiring less than a full
> page of shadow space. Allocating a full shadow page per mapping would
> therefore be wasteful. Furthermore, to ensure that different mappings
> use different shadow pages, mappings would have to be aligned to
> KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
> 
> Instead, share backing space across multiple mappings. Allocate a
> backing page when a mapping in vmalloc space uses a particular page of
> the shadow region. This page can be shared by other vmalloc mappings
> later on.
> 
> We hook in to the vmap infrastructure to lazily clean up unused shadow
> memory.
> 
> To avoid the difficulties around swapping mappings around, this code
> expects that the part of the shadow region that covers the vmalloc
> space will not be covered by the early shadow page, but will be left
> unmapped. This will require changes in arch-specific code.
> 
> This allows KASAN with VMAP_STACK, and may be helpful for architectures
> that do not have a separate module space (e.g. powerpc64, which I am
> currently working on). It also allows relaxing the module alignment
> back to PAGE_SIZE.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
> Acked-by: Vasily Gorbik 
> Signed-off-by: Daniel Axtens 
> [Mark: rework shadow allocation]
> Signed-off-by: Mark Rutland 

Sorry to point this out so late, but your S-o-B should come last in the
chain per Documentation/process/submitting-patches.rst. Judging by the
rest of that, I think you want something like:

Co-developed-by: Mark Rutland 
Signed-off-by: Mark Rutland  [shadow rework]
Signed-off-by: Daniel Axtens 

... leaving yourself as the Author in the headers.

Sorry to have made that more complicated!

[...]

> +static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
> + void *unused)
> +{
> + unsigned long page;
> +
> + page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT);
> +
> + spin_lock(&init_mm.page_table_lock);
> +
> + if (likely(!pte_none(*ptep))) {
> + pte_clear(&init_mm, addr, ptep);
> + free_page(page);
> + }

There should be TLB maintenance between clearing the PTE and freeing the
page here.

Thanks,
Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-14 Thread Mark Rutland
On Tue, Oct 15, 2019 at 12:57:44AM +1100, Daniel Axtens wrote:
> Hi Andrey,
> 
> 
> >> +  /*
> >> +   * Ensure poisoning is visible before the shadow is made visible
> >> +   * to other CPUs.
> >> +   */
> >> +  smp_wmb();
> >
> > I don't quite understand what this barrier does and why it is needed.
> > And if it's really needed, there should be a pairing barrier
> > on the other side, which I don't see.
> 
> Mark might be better able to answer this, but my understanding is that
> we want to make sure that we never have a situation where the writes are
> reordered so that the PTE is installed before all the poisoning is written
> out. I think it follows the logic in __pte_alloc() in mm/memory.c:
> 
>   /*
>* Ensure all pte setup (eg. pte page lock and page clearing) are
>* visible before the pte is made visible to other CPUs by being
>* put into page tables.

Yup. We need to ensure that if a thread sees a populated shadow PTE, the
corresponding shadow memory has been zeroed. Thus, we need to ensure
that the zeroing is observed by other CPUs before we update the PTE.

We're relying on the absence of a TLB entry preventing another CPU from
loading the corresponding shadow memory until its PTE has been
populated (after the zeroing is visible). Consequently there is no
barrier on the other side, just a control dependency (which would be
insufficient on its own).
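
A rough sketch of the two sides, simplified from the code under discussion
(names follow the patch, but this is not a verbatim excerpt):

    /* CPU 0: populate a shadow page */
    memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
    smp_wmb();                              /* order poisoning before the PTE */
    set_pte_at(&init_mm, addr, ptep, pte);  /* publish the shadow mapping */

    /* CPU 1: instrumented access to the newly mapped vmalloc region */
    x = *shadow_addr;   /* can only be satisfied once CPU 0's PTE is visible to
                         * this CPU's TLB, at which point the poisoning written
                         * before the smp_wmb() must be visible as well */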

There is a potential problem here, as Will Deacon wrote up at:

  
https://lore.kernel.org/linux-arm-kernel/20190827131818.14724-1-w...@kernel.org/

... in the section starting:

| *** Other architecture maintainers -- start here! ***

... whereby the CPU can spuriously fault on an access after observing a
valid PTE.

For arm64 we handle the spurious fault, and it looks like x86 would need
something like its vmalloc_fault() applying to the shadow region to
cater for this.

Thanks,
Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-14 Thread Daniel Axtens
Hi Andrey,


>> +/*
>> + * Ensure poisoning is visible before the shadow is made visible
>> + * to other CPUs.
>> + */
>> +smp_wmb();
>
> I don't quite understand what this barrier does and why it is needed.
> And if it's really needed, there should be a pairing barrier
> on the other side, which I don't see.

Mark might be better able to answer this, but my understanding is that
we want to make sure that we never have a situation where the writes are
reordered so that the PTE is installed before all the poisoning is written
out. I think it follows the logic in __pte_alloc() in mm/memory.c:

/*
 * Ensure all pte setup (eg. pte page lock and page clearing) are
 * visible before the pte is made visible to other CPUs by being
 * put into page tables.
 *
 * The other side of the story is the pointer chasing in the page
 * table walking code (when walking the page table without locking;
 * ie. most of the time). Fortunately, these data accesses consist
 * of a chain of data-dependent loads, meaning most CPUs (alpha
 * being the notable exception) will already guarantee loads are
 * seen in-order. See the alpha page table accessors for the
 * smp_read_barrier_depends() barriers in page table walking code.
 */
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */

I can clarify the comment.

>> +
>> +spin_lock(&init_mm.page_table_lock);
>> +if (likely(pte_none(*ptep))) {
>> +set_pte_at(&init_mm, addr, ptep, pte);
>> +page = 0;
>> +}
>> +spin_unlock(&init_mm.page_table_lock);
>> +if (page)
>> +free_page(page);
>> +return 0;
>> +}
>> +
>
>
> ...
>
>> @@ -754,6 +769,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
>>  }
>>  
>>  insert:
>> +kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end);
>> +
>>  if (!merged) {
>>  link_va(va, root, parent, link, head);
>>  augment_tree_propagate_from(va);
>> @@ -2068,6 +2085,22 @@ static struct vm_struct *__get_vm_area_node(unsigned 
>> long size,
>>  
>>  setup_vmalloc_vm(area, va, flags, caller);
>>  
>> +/*
>> + * For KASAN, if we are in vmalloc space, we need to cover the shadow
>> + * area with real memory. If we come here through VM_ALLOC, this is
>> + * done by a higher level function that has access to the true size,
>> + * which might not be a full page.
>> + *
>> + * We assume module space comes via VM_ALLOC path.
>> + */
>> +if (is_vmalloc_addr(area->addr) && !(area->flags & VM_ALLOC)) {
>> +if (kasan_populate_vmalloc(area->size, area)) {
>> +unmap_vmap_area(va);
>> +kfree(area);
>> +return NULL;
>> +}
>> +}
>> +
>>  return area;
>>  }
>>  
>> @@ -2245,6 +2278,9 @@ static void __vunmap(const void *addr, int 
>> deallocate_pages)
>>  debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
>>  debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
>>  
>> +if (area->flags & VM_KASAN)
>> +kasan_poison_vmalloc(area->addr, area->size);
>> +
>>  vm_remove_mappings(area, deallocate_pages);
>>  
>>  if (deallocate_pages) {
>> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, 
>> unsigned long align,
>>  if (!addr)
>>  return NULL;
>>  
>> +if (kasan_populate_vmalloc(real_size, area))
>> +return NULL;
>> +
>
> KASAN itself uses __vmalloc_node_range() to allocate and map shadow in the
> memory online callback.
> So we should either skip non-vmalloc and non-module addresses here, or teach
> KASAN's memory online/offline callbacks not to use __vmalloc_node_range()
> (do something similar to kasan_populate_vmalloc() perhaps?).

Ah, right you are. I haven't been testing that.

I am a bit nervous about further restricting kasan_populate_vmalloc: I
seem to remember having problems with code using the vmalloc family of
functions to map memory that doesn't lie within vmalloc space but which
still has instrumented accesses.

On the other hand, I'm not keen on rewriting any of the memory
on/offline code if I can avoid it!

I'll have a look and get back you as soon as I can.

Thanks for catching this.

Kind regards,
Daniel

>


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-11 Thread Andrey Ryabinin



On 10/1/19 9:58 AM, Daniel Axtens wrote:
 
>  core_initcall(kasan_memhotplug_init);
>  #endif
> +
> +#ifdef CONFIG_KASAN_VMALLOC
> +static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
> +   void *unused)
> +{
> + unsigned long page;
> + pte_t pte;
> +
> + if (likely(!pte_none(*ptep)))
> + return 0;
> +
> + page = __get_free_page(GFP_KERNEL);
> + if (!page)
> + return -ENOMEM;
> +
> + memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
> + pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
> +
> + /*
> +  * Ensure poisoning is visible before the shadow is made visible
> +  * to other CPUs.
> +  */
> + smp_wmb();

I don't quite understand what this barrier does and why it is needed.
And if it's really needed, there should be a pairing barrier
on the other side, which I don't see.

> +
> + spin_lock(&init_mm.page_table_lock);
> + if (likely(pte_none(*ptep))) {
> + set_pte_at(&init_mm, addr, ptep, pte);
> + page = 0;
> + }
> + spin_unlock(&init_mm.page_table_lock);
> + if (page)
> + free_page(page);
> + return 0;
> +}
> +


...

> @@ -754,6 +769,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   }
>  
>  insert:
> + kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end);
> +
>   if (!merged) {
>   link_va(va, root, parent, link, head);
>   augment_tree_propagate_from(va);
> @@ -2068,6 +2085,22 @@ static struct vm_struct *__get_vm_area_node(unsigned 
> long size,
>  
>   setup_vmalloc_vm(area, va, flags, caller);
>  
> + /*
> +  * For KASAN, if we are in vmalloc space, we need to cover the shadow
> +  * area with real memory. If we come here through VM_ALLOC, this is
> +  * done by a higher level function that has access to the true size,
> +  * which might not be a full page.
> +  *
> +  * We assume module space comes via VM_ALLOC path.
> +  */
> + if (is_vmalloc_addr(area->addr) && !(area->flags & VM_ALLOC)) {
> + if (kasan_populate_vmalloc(area->size, area)) {
> + unmap_vmap_area(va);
> + kfree(area);
> + return NULL;
> + }
> + }
> +
>   return area;
>  }
>  
> @@ -2245,6 +2278,9 @@ static void __vunmap(const void *addr, int 
> deallocate_pages)
>   debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
>   debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
>  
> + if (area->flags & VM_KASAN)
> + kasan_poison_vmalloc(area->addr, area->size);
> +
>   vm_remove_mappings(area, deallocate_pages);
>  
>   if (deallocate_pages) {
> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
> long align,
>   if (!addr)
>   return NULL;
>  
> + if (kasan_populate_vmalloc(real_size, area))
> + return NULL;
> +

KASAN itself uses __vmalloc_node_range() to allocate and map shadow in the
memory online callback.
So we should either skip non-vmalloc and non-module addresses here, or teach
KASAN's memory online/offline callbacks not to use __vmalloc_node_range()
(do something similar to kasan_populate_vmalloc() perhaps?).


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-10 Thread Daniel Axtens
Hi Uladzislau,


> Looking at it once more, I think the above part of the code is a bit wrong
> and should be separated from the merge_or_add_vmap_area() logic. The
> reason is to keep it simple and do only what it is supposed to do:
> merging or adding.
>
> Also, kasan_release_vmalloc() gets called twice there, which looks like
> a duplication. Apart from that, merge_or_add_vmap_area() can be called via
> a recovery path when vmap areas are not even set up. See the percpu
> allocator.
>
> I guess your part could be moved directly to __purge_vmap_area_lazy(),
> where all vmaps are lazily freed. To do so, we also need to modify
> merge_or_add_vmap_area() to return the merged area:

Thanks for the review. I've integrated your snippet - it seems to work
fine, and I agree that it is much simpler and clearer, so I've rolled it
into v9, which I will post soon.

Regards,
Daniel

>
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..fecde4312d68 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -683,7 +683,7 @@ insert_vmap_area_augment(struct vmap_area *va,
>   * free area is inserted. If VA has been merged, it is
>   * freed.
>   */
> -static __always_inline void
> +static __always_inline struct vmap_area *
>  merge_or_add_vmap_area(struct vmap_area *va,
> struct rb_root *root, struct list_head *head)
>  {
> @@ -750,7 +750,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>  
> /* Free vmap_area object. */
> kmem_cache_free(vmap_area_cachep, va);
> -   return;
> +
> +   /* Point to the new merged area. */
> +   va = sibling;
> +   merged = true;
> }
> }
>  
> @@ -759,6 +762,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
> link_va(va, root, parent, link, head);
> augment_tree_propagate_from(va);
> }
> +
> +   return va;
>  }
>  
>  static __always_inline bool
> @@ -1172,7 +1177,7 @@ static void __free_vmap_area(struct vmap_area *va)
> /*
>  * Merge VA with its neighbors, otherwise just add it.
>  */
> -   merge_or_add_vmap_area(va,
> +   (void) merge_or_add_vmap_area(va,
> &free_vmap_area_root, &free_vmap_area_list);
>  }
>  
> @@ -1279,15 +1284,20 @@ static bool __purge_vmap_area_lazy(unsigned long 
> start, unsigned long end)
> spin_lock(&vmap_area_lock);
> llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
> +   unsigned long orig_start = va->va_start;
> +   unsigned long orig_end = va->va_end;
>  
> /*
>  * Finally insert or merge lazily-freed area. It is
>  * detached and there is no need to "unlink" it from
>  * anything.
>  */
> -   merge_or_add_vmap_area(va,
> +   va = merge_or_add_vmap_area(va,
> &free_vmap_area_root, &free_vmap_area_list);
>  
> +   kasan_release_vmalloc(orig_start,
> +   orig_end, va->va_start, va->va_end);
> +
> atomic_long_sub(nr, &vmap_lazy_nr);
>  
> if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
> 
>
> --
> Vlad Rezki


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-07 Thread Uladzislau Rezki
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..9fb7a16f42ae 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -690,8 +690,19 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   struct list_head *next;
>   struct rb_node **link;
>   struct rb_node *parent;
> + unsigned long orig_start, orig_end;
>   bool merged = false;
>  
> + /*
> +  * To manage KASAN vmalloc memory usage, we use this opportunity to
> +  * clean up the shadow memory allocated to back this allocation.
> +  * Because a vmalloc shadow page covers several pages, the start or end
> +  * of an allocation might not align with a shadow page. Use the merging
> +  * opportunities to try to extend the region we can release.
> +  */
> + orig_start = va->va_start;
> + orig_end = va->va_end;
> +
>   /*
>* Find a place in the tree where VA potentially will be
>* inserted, unless it is merged with its sibling/siblings.
> @@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   if (sibling->va_end == va->va_start) {
>   sibling->va_end = va->va_end;
>  
> + kasan_release_vmalloc(orig_start, orig_end,
> +   sibling->va_start,
> +   sibling->va_end);
> +
>   /* Check and update the tree if needed. */
>   augment_tree_propagate_from(sibling);
>  
> @@ -754,6 +769,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   }
>  
>  insert:
> + kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end);
> +
>   if (!merged) {
>   link_va(va, root, parent, link, head);
>   augment_tree_propagate_from(va);
Hello, Daniel.

Looking at it once more, I think the above part of the code is a bit wrong
and should be separated from the merge_or_add_vmap_area() logic. The
reason is to keep it simple and do only what it is supposed to do:
merging or adding.

Also, kasan_release_vmalloc() gets called twice there, which looks like
a duplication. Apart from that, merge_or_add_vmap_area() can be called via
a recovery path when vmap areas are not even set up. See the percpu
allocator.

I guess your part could be moved directly to __purge_vmap_area_lazy(),
where all vmaps are lazily freed. To do so, we also need to modify
merge_or_add_vmap_area() to return the merged area:


diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..fecde4312d68 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -683,7 +683,7 @@ insert_vmap_area_augment(struct vmap_area *va,
  * free area is inserted. If VA has been merged, it is
  * freed.
  */
-static __always_inline void
+static __always_inline struct vmap_area *
 merge_or_add_vmap_area(struct vmap_area *va,
struct rb_root *root, struct list_head *head)
 {
@@ -750,7 +750,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
 
/* Free vmap_area object. */
kmem_cache_free(vmap_area_cachep, va);
-   return;
+
+   /* Point to the new merged area. */
+   va = sibling;
+   merged = true;
}
}
 
@@ -759,6 +762,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
link_va(va, root, parent, link, head);
augment_tree_propagate_from(va);
}
+
+   return va;
 }
 
 static __always_inline bool
@@ -1172,7 +1177,7 @@ static void __free_vmap_area(struct vmap_area *va)
/*
 * Merge VA with its neighbors, otherwise just add it.
 */
-   merge_or_add_vmap_area(va,
+   (void) merge_or_add_vmap_area(va,
&free_vmap_area_root, &free_vmap_area_list);
 }
 
@@ -1279,15 +1284,20 @@ static bool __purge_vmap_area_lazy(unsigned long start, 
unsigned long end)
spin_lock(&vmap_area_lock);
llist_for_each_entry_safe(va, n_va, valist, purge_list) {
unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
+   unsigned long orig_start = va->va_start;
+   unsigned long orig_end = va->va_end;
 
/*
 * Finally insert or merge lazily-freed area. It is
 * detached and there is no need to "unlink" it from
 * anything.
 */
-   merge_or_add_vmap_area(va,
+   va = merge_or_add_vmap_area(va,
&free_vmap_area_root, &free_vmap_area_list);
 
+   kasan_release_vmalloc(orig_start,
+   orig_end, va->va_start, va->va_end);
+
atomic_long_sub(nr, &vmap_lazy_nr);
 
if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)


--
Vlad Rezki


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-02 Thread Uladzislau Rezki
On Wed, Oct 02, 2019 at 11:23:06AM +1000, Daniel Axtens wrote:
> Hi,
> 
> >>/*
> >> * Find a place in the tree where VA potentially will be
> >> * inserted, unless it is merged with its sibling/siblings.
> >> @@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
> >>if (sibling->va_end == va->va_start) {
> >>sibling->va_end = va->va_end;
> >>  
> >> +  kasan_release_vmalloc(orig_start, orig_end,
> >> +sibling->va_start,
> >> +sibling->va_end);
> >> +
> > The same.
> 
> The call to kasan_release_vmalloc() is a static inline no-op if
> CONFIG_KASAN_VMALLOC is not defined, which I thought was the preferred
> way to do things rather than sprinkling the code with ifdefs?
> 
I agree that is totally correct.

> The compiler should be smart enough to eliminate all the
> orig_start/orig_end stuff at compile time because it can see that it's
> not used, so there's no cost in the binary.
> 
It should. I was thinking more about whether those two variables could be
considered unused, resulting in a compile warning like "set but not used".
But that is theory, and in case of any warning the test robot will
notify us about it anyway.

So, I am totally fine with that if the compiler does not complain. If so,
please ignore my comments :)

--
Vlad Rezki


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-02 Thread Christophe Leroy

Daniel Axtens wrote:


Hi,


/*
 * Find a place in the tree where VA potentially will be
 * inserted, unless it is merged with its sibling/siblings.
@@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
if (sibling->va_end == va->va_start) {
sibling->va_end = va->va_end;

+   kasan_release_vmalloc(orig_start, orig_end,
+ sibling->va_start,
+ sibling->va_end);
+

The same.


The call to kasan_release_vmalloc() is a static inline no-op if
CONFIG_KASAN_VMALLOC is not defined, which I thought was the preferred
way to do things rather than sprinkling the code with ifdefs?

The compiler should be smart enough to eliminate all the
orig_start/orig_end stuff at compile time because it can see that it's
not used, so there's no cost in the binary.




Daniel,

You are entirely right in your way of doing it; that's fully in line with the
Linux kernel coding style:
https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation


Christophe



Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Daniel Axtens
Hi,

>>  /*
>>   * Find a place in the tree where VA potentially will be
>>   * inserted, unless it is merged with its sibling/siblings.
>> @@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>>  if (sibling->va_end == va->va_start) {
>>  sibling->va_end = va->va_end;
>>  
>> +kasan_release_vmalloc(orig_start, orig_end,
>> +  sibling->va_start,
>> +  sibling->va_end);
>> +
> The same.

The call to kasan_release_vmalloc() is a static inline no-op if
CONFIG_KASAN_VMALLOC is not defined, which I thought was the preferred
way to do things rather than sprinkling the code with ifdefs?

The compiler should be smart enough to eliminate all the
orig_start/orig_end stuff at compile time because it can see that it's
not used, so there's no cost in the binary.
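
For reference, a minimal sketch of that pattern, loosely modelled on the
include/linux/kasan.h stubs (the exact signatures in the patch may differ):

    #ifdef CONFIG_KASAN_VMALLOC
    void kasan_release_vmalloc(unsigned long start, unsigned long end,
                               unsigned long free_region_start,
                               unsigned long free_region_end);
    #else
    /* No-op stub: callers need no #ifdef and the call compiles away entirely. */
    static inline void kasan_release_vmalloc(unsigned long start,
                                             unsigned long end,
                                             unsigned long free_region_start,
                                             unsigned long free_region_end) {}
    #endif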

Regards,
Daniel


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Uladzislau Rezki
Hello, Daniel.

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..9fb7a16f42ae 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -690,8 +690,19 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   struct list_head *next;
>   struct rb_node **link;
>   struct rb_node *parent;
> + unsigned long orig_start, orig_end;
Shouldn't that be wrapped in #ifdef CONFIG_KASAN_VMALLOC?

>   bool merged = false;
>  
> + /*
> +  * To manage KASAN vmalloc memory usage, we use this opportunity to
> +  * clean up the shadow memory allocated to back this allocation.
> +  * Because a vmalloc shadow page covers several pages, the start or end
> +  * of an allocation might not align with a shadow page. Use the merging
> +  * opportunities to try to extend the region we can release.
> +  */
> + orig_start = va->va_start;
> + orig_end = va->va_end;
> +
The same.

>   /*
>* Find a place in the tree where VA potentially will be
>* inserted, unless it is merged with its sibling/siblings.
> @@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   if (sibling->va_end == va->va_start) {
>   sibling->va_end = va->va_end;
>  
> + kasan_release_vmalloc(orig_start, orig_end,
> +   sibling->va_start,
> +   sibling->va_end);
> +
The same.

>   /* Check and update the tree if needed. */
>   augment_tree_propagate_from(sibling);
>  
> @@ -754,6 +769,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
>   }
>  
>  insert:
> + kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end);
> +
The same + all further changes in this file.
>   if (!merged) {
>   link_va(va, root, parent, link, head);
>   augment_tree_propagate_from(va);
> @@ -2068,6 +2085,22 @@ static struct vm_struct *__get_vm_area_node(unsigned 
> long size,
>  
>   setup_vmalloc_vm(area, va, flags, caller);
>  
> + /*
> +  * For KASAN, if we are in vmalloc space, we need to cover the shadow
> +  * area with real memory. If we come here through VM_ALLOC, this is
> +  * done by a higher level function that has access to the true size,
> +  * which might not be a full page.
> +  *
> +  * We assume module space comes via VM_ALLOC path.
> +  */
> + if (is_vmalloc_addr(area->addr) && !(area->flags & VM_ALLOC)) {
> + if (kasan_populate_vmalloc(area->size, area)) {
> + unmap_vmap_area(va);
> + kfree(area);
> + return NULL;
> + }
> + }
> +
>   return area;
>  }
>  
> @@ -2245,6 +2278,9 @@ static void __vunmap(const void *addr, int 
> deallocate_pages)
>   debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
>   debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
>  
> + if (area->flags & VM_KASAN)
> + kasan_poison_vmalloc(area->addr, area->size);
> +
>   vm_remove_mappings(area, deallocate_pages);
>  
>   if (deallocate_pages) {
> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
> long align,
>   if (!addr)
>   return NULL;
>  
> + if (kasan_populate_vmalloc(real_size, area))
> + return NULL;
> +
>   /*
>* In this function, newly allocated vm_struct has VM_UNINITIALIZED
>* flag. It means that vm_struct is not fully initialized.
> @@ -3351,10 +3390,14 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned 
> long *offsets,
>   spin_unlock(&vmap_area_lock);
>  
>   /* insert all vm's */
> - for (area = 0; area < nr_vms; area++)
> + for (area = 0; area < nr_vms; area++) {
>   setup_vmalloc_vm(vms[area], vas[area], VM_ALLOC,
>pcpu_get_vm_areas);
>  
> + /* assume success here */
> + kasan_populate_vmalloc(sizes[area], vms[area]);
> + }
> +
>   kfree(vas);
>   return vms;
>  


--
Vlad Rezki


[PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Daniel Axtens
Hook into vmalloc and vmap, and dynamically allocate real shadow
memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.

Instead, share backing space across multiple mappings. Allocate a
backing page when a mapping in vmalloc space uses a particular page of
the shadow region. This page can be shared by other vmalloc mappings
later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.

To avoid the difficulties around swapping mappings around, this code
expects that the part of the shadow region that covers the vmalloc
space will not be covered by the early shadow page, but will be left
unmapped. This will require changes in arch-specific code.

This allows KASAN with VMAP_STACK, and may be helpful for architectures
that do not have a separate module space (e.g. powerpc64, which I am
currently working on). It also allows relaxing the module alignment
back to PAGE_SIZE.
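
(For readers unfamiliar with the scale factor mentioned above, a brief sketch
of the arithmetic for generic KASAN; the constants come from the existing
KASAN code:)

    /*
     * Generic KASAN maps one shadow byte per KASAN_SHADOW_SCALE_SIZE (8) bytes:
     *
     *     shadow_addr = KASAN_SHADOW_OFFSET + (addr >> KASAN_SHADOW_SCALE_SHIFT);
     *
     * so a single 4K shadow page covers 8 * 4K = 32K of vmalloc space, and many
     * small vmalloc allocations naturally share the same shadow page.
     */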

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
Acked-by: Vasily Gorbik 
Signed-off-by: Daniel Axtens 
[Mark: rework shadow allocation]
Signed-off-by: Mark Rutland 

--

v2: let kasan_unpoison_shadow deal with ranges that do not use a
full shadow byte.

v3: relax module alignment
rename to kasan_populate_vmalloc which is a much better name
deal with concurrency correctly

v4: Mark's rework
Poison pages on vfree
Handle allocation failures

v5: Per Christophe Leroy, split out test and dynamically free pages.

v6: Guard freeing page properly. Drop WARN_ON_ONCE(pte_none(*ptep)),
 on reflection it's unnecessary debugging cruft with too high a
 false positive rate.

v7: tlb flush, thanks Mark.
explain more clearly how freeing works and is concurrency-safe.
---
 Documentation/dev-tools/kasan.rst |  63 +
 include/linux/kasan.h |  31 +
 include/linux/moduleloader.h  |   2 +-
 include/linux/vmalloc.h   |  12 ++
 lib/Kconfig.kasan |  16 +++
 mm/kasan/common.c | 204 ++
 mm/kasan/generic_report.c |   3 +
 mm/kasan/kasan.h  |   1 +
 mm/vmalloc.c  |  45 ++-
 9 files changed, 375 insertions(+), 2 deletions(-)

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index b72d07d70239..bdb92c3de7a5 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -215,3 +215,66 @@ brk handler is used to print bug reports.
 A potential expansion of this mode is a hardware tag-based mode, which would
 use hardware memory tagging support instead of compiler instrumentation and
 manual shadow memory manipulation.
+
+What memory accesses are sanitised by KASAN?
+
+--------------------------------------------
+The kernel maps memory in a number of different parts of the address
+space. This poses something of a problem for KASAN, which requires
+that all addresses accessed by instrumented code have a valid shadow
+region.
+
+The range of kernel virtual addresses is large: there is not enough
+real memory to support a real shadow region for every address that
+could be accessed by the kernel.
+
+By default
+~~~~~~~~~~
+
+By default, architectures only map real memory over the shadow region
+for the linear mapping (and potentially other small areas). For all
+other areas - such as vmalloc and vmemmap space - a single read-only
+page is mapped over the shadow area. This read-only shadow page
+declares all memory accesses as permitted.
+
+This presents a problem for modules: they do not live in the linear
+mapping, but in a dedicated module space. By hooking in to the module
+allocator, KASAN can temporarily map real shadow memory to cover
+them. This allows detection of invalid accesses to module globals, for
+example.
+
+This also creates an incompatibility with ``VMAP_STACK``: if the stack
+lives in vmalloc space, it will be shadowed by the read-only page, and
+the kernel will fault when trying to set up the shadow data for stack
+variables.
+
+CONFIG_KASAN_VMALLOC
+~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
+cost of greater memory usage. Currently this is only supported on x86.
+
+This works by hooking into vmalloc and vmap, and dynamically
+allocating real shadow memory to back the mappings.
+
+Most mappings in vmalloc space are small, requiring less than a full
+page of shadow space. Allocating a full shadow page per mapping would
+therefore be wasteful. Furthermore, to ensure that different mappings
+use different shadow pages, mappings would have to be aligned to
+``KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE``.
+
+Instead, we share backing space across multiple mappings. We