Re: [PATCH v6 7/9] kfence, Documentation: add KFENCE documentation
On Fri, Oct 30, 2020 at 3:50 AM Jann Horn wrote:
>
> On Thu, Oct 29, 2020 at 2:17 PM Marco Elver wrote:
> > Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
> [...]
> > +The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
> > +further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
> > +255), the number of available guarded objects can be controlled. Each object
> > +requires 2 pages, one for the object itself and the other one used as a guard
> > +page; object pages are interleaved with guard pages, and every object page is
> > +therefore surrounded by two guard pages.
> > +
> > +The total memory dedicated to the KFENCE memory pool can be computed as::
> > +
> > +( #objects + 1 ) * 2 * PAGE_SIZE
>
> Plus memory overhead from shattered hugepages. With the default object
> count, on x86, we allocate 2MiB of memory pool, but if we have to
> shatter a 2MiB hugepage for that, we may cause the allocation of one
> extra page table, or 4KiB. Of course that's pretty much negligible.
> But on arm64 it's worse, because there we have to disable hugepages in
> the linear map completely. So on a device with 4GiB memory, we might
> end up with something on the order of 4GiB/2MiB * 0x1000 bytes = 8MiB
> of extra L1 page tables that wouldn't have been needed otherwise -
> significantly more than the default memory pool size.

Note that with CONFIG_RODATA_FULL_DEFAULT_ENABLED (which is on by default
now) these hugepages are already disabled (see patch 3/9).

> If the memory overhead is documented, this detail should probably be
> documented, too.

But, yes, documenting that also makes sense.

> > +Using the default config, and assuming a page size of 4 KiB, results in
> > +dedicating 2 MiB to the KFENCE memory pool.
> [...]
> > +For such errors, the address where the corruption as well as the invalidly
>
> nit: "the address where the corruption occurred" or "the address of
> the corruption"
>
> > +written bytes (offset from the address) are shown; in this representation, '.'
> > +denote untouched bytes. In the example above ``0xac`` is the value written to
> > +the invalid address at offset 0, and the remaining '.' denote that no following
> > +bytes have been touched. Note that, real values are only shown for
> > +``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information disclosure for non-debug
> > +builds, '!' is used instead to denote invalidly written bytes.
> [...]
> > +KFENCE objects each reside on a dedicated page, at either the left or right
> > +page boundaries selected at random. The pages to the left and right of the
> > +object page are "guard pages", whose attributes are changed to a protected
> > +state, and cause page faults on any attempted access. Such page faults are then
> > +intercepted by KFENCE, which handles the fault gracefully by reporting an
> > +out-of-bounds access.
>
> ... and marking the page as accessible so that the faulting code can
> continue (wrongly) executing.
>
> > [...]
> > +Interface
> > +---------
> > +
> > +The following describes the functions which are used by allocators as well page
>
> nit: "as well as"?
>
> > +handling code to set up and deal with KFENCE allocations.

--
Alexander Potapenko
Software Engineer
Google Germany GmbH
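
For readers following the guard-page discussion quoted above, here is a minimal sketch of how object and guard pages could interleave inside the pool. It is not taken from the patch: the helper names are hypothetical, and the layout (two leading guard pages, then object and guard pages alternating) is an assumption chosen only because it matches the quoted formula of ( #objects + 1 ) * 2 pages and leaves every object page between two guard pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helpers, not part of the KFENCE patch.
 * Assumed layout: pages 0 and 1 are guard pages, then object and guard
 * pages alternate, so each object page has a guard page on both sides.
 */

/* Total pages in the pool, following the quoted formula. */
static size_t sketch_pool_pages(size_t num_objects)
{
    return (num_objects + 1) * 2;
}

/* Page index of the object with the given 0-based index. */
static size_t sketch_object_page(size_t index, size_t num_objects)
{
    assert(index < num_objects);
    return (index + 1) * 2;
}

/* True if the page at this index is a guard page under the assumed layout. */
static bool sketch_is_guard_page(size_t page)
{
    return page < 2 || (page % 2) != 0;
}
```

Under these assumptions, the default of 255 objects gives a 512-page pool in which object 0 sits on page 2 and the last object on page 510, with page 511 as the trailing guard page.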
Re: [PATCH v6 7/9] kfence, Documentation: add KFENCE documentation
On Thu, Oct 29, 2020 at 2:17 PM Marco Elver wrote:
> Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
[...]
> +The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
> +further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
> +255), the number of available guarded objects can be controlled. Each object
> +requires 2 pages, one for the object itself and the other one used as a guard
> +page; object pages are interleaved with guard pages, and every object page is
> +therefore surrounded by two guard pages.
> +
> +The total memory dedicated to the KFENCE memory pool can be computed as::
> +
> +( #objects + 1 ) * 2 * PAGE_SIZE

Plus memory overhead from shattered hugepages. With the default object
count, on x86, we allocate 2MiB of memory pool, but if we have to
shatter a 2MiB hugepage for that, we may cause the allocation of one
extra page table, or 4KiB. Of course that's pretty much negligible.
But on arm64 it's worse, because there we have to disable hugepages in
the linear map completely. So on a device with 4GiB memory, we might
end up with something on the order of 4GiB/2MiB * 0x1000 bytes = 8MiB
of extra L1 page tables that wouldn't have been needed otherwise -
significantly more than the default memory pool size.

If the memory overhead is documented, this detail should probably be
documented, too.

> +Using the default config, and assuming a page size of 4 KiB, results in
> +dedicating 2 MiB to the KFENCE memory pool.
[...]
> +For such errors, the address where the corruption as well as the invalidly

nit: "the address where the corruption occurred" or "the address of
the corruption"

> +written bytes (offset from the address) are shown; in this representation, '.'
> +denote untouched bytes. In the example above ``0xac`` is the value written to
> +the invalid address at offset 0, and the remaining '.' denote that no following
> +bytes have been touched. Note that, real values are only shown for
> +``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information disclosure for non-debug
> +builds, '!' is used instead to denote invalidly written bytes.
[...]
> +KFENCE objects each reside on a dedicated page, at either the left or right
> +page boundaries selected at random. The pages to the left and right of the
> +object page are "guard pages", whose attributes are changed to a protected
> +state, and cause page faults on any attempted access. Such page faults are then
> +intercepted by KFENCE, which handles the fault gracefully by reporting an
> +out-of-bounds access.

... and marking the page as accessible so that the faulting code can
continue (wrongly) executing.

[...]
> +Interface
> +---------
> +
> +The following describes the functions which are used by allocators as well page

nit: "as well as"?

> +handling code to set up and deal with KFENCE allocations.
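
To make the arithmetic in this review easier to check, the two figures discussed above can be reproduced with a small sketch. The function names are hypothetical and not kernel API. The first follows the pool-size formula quoted from the patch. The second is only the rough estimate from the review (one page-table page per huge page that has to be mapped with base pages), not a precise model of the arm64 linear map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pool size per the quoted formula,
 * ( #objects + 1 ) * 2 * PAGE_SIZE. */
static uint64_t kfence_pool_bytes(uint64_t num_objects, uint64_t page_size)
{
    return (num_objects + 1) * 2 * page_size;
}

/*
 * Hypothetical helper: rough estimate of the extra page-table memory
 * needed when mem_bytes of linear map can no longer use huge pages.
 * Assumes one table of table_bytes per huge page of huge_page_bytes.
 */
static uint64_t linear_map_extra_table_bytes(uint64_t mem_bytes,
                                             uint64_t huge_page_bytes,
                                             uint64_t table_bytes)
{
    assert(huge_page_bytes != 0);
    return mem_bytes / huge_page_bytes * table_bytes;
}
```

With 255 objects and 4 KiB pages, the first helper yields 2 MiB. With 4 GiB of memory, 2 MiB huge pages and 0x1000-byte tables, the second yields 8 MiB, which is the comparison the review draws against the default pool size.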
[PATCH v6 7/9] kfence, Documentation: add KFENCE documentation
Add KFENCE documentation in dev-tools/kfence.rst, and add to index.

Reviewed-by: Dmitry Vyukov
Co-developed-by: Alexander Potapenko
Signed-off-by: Alexander Potapenko
Signed-off-by: Marco Elver
---
v3:
* Re-introduce reference to Documentation/dev-tools/kfence.rst.

v2:
* Many clarifications based on comments from Andrey Konovalov.
* Document CONFIG_KFENCE_SAMPLE_INTERVAL=0 usage.
* Make use-cases between KASAN and KFENCE clearer.
* Be clearer about the fact the pool is fixed size.
* Update based on reporting changes.
* Explicitly mention max supported allocation size is PAGE_SIZE.
---
 Documentation/dev-tools/index.rst  |   1 +
 Documentation/dev-tools/kfence.rst | 291 +
 lib/Kconfig.kfence                 |   2 +
 3 files changed, 294 insertions(+)
 create mode 100644 Documentation/dev-tools/kfence.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..1b1cf4f5c9d9 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -22,6 +22,7 @@ whole; patches welcome!
    ubsan
    kmemleak
    kcsan
+   kfence
    gdb-kernel-debugging
    kgdb
    kselftest
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
new file mode 100644
index ..f0ee8db1bf87
--- /dev/null
+++ b/Documentation/dev-tools/kfence.rst
@@ -0,0 +1,291 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel Electric-Fence (KFENCE)
+==============================
+
+Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
+error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
+invalid-free errors.
+
+KFENCE is designed to be enabled in production kernels, and has near zero
+performance overhead. Compared to KASAN, KFENCE trades performance for
+precision. The main motivation behind KFENCE's design, is that with enough
+total uptime KFENCE will detect bugs in code paths not typically exercised by
+non-production test workloads. One way to quickly achieve a large enough total
+uptime is when the tool is deployed across a large fleet of machines.
+
+Usage
+-----
+
+To enable KFENCE, configure the kernel with::
+
+    CONFIG_KFENCE=y
+
+To build a kernel with KFENCE support, but disabled by default (to enable, set
+``kfence.sample_interval`` to non-zero value), configure the kernel with::
+
+    CONFIG_KFENCE=y
+    CONFIG_KFENCE_SAMPLE_INTERVAL=0
+
+KFENCE provides several other configuration options to customize behaviour (see
+the respective help text in ``lib/Kconfig.kfence`` for more info).
+
+Tuning performance
+~~~~~~~~~~~~~~~~~~
+
+The most important parameter is KFENCE's sample interval, which can be set via
+the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
+sample interval determines the frequency with which heap allocations will be
+guarded by KFENCE. The default is configurable via the Kconfig option
+``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
+disables KFENCE.
+
+The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
+further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
+255), the number of available guarded objects can be controlled. Each object
+requires 2 pages, one for the object itself and the other one used as a guard
+page; object pages are interleaved with guard pages, and every object page is
+therefore surrounded by two guard pages.
+
+The total memory dedicated to the KFENCE memory pool can be computed as::
+
+    ( #objects + 1 ) * 2 * PAGE_SIZE
+
+Using the default config, and assuming a page size of 4 KiB, results in
+dedicating 2 MiB to the KFENCE memory pool.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical out-of-bounds access looks like this::
+
+    ==================================================================
+    BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
+
+    Out-of-bounds access at 0xb672efff (1B left of kfence-#17):
+     test_out_of_bounds_read+0xa3/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    kfence-#17 [0xb672f000-0xb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
+     test_alloc+0xf3/0x25b
+     test_out_of_bounds_read+0x98/0x22b
+     kunit_try_run_case+0x51/0x85
+     kunit_generic_run_threadfn_adapter+0x16/0x30
+     kthread+0x137/0x160
+     ret_from_fork+0x22/0x30
+
+    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+    ==================================================================
+
+The header of the report provides a short summary of the function involved in
+the access. It is followed by more detailed information about the access and
+its origin. Note that, real kernel addresses are only shown for
+``CONFIG_DEBUG_KERNEL=y``