Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
> We don't need to flush all CPUs. This is my rationale: The debug > exception (single-step trap) will always happen on the same CPU that > the page fault occurred on. Page fault shows the page, debug exception > hides the page again. Between those two operations, nothing else can You're ignoring preemptible kernels for once. Also there is always a race. e.g. consider another CPU fetches the entry into its TLB while you have it temporarily unprotected. Then you protect it again but the other CPU still keeps the unprotected entry in its TLBs which you don't flush. In general any changes to the direct mapping have to be done globally. > happen that will use the TLB entry in question (unless you have some > weird race condition, but then the code is in error anyway). > > What is c_p_a() and what is a flush object? change_page_attr() is designed to queue flush requests for later which are then done by global_tlb_flush() For that it creates a list of pages. It cannot be safely used without global_flush_tlb() otherwise the flush list will just grow unbounded. You could in theory write an equivalent optimized for your case that does not require that, but it's not even safe for your case anyways. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
On 29 Nov 2007 11:29:48 +0100, Andi Kleen <[EMAIL PROTECTED]> wrote: > Vegard Nossum <[EMAIL PROTECTED]> writes: > > > > - We properly flush TLB entries that change. This used to not be the case, > > and so we > > For low values of "properly" @) > > > + pte = lookup_address(addr); > > + change_page_attr(page, 1, __pgprot(pte->pte_low | _PAGE_VISIBLE)); > > + __flush_tlb_one(addr); > > That's not enough, you need to flush all CPUs. > > Also when you don't call global_flush_tlb() eventually because c_p_a() will > leak flush > objects over time. We don't need to flush all CPUs. This is my rationale: The debug exception (single-step trap) will always happen on the same CPU that the page fault occurred on. Page fault shows the page, debug exception hides the page again. Between those two operations, nothing else can happen that will use the TLB entry in question (unless you have some weird race condition, but then the code is in error anyway). What is c_p_a() and what is a flush object? > > -Andi > Vegard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Hi Vegard, On Thu, 29 Nov 2007, Vegard Nossum wrote: > If I understand you correctly, you only want to be notified if any > memory within an allocation is used before any memory within the > allocation has been initialized. I think that this would be quite > useless compared to tracking all the bytes of an allocation. Ok, why is it useless? I am trying to understand what kind of errors you hope to catch here. On Thu, 29 Nov 2007, Vegard Nossum wrote: > I am not truly concerned about the memory usage; this kind of error > detection is by definition slow and memory intense. There's "slow" and then there's "too slow for mainline" :-). But even if you want to track every byte, you could use a bitmap in the slab layer to cut down memory usage and get rid of "shadow pages", no? Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Vegard Nossum <[EMAIL PROTECTED]> writes: > > - We properly flush TLB entries that change. This used to not be the case, > and so we For low values of "properly" @) > + pte = lookup_address(addr); > + change_page_attr(page, 1, __pgprot(pte->pte_low | _PAGE_VISIBLE)); > + __flush_tlb_one(addr); That's not enough, you need to flush all CPUs. Also when you don't call global_flush_tlb() eventually because c_p_a() will leak flush objects over time. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
On Nov 29, 2007 10:39 AM, Pekka J Enberg <[EMAIL PROTECTED]> wrote: > Hi, > > On Nov 29, 2007 9:02 AM, Pekka Enberg <[EMAIL PROTECTED]> wrote: > > > Is it really necessary to track every memory address? Tracking slab > > > objects would require far less memory. You might also want to make > > > kzalloc() and GFP_ZERO mark the memory area as initialized to avoid > > > some page faults. > > On Thu, 29 Nov 2007, Vegard Nossum wrote: > > Yes, we are in fact only tracking the memory within SLUB allocations > > (minus what SLUB itself needs for bookkeeping -- like the caches). > > Yeah but you didn't answer my question: why do we track every memory > address instead of slab objects? What's the benefit? Like I already said, > tracking slab objects would require much less memory which makes the > thing more practical. It also reduces the number of false positives (the > CONFIG_OPTIMIZE_FOR_SIZE problem). And we already have slab poisoning to > cover the cases we would not catch with this scheme. If I understand you correctly, you only want to be notified if any memory within an allocation is used before any memory within the allocation has been initialized. I think that this would be quite useless compared to tracking all the bytes of an allocation. I am not truly concerned about the memory usage; this kind of error detection is by definition slow and memory intense. I think that slab poisoning is in a way more subtle, since it does not immediately produce a warning on its own; what it does is really to invoke *other* error catching mechanisms. Slab poisoning works best with pointers since pointers will be dereferenced. What about other kinds of data? The kernel will not complain (loudly enough, anyway) if your bit-field contains some arbitrary unexpected value. Kmemcheck will immediately print a message to the log in this case. > On Thu, 29 Nov 2007, Vegard Nossum wrote: > > As for the kzalloc() and GFP_ZERO, I believe these will write zeros to > > the data in question before the memory is returned to the caller. In > > that case, the area will be "automatically" set to initialized since > > these writes are also intercepted by kmemcheck. > > Yes, and what I proposed is as a potential optimization. Debugging aids > need to be fast enough to be practical. Yep, I didn't realize. Thanks. :-) > > Pekka > Vegard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Hi, On Nov 29, 2007 9:02 AM, Pekka Enberg <[EMAIL PROTECTED]> wrote: > > Is it really necessary to track every memory address? Tracking slab > > objects would require far less memory. You might also want to make > > kzalloc() and GFP_ZERO mark the memory area as initialized to avoid > > some page faults. On Thu, 29 Nov 2007, Vegard Nossum wrote: > Yes, we are in fact only tracking the memory within SLUB allocations > (minus what SLUB itself needs for bookkeeping -- like the caches). Yeah but you didn't answer my question: why do we track every memory address instead of slab objects? What's the benefit? Like I already said, tracking slab objects would require much less memory which makes the thing more practical. It also reduces the number of false positives (the CONFIG_OPTIMIZE_FOR_SIZE problem). And we already have slab poisoning to cover the cases we would not catch with this scheme. On Thu, 29 Nov 2007, Vegard Nossum wrote: > As for the kzalloc() and GFP_ZERO, I believe these will write zeros to > the data in question before the memory is returned to the caller. In > that case, the area will be "automatically" set to initialized since > these writes are also intercepted by kmemcheck. Yes, and what I proposed is as a potential optimization. Debugging aids need to be fast enough to be practical. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Hi, On Nov 29, 2007 9:02 AM, Pekka Enberg <[EMAIL PROTECTED]> wrote: > Hi Vegard, > > On Nov 27, 2007 5:16 PM, Vegard Nossum <[EMAIL PROTECTED]> wrote: > > +config KMEMCHECK > > + bool "Trap use of uninitialized memory" > > + depends on X86_32 && !CC_OPTIMIZE_FOR_SIZE > > + help > > + This option enables tracing of dynamically allocated kernel memory > > + to see if memory is used before it has been given an initial > > value. > > + Be aware that this requires half of your memory for bookkeeping > > and > > + will insert extra code at *every* read and write to tracked memory > > + thus slow down the kernel code (but user code is unaffected). > > Is it really necessary to track every memory address? Tracking slab > objects would require far less memory. You might also want to make > kzalloc() and GFP_ZERO mark the memory area as initialized to avoid > some page faults. Yes, we are in fact only tracking the memory within SLUB allocations (minus what SLUB itself needs for bookkeeping -- like the caches). Maybe the Kconfig text was unclear? As for the kzalloc() and GFP_ZERO, I believe these will write zeros to the data in question before the memory is returned to the caller. In that case, the area will be "automatically" set to initialized since these writes are also intercepted by kmemcheck. If not, I will have to investigate some more :-) > On Nov 27, 2007 5:16 PM, Vegard Nossum <[EMAIL PROTECTED]> wrote: > > + /* Actually allocate twice as much, since we need to track the > > +* status of each byte within the allocation. */ > > + if (!(flags & __GFP_NOTRACK)) { > > If you change __GFP_NOTRACK to __GFP_TRACK, you can avoid the double > negation here. I deliberately chose this form. Here is my rationale: By default, when kmemcheck is enabled, we want to track as much as possible. So every "normal" allocation should be tracked. It seems easier to make an exception for the pages that should *not* be tracked (like the SLUB caches, DMA allocations), since this group of allocations is much smaller than the group of allocations that should be tracked. I could embed __GFP_TRACK into GFP_KERNEL, but then I would have to mask this out at every non-tracked allocation, which leaves us with the exact opposite problem, just in a different place. Thank you for looking :) > > Pekka > Vegard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Hi Vegard, On Nov 27, 2007 5:16 PM, Vegard Nossum <[EMAIL PROTECTED]> wrote: > +config KMEMCHECK > + bool "Trap use of uninitialized memory" > + depends on X86_32 && !CC_OPTIMIZE_FOR_SIZE > + help > + This option enables tracing of dynamically allocated kernel memory > + to see if memory is used before it has been given an initial value. > + Be aware that this requires half of your memory for bookkeeping and > + will insert extra code at *every* read and write to tracked memory > + thus slow down the kernel code (but user code is unaffected). Is it really necessary to track every memory address? Tracking slab objects would require far less memory. You might also want to make kzalloc() and GFP_ZERO mark the memory area as initialized to avoid some page faults. On Nov 27, 2007 5:16 PM, Vegard Nossum <[EMAIL PROTECTED]> wrote: > + /* Actually allocate twice as much, since we need to track the > +* status of each byte within the allocation. */ > + if (!(flags & __GFP_NOTRACK)) { If you change __GFP_NOTRACK to __GFP_TRACK, you can avoid the double negation here. Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Vegard Nossum wrote: Hi, On Nov 28, 2007 7:51 AM, Richard Knutsson <[EMAIL PROTECTED]> wrote: Vegard Nossum wrote: +static int Not 'static bool'? +page_is_tracked(struct page *page) Why not returning 'false' and 'true'? Sorry, I am not used to using bool in C :-) I will change this if bool is preferred in kernel code. Well, why not use them since we have them (C99 standard and over a year in the kernel). ;) What is "preferred" in a group of a few thousands, is hard to say, but I believe it is the way to go. The only "resistance" to it I know, is "it is not a C idiom". A quite illogical statement, at best. However, the 0/1 vs false/true is just a preference. (I like false/true, since I also say "true AND false = false" for example... (NOT true = false, makes sense to me, NOT 1 = 0 seem strange, why can't it be 2, or -1 ;) )) +static unsigned int +opcode_get_size(const uint8_t *opcode) Are we not using 'u8' in the kernel? Actually, I don't see any reason to use u8 when uint8_t is already standard and used in other places in the kernel. I believe I have heard they can be a problem in some situations. It also have the benefit of uniforming the kernel-code. cu Richard Knutsson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Hi, On Nov 28, 2007 7:51 AM, Richard Knutsson <[EMAIL PROTECTED]> wrote: > Vegard Nossum wrote: > > +static int > Not 'static bool'? > > +page_is_tracked(struct page *page) > Why not returning 'false' and 'true'? Sorry, I am not used to using bool in C :-) I will change this if bool is preferred in kernel code. > > +static unsigned int > > +opcode_get_size(const uint8_t *opcode) > Are we not using 'u8' in the kernel? Actually, I don't see any reason to use u8 when uint8_t is already standard and used in other places in the kernel. Thanks for the other comments! I will make the necessary changes for the next version. > cu > Richard Knutsson Vegard - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] kmemcheck: trap uses of uninitialized memory (v2)
Vegard Nossum wrote: General description: kmemcheck will trap every read and write to memory that was allocated dynamically (ie. with kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log. diff --git a/arch/x86/kernel/kmemcheck_32.c b/arch/x86/kernel/kmemcheck_32.c new file mode 100644 index 000..9d065b9 --- /dev/null +++ b/arch/x86/kernel/kmemcheck_32.c @@ -0,0 +1,290 @@ +#include +#include +#include +#include +#include +#include + +static int Not 'static bool'? +page_is_tracked(struct page *page) +{ +struct page *head; + +if (!page) +return 0; +head = compound_head(page); +if (!head) +return 0; +if (!(head->flags & (1 << PG_slab))) +return 0; +if (!head->slab) +return 0; +if (head->slab->flags & __GFP_NOTRACK) +return 0; +return 1; Why not returning 'false' and 'true'? +} + +static void +show_addr(uint32_t addr) +{ +struct page *page; +pte_t *pte; + +if (!addr) +return; +page = virt_to_page(addr); +if (!page_is_tracked(page)) +return; + +pte = lookup_address(addr); +change_page_attr(page, 1, __pgprot(pte->pte_low | _PAGE_VISIBLE)); +__flush_tlb_one(addr); +} + +static void +hide_addr(uint32_t addr) +{ +struct page *page; +pte_t *pte; + +if (!addr) +return; +page = virt_to_page(addr); +if (!page_is_tracked(page)) +return; +pte = lookup_address(addr); +change_page_attr(page, 1, __pgprot(pte->pte_low & ~_PAGE_VISIBLE)); +__flush_tlb_one(addr); +} + +DEFINE_PER_CPU(uint32_t, kmemcheck_read_addr); +DEFINE_PER_CPU(uint32_t, kmemcheck_write_addr); + +void +kmemcheck_show(struct pt_regs *regs) +{ +show_addr(__get_cpu_var(kmemcheck_read_addr)); +show_addr(__get_cpu_var(kmemcheck_write_addr)); +regs->eflags |= TF_MASK; +} + +void +kmemcheck_hide(struct pt_regs *regs) +{ +hide_addr(__get_cpu_var(kmemcheck_read_addr)); +hide_addr(__get_cpu_var(kmemcheck_write_addr)); +regs->eflags &= ~TF_MASK; +} + +void +kmemcheck_hide_pages(struct page *p, unsigned int n) +{ +unsigned int i; + +for(i = 0; i < n; ++i) { +unsigned long address = (unsigned long) page_address(&p[i]); +pte_t *pte = lookup_address(address); + +change_page_attr(&p[i], 1, +__pgprot(pte->pte_low & ~_PAGE_VISIBLE)); +__flush_tlb_one(address); +} +} + +static int 'static bool'? +opcode_is_prefix(uint8_t b) +{ +return +/* Group 1 */ +b == 0xf0 || b == 0xf2 || b == 0xf3 +/* Group 2 */ +|| b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26 +|| b == 0x64 || b == 0x65 || b == 0x2e || b == 0x3e +/* Group 3 */ +|| b == 0x66 +/* Group 4 */ +|| b == 0x67; +} + +/* This is a VERY crude opcode decoder. We only need to find the size of the + * load/store that caused our #PF and this should work for all the opcodes + * that we care about. Moreover, the ones who invented this instruction set + * should be shot. */ +static unsigned int +opcode_get_size(const uint8_t *opcode) Are we not using 'u8' in the kernel? +{ +const uint8_t *i; and here. Also, I find the name 'i' a bit confusing in this context, since it is not really a counter. What about 'cur', 'prefix_op' or something of the sort? + +/* Default operand size */ +int operand_size_override = 32; + +/* prefixes */ +for (i = opcode; opcode_is_prefix(*i); ++i) { +if (*i == 0x66) +operand_size_override = 16; +} + +/* escape opcode */ +if (*i == 0x0f) { +++i; + +if(*i == 0xb6) Please do, as the previous, 'if ()'. A good checker for these kind of things is the 'scripts/checkpatch.pl'... +return operand_size_override >> 1; +if(*i == 0xb7) +return 16; +} + +return (*i & 1) ? operand_size_override : 8; +} + +static uint8_t and here... +opcode_get_primary(const uint8_t *opcode) and here.. +{ +const uint8_t *i; and here. Also, I find the name 'i' a bit confusing in this context, since it is not really a counter. What about 'cur', 'prim_op', 'prefix_op' or something of the sort? + +/* skip prefixes */ +for (i = opcode; opcode_is_prefix(*i); ++i); +return *i; +} + +static void *address_get_shadow_slab(unsigned long address, +struct page *head) +{ +if (!head->slab) +return NULL; + +if (head->slab->flags & __GFP_NOTRACK) +return NULL; + +return (void *) address + (PAGE_SIZE << head->slab->order); +} + +static void *address_get_shadow(unsigned long address) +{ +struct page *page = virt_to_page(address); +struct page *head = compound_head(page); + +if (!head) +return NULL; + +if (head->flags & (1 << PG_slab)) +return address_get_shadow_slab(address, head); + +return NULL; +} + +static int 'static bool'? +test(void *shadow
[RFC] kmemcheck: trap uses of uninitialized memory (v2)
General description: kmemcheck will trap every read and write to memory that was allocated dynamically (ie. with kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log. Changes since v1: - We properly flush TLB entries that change. This used to not be the case, and so we got both spurious and missing reports of invalid memory accesses. - There was a problem with instructions that both read from and wrote to memory in a single instruction. In particular, memcpy would use a REP MOVSB instruction, which could cause a page fault for either the source or the destination address. In this case the information from the page fault handler is not enough (as we only get the first access, the read), and we would end up either looping indefinitely (between marking the source page and the destination page present if the pages were not the same, and missing the write if the source/destination page was the same). Now we examine the instruction that caused the fault to see if it makes one or two memory accesses. - Compiling with -Os caused gcc to turn some MOVZWs into MOVs, which would result in reads from possibly uninitialized memory. This is not incorrect per se, (since the extra bits are thrown away afterwards) so we now depend on !CC_OPTIMIZE_FOR_SIZE. - Adjacent reads from the same EIP will not display the full stack trace. If anybody wants to take a look at a sample log from the patched kernel, I provide this link: http://folk.uio.no/vegardno/linux/kmemcheck-20071127.txt I do not know which (if any) of the messages are valid yet, but please have a look around anyway. If you can confirm or deny the validity of a report, that's great. To use this patch with qemu, use either qemu-cvs or qemu-0.9.0 with the following patch: http://lists.gnu.org/archive/html/qemu-devel/2007-03/msg00126.html In addition, you probably want to remove the -O2 entirely from the Makefile to inhibit inlining, which makes the stack traces somewhat better. Kind regards, Vegard Nossum arch/x86/Kconfig.debug | 10 ++ arch/x86/kernel/Makefile_32|2 + arch/x86/kernel/kmemcheck_32.c | 290 arch/x86/kernel/traps_32.c | 16 ++- arch/x86/mm/fault_32.c | 19 +++- include/asm-x86/kmemcheck.h|3 + include/asm-x86/kmemcheck_32.h | 33 + include/asm-x86/pgtable_32.h |9 +- include/linux/gfp.h|3 +- mm/slub.c | 37 +- 10 files changed, 405 insertions(+), 17 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 761ca7b..6a3e3ba 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -112,4 +112,14 @@ config IOMMU_LEAK Add a simple leak tracer to the IOMMU code. This is useful when you are debugging a buggy device driver that leaks IOMMU mappings. +config KMEMCHECK + bool "Trap use of uninitialized memory" + depends on X86_32 && !CC_OPTIMIZE_FOR_SIZE + help + This option enables tracing of dynamically allocated kernel memory + to see if memory is used before it has been given an initial value. + Be aware that this requires half of your memory for bookkeeping and + will insert extra code at *every* read and write to tracked memory + thus slow down the kernel code (but user code is unaffected). + endmenu diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32 index a7bc93c..91d3d9c 100644 --- a/arch/x86/kernel/Makefile_32 +++ b/arch/x86/kernel/Makefile_32 @@ -49,6 +49,8 @@ obj-y += pcspeaker.o obj-$(CONFIG_SCx200) += scx200_32.o +obj-$(CONFIG_KMEMCHECK)+= kmemcheck_32.o + # vsyscall_32.o contains the vsyscall DSO images as __initdata. # We must build both images before we can assemble it. # Note: kbuild does not track this dependency due to usage of .incbin diff --git a/arch/x86/kernel/kmemcheck_32.c b/arch/x86/kernel/kmemcheck_32.c new file mode 100644 index 000..9d065b9 --- /dev/null +++ b/arch/x86/kernel/kmemcheck_32.c @@ -0,0 +1,290 @@ +#include +#include +#include +#include +#include +#include + +static int +page_is_tracked(struct page *page) +{ + struct page *head; + + if (!page) + return 0; + head = compound_head(page); + if (!head) + return 0; + if (!(head->flags & (1 << PG_slab))) + return 0; + if (!head->slab) + return 0; + if (head->slab->flags & __GFP_NOTRACK) + return 0; + return 1; +} + +static void +show_addr(uint32_t addr) +{ + struct page *page; + pte_t *pte; + + if (!addr) + return; + page = virt_to_page(addr); + if (!page_is_tracked(page)) + return; + + pte = lookup_address(addr); + change_page_attr(page, 1, __pgprot