RE: [PATCH v2] efi-pstore: Make efi-pstore return a unique id
> -Original Message- > From: Matt Fleming [mailto:m...@console-pimps.org] > Sent: Wednesday, November 27, 2013 6:45 AM > To: Madper Xie > Cc: 'linux-kernel@vger.kernel.org'; linux-...@vger.kernel.org; > matt.flem...@intel.com; Seiji Aguchi; rich...@nod.at; Tony Luck > Subject: Re: [PATCH v2] efi-pstore: Make efi-pstore return a unique id > > On Sat, 23 Nov, at 07:55:37PM, Madper Xie wrote: > > > > Pstore fs expects that backends provide a uniqued id which could avoid > > pstore making entries as duplication or denominating entries the same > > name. So I combine the timestamp, part and count into id. > > > > Signed-off-by: Madper Xie > > --- > > drivers/firmware/efi/efi-pstore.c | 19 +-- > > 1 file changed, 13 insertions(+), 6 deletions(-) > > Thanks unless anyone complains, I'm going to apply this. > I'm OK with this patch based on Madper's comment below. So i'd like to using fixed length. My dell xps has 128kb nvram space. and it at most store 126 pstore entries. So I think 1000 will be a safe value for count. For part, 100 should be okay. I don't think there will be a large kmsg and we must split it into more than 100 parts. > Does this need to be tagged for stable? > I think so. Seiji > -- > Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] efi-pstore: Make efi-pstore return a unique id
-Original Message- From: Matt Fleming [mailto:m...@console-pimps.org] Sent: Wednesday, November 27, 2013 6:45 AM To: Madper Xie Cc: 'linux-kernel@vger.kernel.org'; linux-...@vger.kernel.org; matt.flem...@intel.com; Seiji Aguchi; rich...@nod.at; Tony Luck Subject: Re: [PATCH v2] efi-pstore: Make efi-pstore return a unique id On Sat, 23 Nov, at 07:55:37PM, Madper Xie wrote: Pstore fs expects that backends provide a uniqued id which could avoid pstore making entries as duplication or denominating entries the same name. So I combine the timestamp, part and count into id. Signed-off-by: Madper Xie c...@redhat.com --- drivers/firmware/efi/efi-pstore.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) Thanks unless anyone complains, I'm going to apply this. I'm OK with this patch based on Madper's comment below. snip So i'd like to using fixed length. My dell xps has 128kb nvram space. and it at most store 126 pstore entries. So I think 1000 will be a safe value for count. For part, 100 should be okay. I don't think there will be a large kmsg and we must split it into more than 100 parts. snip Does this need to be tagged for stable? I think so. Seiji -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [tip:x86/trace] x86, trace: Add page fault tracepoints
> Seiji - are you okay with this? If so I'll just make a quick patch now. > I'm OK with it. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [tip:x86/trace] x86, trace: Add page fault tracepoints
Seiji - are you okay with this? If so I'll just make a quick patch now. I'm OK with it. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/trace] x86, trace: Add page fault tracepoints
Commit-ID: d34603b07c4255b2b00a546d34f297ccd50ae4c6 Gitweb: http://git.kernel.org/tip/d34603b07c4255b2b00a546d34f297ccd50ae4c6 Author: Seiji Aguchi AuthorDate: Wed, 30 Oct 2013 16:39:03 -0400 Committer: H. Peter Anvin CommitDate: Fri, 8 Nov 2013 14:15:49 -0800 x86, trace: Add page fault tracepoints This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we measure an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: Seiji Aguchi Link: http://lkml.kernel.org/r/52716e67.6090...@hds.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 13 + 3 files changed, 67 insertions(+) diff --git a/arch/x86/include/asm/trace/exceptions.h b/arch/x86/include/asm/trace/exceptions.h new file mode 100644 index 000..86540c0 --- /dev/null +++ b/arch/x86/include/asm/trace/exceptions.h @@ -0,0 +1,52 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM exceptions + +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_PAGE_FAULT_H + +#include + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_exceptions, + + TP_PROTO(unsigned long address, struct pt_regs *regs, +unsigned long error_code), + + TP_ARGS(address, regs, error_code), + + TP_STRUCT__entry( + __field(unsigned long, address ) + __field(unsigned long, ip ) + __field(unsigned long, error_code ) + ), + + TP_fast_assign( + __entry->address = address; + __entry->ip = regs->ip; + __entry->error_code = error_code; + ), + + TP_printk("address=%pf ip=%pf error_code=0x%lx", + (void *)__entry->address, (void *)__entry->ip, + __entry->error_code) ); + +#define DEFINE_PAGE_FAULT_EVENT(name) \ +DEFINE_EVENT_FN(x86_exceptions, name, \ + TP_PROTO(unsigned long address, struct pt_regs *regs, \ +unsigned long error_code), \ + TP_ARGS(address, regs, error_code), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + +DEFINE_PAGE_FAULT_EVENT(user_page_fault); +DEFINE_PAGE_FAULT_EVENT(kernel_page_fault); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#define TRACE_INCLUDE_FILE exceptions +#endif /* _TRACE_PAGE_FAULT_H */ + +/* This part must be outside protection */ +#include diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 23d8e5f..6a19ad9 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_physaddr.o := $(nostackp) CFLAGS_setup_nx.o := $(nostackp) +CFLAGS_fault.o := -I$(src)/../include/asm/trace + obj-$(CONFIG_X86_PAT) += pat_rbtree.o obj-$(CONFIG_SMP) += tlb.o diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd3e281..f2730cbc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -20,6 +20,9 @@ #include /* kmemcheck_*(), ... */ #include /* VSYSCALL_START */ +#define CREATE_TRACE_POINTS +#include + /* * Page fault error code bits: * @@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code) exception_exit(prev_state); } +static void trace_page_fault_entries(struct pt_regs *regs, +unsigned long error_code) +{ + if (user_mode(regs)) + trace_user_page_fault(read_cr2(), regs, error_code); + else + trace_kernel_page_fault(read_cr2(), regs, error_code); +} + dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); + trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More
[tip:x86/trace] x86, trace: Register exception handler to trace IDT
Commit-ID: 25c74b10bacead867478480170083f69cfc0db48 Gitweb: http://git.kernel.org/tip/25c74b10bacead867478480170083f69cfc0db48 Author: Seiji Aguchi AuthorDate: Wed, 30 Oct 2013 16:37:00 -0400 Committer: H. Peter Anvin CommitDate: Fri, 8 Nov 2013 14:15:45 -0800 x86, trace: Register exception handler to trace IDT This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending "trace_" to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi Link: http://lkml.kernel.org/r/52716dec.5050...@hds.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/desc.h| 28 +++- arch/x86/include/asm/hw_irq.h | 3 +++ arch/x86/include/asm/segment.h | 3 +++ arch/x86/include/asm/traps.h | 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c| 28 ++-- arch/x86/mm/fault.c| 10 ++ 10 files changed, 97 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index d939567..3d73437 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, ); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +#define _trace_set_gate(gate, type, addr, dpl, ist, seg) #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) -{ - BUG_ON((unsigned)n > 0xFF); - _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); -} +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n > 0xFF); \ + _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\ + __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\ + 0, 0, __KERNEL_CS); \ + } while (0) extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..cba45d9 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct pt_regs *); #endif extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void); +#ifdef CONFIG_TRACING +#define trace_interrupt interrupt +#endif typedef int vector_irq_t[NR_VECTORS]; DECLARE_PER_CPU(vector_irq_t, vector_irq); diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index c48a950..6f1c3a8 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -214,6 +214,9 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5]; +#ifdef CONFIG_TRACING +#define trace_early_idt_handlers early_idt_handlers +#endif /* * Load a segment. Fall back on loading the zero diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 7036cb6..58d66fe 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -37,6 +37,23 @@ asmlinkage void machine_check(void); #endif /* CONFIG_X86_MCE */ asmlinkage void simd_coprocessor_error(void); +#ifdef CONFIG_TRACING +asmlinkage void trace_page_fault(void); +#define trace_divide_error divide_error +#define trace_bounds bounds +#define trace_invalid_op invalid_op +#define trace_device_not_available device_not_available +#define trace_coprocessor_segment_overrun coprocessor_segment_overrun +#define trace_invalid_TSS invalid_TSS +#define trace_segment_not_present segment_not_present +#define trace_general_protection general_pro
[tip:x86/trace] x86, trace: Delete __trace_alloc_intr_gate()
Commit-ID: ac7956e2699380b8b10146ec2ba8cbe43a03ff7a Gitweb: http://git.kernel.org/tip/ac7956e2699380b8b10146ec2ba8cbe43a03ff7a Author: Seiji Aguchi AuthorDate: Wed, 30 Oct 2013 16:37:47 -0400 Committer: H. Peter Anvin CommitDate: Fri, 8 Nov 2013 14:15:47 -0800 x86, trace: Delete __trace_alloc_intr_gate() Currently irq vector handlers for tracing are registered in both set_intr_gate() and __trace_alloc_intr_gate() in alloc_intr_gate(). But, we don't need to do that twice. So, let's delete __trace_alloc_intr_gate(). Signed-off-by: Seiji Aguchi Link: http://lkml.kernel.org/r/52716e1b.7090...@hds.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/desc.h | 22 -- 1 file changed, 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 3d73437..50d033a 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector) } } -#ifdef CONFIG_TRACING -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ - gate_desc s; - - pack_gate(, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); - write_idt_entry(trace_idt_table, gate, ); -} - -static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) -{ - trace_set_intr_gate(n, addr); -} -#else -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ -} - -#define __trace_alloc_intr_gate(n, addr) -#endif - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ set_intr_gate(n, addr); \ - __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/trace] x86, trace: Remove __alloc_intr_gate()
Commit-ID: 959c071f0974cda7702d7574647de7ad9259eb57 Gitweb: http://git.kernel.org/tip/959c071f0974cda7702d7574647de7ad9259eb57 Author: Seiji Aguchi AuthorDate: Wed, 30 Oct 2013 16:36:08 -0400 Committer: H. Peter Anvin CommitDate: Fri, 8 Nov 2013 14:15:44 -0800 x86, trace: Remove __alloc_intr_gate() Prepare to move set_intr_gate() into a macro by removing __alloc_intr_gate(). The purpose is to avoid failing a kernel build after applying a subsequent patch which changes set_intr_gate() into a macro. Signed-off-by: Seiji Aguchi Link: http://lkml.kernel.org/r/52716db8.1080...@hds.com Signed-off-by: H. Peter Anvin --- arch/x86/include/asm/desc.h | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..d939567 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ - __alloc_intr_gate(n, addr); \ + set_intr_gate(n, addr); \ __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/trace] x86, trace: Delete __trace_alloc_intr_gate()
Commit-ID: ac7956e2699380b8b10146ec2ba8cbe43a03ff7a Gitweb: http://git.kernel.org/tip/ac7956e2699380b8b10146ec2ba8cbe43a03ff7a Author: Seiji Aguchi seiji.agu...@hds.com AuthorDate: Wed, 30 Oct 2013 16:37:47 -0400 Committer: H. Peter Anvin h...@linux.intel.com CommitDate: Fri, 8 Nov 2013 14:15:47 -0800 x86, trace: Delete __trace_alloc_intr_gate() Currently irq vector handlers for tracing are registered in both set_intr_gate() and __trace_alloc_intr_gate() in alloc_intr_gate(). But, we don't need to do that twice. So, let's delete __trace_alloc_intr_gate(). Signed-off-by: Seiji Aguchi seiji.agu...@hds.com Link: http://lkml.kernel.org/r/52716e1b.7090...@hds.com Signed-off-by: H. Peter Anvin h...@linux.intel.com --- arch/x86/include/asm/desc.h | 22 -- 1 file changed, 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 3d73437..50d033a 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector) } } -#ifdef CONFIG_TRACING -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ - gate_desc s; - - pack_gate(s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); - write_idt_entry(trace_idt_table, gate, s); -} - -static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) -{ - trace_set_intr_gate(n, addr); -} -#else -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ -} - -#define __trace_alloc_intr_gate(n, addr) -#endif - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ set_intr_gate(n, addr); \ - __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) /* -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/trace] x86, trace: Add page fault tracepoints
Commit-ID: d34603b07c4255b2b00a546d34f297ccd50ae4c6 Gitweb: http://git.kernel.org/tip/d34603b07c4255b2b00a546d34f297ccd50ae4c6 Author: Seiji Aguchi seiji.agu...@hds.com AuthorDate: Wed, 30 Oct 2013 16:39:03 -0400 Committer: H. Peter Anvin h...@linux.intel.com CommitDate: Fri, 8 Nov 2013 14:15:49 -0800 x86, trace: Add page fault tracepoints This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we measure an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com Link: http://lkml.kernel.org/r/52716e67.6090...@hds.com Signed-off-by: H. Peter Anvin h...@linux.intel.com --- arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 13 + 3 files changed, 67 insertions(+) diff --git a/arch/x86/include/asm/trace/exceptions.h b/arch/x86/include/asm/trace/exceptions.h new file mode 100644 index 000..86540c0 --- /dev/null +++ b/arch/x86/include/asm/trace/exceptions.h @@ -0,0 +1,52 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM exceptions + +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_PAGE_FAULT_H + +#include linux/tracepoint.h + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_exceptions, + + TP_PROTO(unsigned long address, struct pt_regs *regs, +unsigned long error_code), + + TP_ARGS(address, regs, error_code), + + TP_STRUCT__entry( + __field(unsigned long, address ) + __field(unsigned long, ip ) + __field(unsigned long, error_code ) + ), + + TP_fast_assign( + __entry-address = address; + __entry-ip = regs-ip; + __entry-error_code = error_code; + ), + + TP_printk(address=%pf ip=%pf error_code=0x%lx, + (void *)__entry-address, (void *)__entry-ip, + __entry-error_code) ); + +#define DEFINE_PAGE_FAULT_EVENT(name) \ +DEFINE_EVENT_FN(x86_exceptions, name, \ + TP_PROTO(unsigned long address, struct pt_regs *regs, \ +unsigned long error_code), \ + TP_ARGS(address, regs, error_code), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + +DEFINE_PAGE_FAULT_EVENT(user_page_fault); +DEFINE_PAGE_FAULT_EVENT(kernel_page_fault); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#define TRACE_INCLUDE_FILE exceptions +#endif /* _TRACE_PAGE_FAULT_H */ + +/* This part must be outside protection */ +#include trace/define_trace.h diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 23d8e5f..6a19ad9 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_physaddr.o := $(nostackp) CFLAGS_setup_nx.o := $(nostackp) +CFLAGS_fault.o := -I$(src)/../include/asm/trace + obj-$(CONFIG_X86_PAT) += pat_rbtree.o obj-$(CONFIG_SMP) += tlb.o diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd3e281..f2730cbc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -20,6 +20,9 @@ #include asm/kmemcheck.h /* kmemcheck_*(), ... */ #include asm/fixmap.h/* VSYSCALL_START */ +#define CREATE_TRACE_POINTS +#include asm/trace/exceptions.h + /* * Page fault error code bits: * @@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code) exception_exit(prev_state); } +static void trace_page_fault_entries(struct pt_regs *regs, +unsigned long error_code) +{ + if (user_mode(regs)) + trace_user_page_fault(read_cr2(), regs, error_code); + else + trace_kernel_page_fault(read_cr2(), regs, error_code); +} + dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); + trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- To unsubscribe
[tip:x86/trace] x86, trace: Register exception handler to trace IDT
Commit-ID: 25c74b10bacead867478480170083f69cfc0db48 Gitweb: http://git.kernel.org/tip/25c74b10bacead867478480170083f69cfc0db48 Author: Seiji Aguchi seiji.agu...@hds.com AuthorDate: Wed, 30 Oct 2013 16:37:00 -0400 Committer: H. Peter Anvin h...@linux.intel.com CommitDate: Fri, 8 Nov 2013 14:15:45 -0800 x86, trace: Register exception handler to trace IDT This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending trace_ to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com Link: http://lkml.kernel.org/r/52716dec.5050...@hds.com Signed-off-by: H. Peter Anvin h...@linux.intel.com --- arch/x86/include/asm/desc.h| 28 +++- arch/x86/include/asm/hw_irq.h | 3 +++ arch/x86/include/asm/segment.h | 3 +++ arch/x86/include/asm/traps.h | 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c| 28 ++-- arch/x86/mm/fault.c| 10 ++ 10 files changed, 97 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index d939567..3d73437 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +#define _trace_set_gate(gate, type, addr, dpl, ist, seg) #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) -{ - BUG_ON((unsigned)n 0xFF); - _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); -} +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\ + __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\ + 0, 0, __KERNEL_CS); \ + } while (0) extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..cba45d9 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct pt_regs *); #endif extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void); +#ifdef CONFIG_TRACING +#define trace_interrupt interrupt +#endif typedef int vector_irq_t[NR_VECTORS]; DECLARE_PER_CPU(vector_irq_t, vector_irq); diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index c48a950..6f1c3a8 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -214,6 +214,9 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5]; +#ifdef CONFIG_TRACING +#define trace_early_idt_handlers early_idt_handlers +#endif /* * Load a segment. Fall back on loading the zero diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 7036cb6..58d66fe 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -37,6 +37,23 @@ asmlinkage void machine_check(void); #endif /* CONFIG_X86_MCE */ asmlinkage void simd_coprocessor_error(void); +#ifdef CONFIG_TRACING +asmlinkage void trace_page_fault(void); +#define trace_divide_error divide_error +#define trace_bounds bounds +#define trace_invalid_op invalid_op +#define trace_device_not_available device_not_available +#define trace_coprocessor_segment_overrun coprocessor_segment_overrun +#define trace_invalid_TSS invalid_TSS +#define trace_segment_not_present
[tip:x86/trace] x86, trace: Remove __alloc_intr_gate()
Commit-ID: 959c071f0974cda7702d7574647de7ad9259eb57 Gitweb: http://git.kernel.org/tip/959c071f0974cda7702d7574647de7ad9259eb57 Author: Seiji Aguchi seiji.agu...@hds.com AuthorDate: Wed, 30 Oct 2013 16:36:08 -0400 Committer: H. Peter Anvin h...@linux.intel.com CommitDate: Fri, 8 Nov 2013 14:15:44 -0800 x86, trace: Remove __alloc_intr_gate() Prepare to move set_intr_gate() into a macro by removing __alloc_intr_gate(). The purpose is to avoid failing a kernel build after applying a subsequent patch which changes set_intr_gate() into a macro. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com Link: http://lkml.kernel.org/r/52716db8.1080...@hds.com Signed-off-by: H. Peter Anvin h...@linux.intel.com --- arch/x86/include/asm/desc.h | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..d939567 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ - __alloc_intr_gate(n, addr); \ + set_intr_gate(n, addr); \ __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] Make efi-pstore return a unique id
> How does efivars backend handle "unlink(2)" in the pstore file system. > pstore will call the backend->erase function passing the "id". The > backend should then erase the right record from persistent storage. > > With the ((timestamp * 100 + part) * 100 + count function - you can > easily reverse it to find timestamp, part and count - would that make life > easier for the backend to find the record to be erased? If you use a > hash function you will need to check each record and compute the > hash to see if it matches (probably not a big deal because the backend > will generally only hold a handful of records). By generating the id in efi_pstore_write(), and using it to a variable name, It works at an erasing time as well. The root cause of this problem is that efivars used "part" as id. It was a wrong way. So, we should not keep it. Seiji N�Р骒r��yb�X�肚�v�^�)藓{.n�+�伐�{��赙zXФ�≤�}��财�z�:+v�����赙zZ+��+zf"�h���~i���z��wア�?�ㄨ��&�)撷f��^j谦y�m��@A�a囤� 0鹅h���i
RE: [PATCH] Make efi-pstore return a unique id
How does efivars backend handle unlink(2) in the pstore file system. pstore will call the backend-erase function passing the id. The backend should then erase the right record from persistent storage. With the ((timestamp * 100 + part) * 100 + count function - you can easily reverse it to find timestamp, part and count - would that make life easier for the backend to find the record to be erased? If you use a hash function you will need to check each record and compute the hash to see if it matches (probably not a big deal because the backend will generally only hold a handful of records). By generating the id in efi_pstore_write(), and using it to a variable name, It works at an erasing time as well. The root cause of this problem is that efivars used part as id. It was a wrong way. So, we should not keep it. Seiji N�Р骒r��yb�X�肚�v�^�)藓{.n�+�伐�{��赙zXФ�≤�}��财�z�j:+v�����赙zZ+��+zf"�h���~i���z��wア�?�ㄨ���)撷f��^j谦y�m��@A�a囤� 0鹅h���i
RE: [PATCH] Make efi-pstore return a unique id
> What about feeding the bytes of all three integers into a non-cryptographic > hash function? > Using this way you get a cheap unique id. It is reasonable to me. Seiji
RE: [PATCH] Make efi-pstore return a unique id
> >> +{ > >> + char id_str[64]; > >> + u64 id = 0; > >> + > >> + sprintf(id_str, "%lu%u%d", timestamp, part, count); > >> + if (kstrtoull(id_str, 10, )) > >> + pr_warn("efi-pstore: failed to generate id\n"); > >> + return id; > >> +} > > > > This is just odd. You make a string from three ints and then a parse > > it to a int again. > > Agreed. I liked your ((timestamp * 100 + part) * 100 + count function much > more than this. I was worried that the part and count could be more than 100. If it happens, the id may not be unique... But, currently, size of nvram storage is limited, so it is a corner case. I respect your opinion. > @@ -125,9 +137,11 @@ static int efi_pstore_write(enum pstore_type_id type, > efi_char16_t efi_name[DUMP_NAME_LEN]; > efi_guid_t vendor = LINUX_EFI_CRASH_GUID; > int i, ret = 0; > + unsigned long timestamp; > > + timestamp = get_seconds(); > sprintf(name, "dump-type%u-%u-%d-%lu-%c", type, part, count, > - get_seconds(), compressed ? 'C' : 'D'); > + timestamp, compressed ? 'C' : 'D'); > I don't think you need to change efi_pstore_write(). It is just add a local timestamp variable. Seiji > for (i = 0; i < DUMP_NAME_LEN; i++) > efi_name[i] = name[i]; > -- > 1.8.4.2
RE: [PATCH] Make efi-pstore return a unique id
+{ + char id_str[64]; + u64 id = 0; + + sprintf(id_str, %lu%u%d, timestamp, part, count); + if (kstrtoull(id_str, 10, id)) + pr_warn(efi-pstore: failed to generate id\n); + return id; +} This is just odd. You make a string from three ints and then a parse it to a int again. Agreed. I liked your ((timestamp * 100 + part) * 100 + count function much more than this. I was worried that the part and count could be more than 100. If it happens, the id may not be unique... But, currently, size of nvram storage is limited, so it is a corner case. I respect your opinion. @@ -125,9 +137,11 @@ static int efi_pstore_write(enum pstore_type_id type, efi_char16_t efi_name[DUMP_NAME_LEN]; efi_guid_t vendor = LINUX_EFI_CRASH_GUID; int i, ret = 0; + unsigned long timestamp; + timestamp = get_seconds(); sprintf(name, dump-type%u-%u-%d-%lu-%c, type, part, count, - get_seconds(), compressed ? 'C' : 'D'); + timestamp, compressed ? 'C' : 'D'); I don't think you need to change efi_pstore_write(). It is just add a local timestamp variable. Seiji for (i = 0; i DUMP_NAME_LEN; i++) efi_name[i] = name[i]; -- 1.8.4.2
RE: [PATCH] Make efi-pstore return a unique id
What about feeding the bytes of all three integers into a non-cryptographic hash function? Using this way you get a cheap unique id. It is reasonable to me. Seiji
RE: [PATCH 0/2] make all stored entries accessible.
> I also like option 1 ... but I think the "id" should be a persistent value for > a given saved record. So some func(timestamp, part, count) would be a > good idea. If we try using "sequential" numbers - and don't manage to > clear out /sys/fs/pstore each time - then we may have the same "dmesg" > file show up with different names on each boot. > > Right now I have a simple script to save & clear ... not much more > complex than: > > cd /sys/fs/pstore > cp * /var/log/save-pstore > rm * > > This depends on not re-using filenames (otherwise new files in pstore > might overwrite older saved files in my /var/log/save-pstore area). I see.. It is a persuasive use case. (1) I agree that some func(timestamp, part, count) would be good idea. This might work, although an overflow will happen some time... sprintf(id_str, "%lld%d%d", timestamp, part, count) simple_str_to_ull(id_str, , base) (2) There is a GetNextHighMonotonicCount() runtime service in EFI specification to get a persistent number across the reboot, but I'm not sure if it is safe to use it.. Also, it would be good if we can create the id by ourselves, rather than using firmware. (3) Also, (it might not be good idea), if a pstore filesystem expects all backend drivers to use the persistent id, the pstore should provide it by itself. (by using time stamp counter or something like that.) As I looked into the ramoops's code. It seems to use a non-persistent sequential counter,by initializing read_cnt to "0" in ramoops_pstore_open(), and incrementing it in ramoops_pstore_read(). It doesn't seem to be the pstore's expectation. And when someone introduces a new driver, they may misunderstand how to create the id as well.. As above, there are mutiple ideas, but (1) is reasonable to me. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/2] make all stored entries accessible.
> 1. combine timestamp, count and part into "id". >for now, in efi-pstore.c, *id = part. and we could simply change it >to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count. My opinion close to 1. But, the "*id" should not be the complex one like (timestamp * 100 + part) * 100 + count. Rather, it should be a simple sequential number beginning with 1. - Remove "id" member from pstore_read_info struct. - Introduce a global sequential counter like "static u64 efi_pstore_read_count" (or add the member to pstore_info structure?) - Initialize to "1" in efi_pstore_open(). - Increment it in efi_pstore_read(). If we can do it, we don't need to touch a code of pstore filesystem and can avoid regressions of other backend drivers. Seiji > -Original Message- > From: linux-efi-ow...@vger.kernel.org > [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Madper Xie > Sent: Wednesday, October 30, 2013 11:01 PM > To: Luck, Tony > Cc: Seiji Aguchi; Madper Xie; keesc...@chromium.org; ccr...@android.com; > an...@enomsg.org; linux-...@vger.kernel.org; linux- > ker...@vger.kernel.org; bbbo...@gmail.com > Subject: Re: [PATCH 0/2] make all stored entries accessible. > > > tony.l...@intel.com writes: > > >> So, do you mean efivars should fix to use the "id" in a proper way? > > > > It would avoid the need for all these tests, and additions to the filename > > to guarantee > > uniqueness. > > > > Not sure what options efivars has to create a unique, persistent "id" for > > each > > record. It's a fundamental part of how ERST works (though the unique ID is > > just > > based around a timestamp). > > > Okay, maybe there are three options here: > 1. combine timestamp, count and part into "id". >for now, in efi-pstore.c, *id = part. and we could simply change it >to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count. > 2. change the id's type. let id become a string. >so every backend could write anything to id. then it will become a >part of filename in pstore filesystem. (but we need fix all backends >since we modified api.) > 3. apply the patches I have sent... even if the filename will be ugly >and gory... > Which one do you prefer? > >> I acked Madper's patch 2/2 earlier today, but when I look at your test > >> result, I'm not sure if > >> it is reasonable for users to make multiple numbers visible to the file > >> name. > >> > >>> -r--r--r-- 1 root root 17499 Oct 30 13:41 > >>> dmesg-erst-5940651313304961029--2129078373-1383165669 > > > > after I added the "count = 0" initialization the filename gets a tiny bit > > less > > scary: > > > > -r--r--r-- 1 root root 17499 Oct 30 13:41 > > dmesg-erst-5940651313304961029-0-1383165669 > > > > -Tony > > > -- > Best, > Madper Xie. > -- > To unsubscribe from this list: send the line "unsubscribe linux-efi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/2] make all stored entries accessible.
1. combine timestamp, count and part into id. for now, in efi-pstore.c, *id = part. and we could simply change it to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count. My opinion close to 1. But, the *id should not be the complex one like (timestamp * 100 + part) * 100 + count. Rather, it should be a simple sequential number beginning with 1. - Remove id member from pstore_read_info struct. - Introduce a global sequential counter like static u64 efi_pstore_read_count (or add the member to pstore_info structure?) - Initialize to 1 in efi_pstore_open(). - Increment it in efi_pstore_read(). If we can do it, we don't need to touch a code of pstore filesystem and can avoid regressions of other backend drivers. Seiji -Original Message- From: linux-efi-ow...@vger.kernel.org [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Madper Xie Sent: Wednesday, October 30, 2013 11:01 PM To: Luck, Tony Cc: Seiji Aguchi; Madper Xie; keesc...@chromium.org; ccr...@android.com; an...@enomsg.org; linux-...@vger.kernel.org; linux- ker...@vger.kernel.org; bbbo...@gmail.com Subject: Re: [PATCH 0/2] make all stored entries accessible. tony.l...@intel.com writes: So, do you mean efivars should fix to use the id in a proper way? It would avoid the need for all these tests, and additions to the filename to guarantee uniqueness. Not sure what options efivars has to create a unique, persistent id for each record. It's a fundamental part of how ERST works (though the unique ID is just based around a timestamp). Okay, maybe there are three options here: 1. combine timestamp, count and part into id. for now, in efi-pstore.c, *id = part. and we could simply change it to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count. 2. change the id's type. let id become a string. so every backend could write anything to id. then it will become a part of filename in pstore filesystem. (but we need fix all backends since we modified api.) 3. apply the patches I have sent... even if the filename will be ugly and gory... Which one do you prefer? I acked Madper's patch 2/2 earlier today, but when I look at your test result, I'm not sure if it is reasonable for users to make multiple numbers visible to the file name. -r--r--r-- 1 root root 17499 Oct 30 13:41 dmesg-erst-5940651313304961029--2129078373-1383165669 after I added the count = 0 initialization the filename gets a tiny bit less scary: -r--r--r-- 1 root root 17499 Oct 30 13:41 dmesg-erst-5940651313304961029-0-1383165669 -Tony -- Best, Madper Xie. -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/2] make all stored entries accessible.
I also like option 1 ... but I think the id should be a persistent value for a given saved record. So some func(timestamp, part, count) would be a good idea. If we try using sequential numbers - and don't manage to clear out /sys/fs/pstore each time - then we may have the same dmesg file show up with different names on each boot. Right now I have a simple script to save clear ... not much more complex than: cd /sys/fs/pstore cp * /var/log/save-pstore rm * This depends on not re-using filenames (otherwise new files in pstore might overwrite older saved files in my /var/log/save-pstore area). I see.. It is a persuasive use case. (1) I agree that some func(timestamp, part, count) would be good idea. This might work, although an overflow will happen some time... sprintf(id_str, %lld%d%d, timestamp, part, count) simple_str_to_ull(id_str, id, base) (2) There is a GetNextHighMonotonicCount() runtime service in EFI specification to get a persistent number across the reboot, but I'm not sure if it is safe to use it.. Also, it would be good if we can create the id by ourselves, rather than using firmware. (3) Also, (it might not be good idea), if a pstore filesystem expects all backend drivers to use the persistent id, the pstore should provide it by itself. (by using time stamp counter or something like that.) As I looked into the ramoops's code. It seems to use a non-persistent sequential counter,by initializing read_cnt to 0 in ramoops_pstore_open(), and incrementing it in ramoops_pstore_read(). It doesn't seem to be the pstore's expectation. And when someone introduces a new driver, they may misunderstand how to create the id as well.. As above, there are mutiple ideas, but (1) is reasonable to me. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/2] make all stored entries accessible.
> Ah - I was expecting that the backend driver would have a unique "id" for > each record stored ... but is seems that this isn't true for efivars. > So, do you mean efivars should fix to use the "id" in a proper way? I acked Madper's patch 2/2 earlier today, but when I look at your test result, I'm not sure if it is reasonable for users to make multiple numbers visible to the file name. > -r--r--r-- 1 root root 17499 Oct 30 13:41 > dmesg-erst-5940651313304961029--2129078373-1383165669 Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 3/4] Delete __trace_alloc_intr_gate()
Currently irq vector handlers for tracing are registered in both set_intr_gate() and __trace_alloc_intr_gate() in alloc_intr_gate(). But, we don't need to do that twice. So, let's delete __trace_alloc_intr_gate(). Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h | 22 -- 1 file changed, 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 3d73437..50d033a 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector) } } -#ifdef CONFIG_TRACING -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ - gate_desc s; - - pack_gate(, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); - write_idt_entry(trace_idt_table, gate, ); -} - -static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) -{ - trace_set_intr_gate(n, addr); -} -#else -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ -} - -#define __trace_alloc_intr_gate(n, addr) -#endif - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ set_intr_gate(n, addr); \ - __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) /* -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 4/4] Add page fault tracepoints
This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we meature an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 13 + 3 files changed, 67 insertions(+) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/trace/exceptions.h b/arch/x86/include/asm/trace/exceptions.h new file mode 100644 index 000..86540c0 --- /dev/null +++ b/arch/x86/include/asm/trace/exceptions.h @@ -0,0 +1,52 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM exceptions + +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_PAGE_FAULT_H + +#include + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_exceptions, + + TP_PROTO(unsigned long address, struct pt_regs *regs, +unsigned long error_code), + + TP_ARGS(address, regs, error_code), + + TP_STRUCT__entry( + __field(unsigned long, address ) + __field(unsigned long, ip ) + __field(unsigned long, error_code ) + ), + + TP_fast_assign( + __entry->address = address; + __entry->ip = regs->ip; + __entry->error_code = error_code; + ), + + TP_printk("address=%pf ip=%pf error_code=0x%lx", + (void *)__entry->address, (void *)__entry->ip, + __entry->error_code) ); + +#define DEFINE_PAGE_FAULT_EVENT(name) \ +DEFINE_EVENT_FN(x86_exceptions, name, \ + TP_PROTO(unsigned long address, struct pt_regs *regs, \ +unsigned long error_code), \ + TP_ARGS(address, regs, error_code), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + +DEFINE_PAGE_FAULT_EVENT(user_page_fault); +DEFINE_PAGE_FAULT_EVENT(kernel_page_fault); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#define TRACE_INCLUDE_FILE exceptions +#endif /* _TRACE_PAGE_FAULT_H */ + +/* This part must be outside protection */ +#include diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 23d8e5f..6a19ad9 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_physaddr.o := $(nostackp) CFLAGS_setup_nx.o := $(nostackp) +CFLAGS_fault.o := -I$(src)/../include/asm/trace + obj-$(CONFIG_X86_PAT) += pat_rbtree.o obj-$(CONFIG_SMP) += tlb.o diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd3e281..f2730cbc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -20,6 +20,9 @@ #include /* kmemcheck_*(), ... */ #include /* VSYSCALL_START */ +#define CREATE_TRACE_POINTS +#include + /* * Page fault error code bits: * @@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code) exception_exit(prev_state); } +static void trace_page_fault_entries(struct pt_regs *regs, +unsigned long error_code) +{ + if (user_mode(regs)) + trace_user_page_fault(read_cr2(), regs, error_code); + else + trace_kernel_page_fault(read_cr2(), regs, error_code); +} + dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); + trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 2/4] Register exception handler to trace IDT
This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending "trace_" to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h| 28 +++- arch/x86/include/asm/hw_irq.h | 3 +++ arch/x86/include/asm/segment.h | 3 +++ arch/x86/include/asm/traps.h | 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c| 28 ++-- arch/x86/mm/fault.c| 10 ++ 10 files changed, 97 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index d939567..3d73437 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, ); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +#define _trace_set_gate(gate, type, addr, dpl, ist, seg) #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) -{ - BUG_ON((unsigned)n > 0xFF); - _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); -} +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n > 0xFF); \ + _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\ + __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\ + 0, 0, __KERNEL_CS); \ + } while (0) extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..cba45d9 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct pt_regs *); #endif extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void); +#ifdef CONFIG_TRACING +#define trace_interrupt interrupt +#endif typedef int vector_irq_t[NR_VECTORS]; DECLARE_PER_CPU(vector_irq_t, vector_irq); diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index c48a950..6f1c3a8 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -214,6 +214,9 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5]; +#ifdef CONFIG_TRACING +#define trace_early_idt_handlers early_idt_handlers +#endif /* * Load a segment. Fall back on loading the zero diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 7036cb6..58d66fe 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -37,6 +37,23 @@ asmlinkage void machine_check(void); #endif /* CONFIG_X86_MCE */ asmlinkage void simd_coprocessor_error(void); +#ifdef CONFIG_TRACING +asmlinkage void trace_page_fault(void); +#define trace_divide_error divide_error +#define trace_bounds bounds +#define trace_invalid_op invalid_op +#define trace_device_not_available device_not_available +#define trace_coprocessor_segment_overrun coprocessor_segment_overrun +#define trace_invalid_TSS invalid_TSS +#define trace_segment_not_present segment_not_present +#define trace_general_protection general_protection +#define trace_spurious_interrupt_bug spurious_interrupt_bug +#define trace_coprocessor_error coprocessor_error +#define trace_alignment_check alignment_check +#define trace_simd_coprocessor_error simd_coprocessor_error +#define trace_async_page_fault async_page_fault +#endif + dotraplinkage void do_divide_error(struct pt_regs *, long); dotraplinkage void do_debug(struct pt_regs *, long); dotraplinkage vo
[PATCH v4 0/4] Introduce page fault tracepoints
Change from v3: - Separate modifications to make review easy. - Refactor implementations registering exception/irq_vector handers. (Patch 1, 2, 3) This series introduce page fault tracepoints. Detailed descriptions are explained in each patch. Any comments are welcome. Seiji Aguchi (4): Move set_intr_gate() into macro Register exception handler to trace IDT Delete __trace_alloc_intr_gate() Add page fault tracepoints arch/x86/include/asm/desc.h | 57 ++--- arch/x86/include/asm/hw_irq.h | 3 ++ arch/x86/include/asm/segment.h | 3 ++ arch/x86/include/asm/trace/exceptions.h | 52 ++ arch/x86/include/asm/traps.h| 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 +++- arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 23 + 12 files changed, 165 insertions(+), 50 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 1/4] Move set_intr_gate() into macro
Move set_intr_gate() into a macro by removing __alloc_intr_gate(). The purpose is to avoid failing a kernel build after applying a subsequent patch which changes set_intr_gate() to macro. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..d939567 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ - __alloc_intr_gate(n, addr); \ + set_intr_gate(n, addr); \ __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Change from v3: - Introduce EFI_VARS_DATA_SIZE_MAX macro. - Add some description why a scanning/deleting logic is needed. Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. On the code basis, it seems that all the scanning and deleting logic is not needed because __efivars->lock are not dropped when reading from the EFI variable store. But, the scanning and deleting logic is still needed because an efi-pstore and a pstore filesystem works as follows. In case an entry(A) is found, the pointer is saved to psi->data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars->lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi->data, is used for resuming to scan a sysfs-list. So, to protect the entry(A), the logic is needed. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [] dump_stack+0x54/0x74 [1.144058] [] warn_slowpath_common+0x7d/0xa0 [1.144058] [] warn_slowpath_fmt+0x4c/0x50 [1.144058] [] ? vsscanf+0x57f/0x7b0 [1.144058] [] lockdep_trace_alloc+0x104/0x110 [1.144058] [] __kmalloc_track_caller+0x50/0x280 [1.144058] [] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] kmemdup+0x20/0x50 [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [] efi_pstore_read_func+0xb4/0xe0 [1.144058] [] __efivar_entry_iter+0xfb/0x120 [1.144058] [] efi_pstore_read+0x3f/0x50 [1.144058] [] pstore_get_records+0x9a/0x150 [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [] ? parse_options+0x80/0x80 [1.158207] [] pstore_fill_super+0xa5/0xc0 [1.158207] [] mount_single+0xa2/0xd0 [1.158207] [] pstore_mount+0x18/0x20 [1.158207] [] mount_fs+0x39/0x1b0 [1.158207] [] ? __alloc_percpu+0x10/0x20 [1.158207] [] vfs_kern_mount+0x63/0xf0 [1.158207] [] do_mount+0x23e/0xa20 [1.158207] [] ? strndup_user+0x4b/0xf0 [1.158207] [] SyS_mount+0x83/0xc0 [1.158207] [] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi --- drivers/firmware/efi/efi-pstore.c | 143 +++--- drivers/firmware/efi/efivars.c| 12 ++-- drivers/firmware/efi/vars.c | 12 +++- include/linux/efi.h | 4 ++ 4 files changed, 155 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..6ce31e9 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi->data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi->data = NULL; return 0; } @@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) __efivar_entry_get(entry, >var.Attributes, >var.DataSize, entry->var.Data); size = entry->var.DataSize; + memcpy(*cb_data->buf, entry->var.Data, + (size_t)min_t(unsigned long, EFIVARS_DATA_SIZE_MAX, size)); - *cb_data-
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
> On Fri, 18 Oct, at 10:30:58PM, Seiji Aguchi wrote: > > The scanning and deleting logic is still needed. In case an entry(A) > > is found, the pointer is saved to psi->data. And efi_pstore_read() > > passes the entry(A) to a pstore filesystem by releasing > > __efivars->lock. > > > > And then, the pstore filesystem calls efi_pstore_read() again and the > > same entry(A), which is saved to psi->data, is used for re-scanning a > > sysfs-list. (That is why list_for_each_entry_safe_from () is used in > > efi_pstore_sysfs_entry_iter().) > > > > So, to protect the entry(A), the logic is needed because, in pstore > > filesystem, __efivars->lock Is released. > > Very good point. Thanks, I will add the description in the next patch. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] pstore: Differentiating names by adding count and timestamp
> -Original Message- > From: Madper Xie [mailto:c...@redhat.com] > Sent: Wednesday, October 30, 2013 5:45 AM > To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; > an...@enomsg.org; Seiji Aguchi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; > bbbo...@gmail.com; Madper Xie > Subject: [PATCH 2/2] pstore: Differentiating names by adding count and > timestamp > > From: Madper Xie > > pstore denominates dumped file as type-psname-id. it makes many file have > the same name if there are many entries in backend have the same id. > So adding count and timestamp to file name for differentiating. > > Signed-off-by: Madper Xie It should be tested by other drivers as well.. But, looks good to me. Acked-by: Seiji Aguchi > --- > fs/pstore/inode.c | 29 ++--- > 1 file changed, 18 insertions(+), 11 deletions(-) > > diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c > index 0ae994c..36b502f 100644 > --- a/fs/pstore/inode.c > +++ b/fs/pstore/inode.c > @@ -285,7 +285,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, > u64 id, int count, > int rc = 0; > charname[PSTORE_NAMELEN]; > struct pstore_private *private, *pos; > - unsigned long flags; > + unsigned long flags, timestamp; > > spin_lock_irqsave(_lock, flags); > list_for_each_entry(pos, , list) { > @@ -316,35 +316,42 @@ int pstore_mkfile(enum pstore_type_id type, char > *psname, u64 id, int count, > private->count = count; > private->psi = psi; > memcpy(>time, , sizeof(struct timespec)); > + timestamp = time.tv_sec; > > switch (type) { > case PSTORE_TYPE_DMESG: > - sprintf(name, "dmesg-%s-%lld%s", psname, id, > - compressed ? ".enc.z" : ""); > + sprintf(name, "dmesg-%s-%lld-%d-%ld%s", psname, id, count, > + timestamp, compressed ? ".enc.z" : ""); > break; > case PSTORE_TYPE_CONSOLE: > - sprintf(name, "console-%s", psname); > + sprintf(name, "console-%s-%d-%ld", psname, count, timestamp); > break; > case PSTORE_TYPE_FTRACE: > - sprintf(name, "ftrace-%s", psname); > + sprintf(name, "ftrace-%s-%d-%ld", psname, count, timestamp); > break; > case PSTORE_TYPE_MCE: > - sprintf(name, "mce-%s-%lld", psname, id); > + sprintf(name, "mce-%s-%lld-%d-%ld", psname, id, count, > + timestamp); > break; > case PSTORE_TYPE_PPC_RTAS: > - sprintf(name, "rtas-%s-%lld", psname, id); > + sprintf(name, "rtas-%s-%lld-%d-%ld", psname, id, count, > + timestamp); > break; > case PSTORE_TYPE_PPC_OF: > - sprintf(name, "powerpc-ofw-%s-%lld", psname, id); > + sprintf(name, "powerpc-ofw-%s-%lld-%d-%ld", psname, id, count, > + timestamp); > break; > case PSTORE_TYPE_PPC_COMMON: > - sprintf(name, "powerpc-common-%s-%lld", psname, id); > + sprintf(name, "powerpc-common-%s-%lld-%d-%ld", psname, id, > + count, timestamp); > break; > case PSTORE_TYPE_UNKNOWN: > - sprintf(name, "unknown-%s-%lld", psname, id); > + sprintf(name, "unknown-%s-%lld-%d-%ld", psname, id, count, > + timestamp); > break; > default: > - sprintf(name, "type%d-%s-%lld", type, psname, id); > + sprintf(name, "type%d-%s-%lld-%d-%ld", type, psname, id, count, > + timestamp); > break; > } > > -- > 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate
> -Original Message- > From: Madper Xie [mailto:c...@redhat.com] > Sent: Wednesday, October 30, 2013 5:45 AM > To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; > an...@enomsg.org; Seiji Aguchi > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; > bbbo...@gmail.com; Madper Xie > Subject: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate > > From: Madper Xie > > pstore try to find duplicate entries by check both ID, type and psi. > They are not really enough for efi backend. dumped vars always have > the same type, psi and ID. like follows: > dump-type0-9-1-1382511508-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 > dump-type0-9-1-1382515661-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 > > The "duplicate" entries won't appear in pstorefs. And a complain will be > print -- pstore: failed to load 76 record(s) from 'efi' > So I add two more checks: timespec and count. > > Signed-off-by: Madper Xie Looks good to me. Acked-by: Seiji Aguchi But, this patch has to be tested by other backend drivers like erst and ramoops. In my understanding, erst has supported to log multiple events. So, I'm interested why the same error hasn't happened in the other drivers. Seiji > --- > fs/pstore/inode.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c > index 1282384..0ae994c 100644 > --- a/fs/pstore/inode.c > +++ b/fs/pstore/inode.c > @@ -47,6 +47,7 @@ static LIST_HEAD(allpstore); > struct pstore_private { > struct list_head list; > struct pstore_info *psi; > + struct timespec time; > enum pstore_type_id type; > u64 id; > int count; > @@ -290,7 +291,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, > u64 id, int count, > list_for_each_entry(pos, , list) { > if (pos->type == type && > pos->id == id && > - pos->psi == psi) { > + pos->psi == psi && > + pos->count == count && > + !timespec_compare(>time, )) { > rc = -EEXIST; > break; > } > @@ -312,6 +315,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, > u64 id, int count, > private->id = id; > private->count = count; > private->psi = psi; > + memcpy(>time, , sizeof(struct timespec)); > > switch (type) { > case PSTORE_TYPE_DMESG: > -- > 1.8.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate
-Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Wednesday, October 30, 2013 5:45 AM To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; an...@enomsg.org; Seiji Aguchi Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; bbbo...@gmail.com; Madper Xie Subject: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate From: Madper Xie bbbo...@gmail.com pstore try to find duplicate entries by check both ID, type and psi. They are not really enough for efi backend. dumped vars always have the same type, psi and ID. like follows: dump-type0-9-1-1382511508-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-9-1-1382515661-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 The duplicate entries won't appear in pstorefs. And a complain will be print -- pstore: failed to load 76 record(s) from 'efi' So I add two more checks: timespec and count. Signed-off-by: Madper Xie c...@redhat.com Looks good to me. Acked-by: Seiji Aguchi seiji.agu...@hds.com But, this patch has to be tested by other backend drivers like erst and ramoops. In my understanding, erst has supported to log multiple events. So, I'm interested why the same error hasn't happened in the other drivers. Seiji --- fs/pstore/inode.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 1282384..0ae994c 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -47,6 +47,7 @@ static LIST_HEAD(allpstore); struct pstore_private { struct list_head list; struct pstore_info *psi; + struct timespec time; enum pstore_type_id type; u64 id; int count; @@ -290,7 +291,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, list_for_each_entry(pos, allpstore, list) { if (pos-type == type pos-id == id - pos-psi == psi) { + pos-psi == psi + pos-count == count + !timespec_compare(pos-time, time)) { rc = -EEXIST; break; } @@ -312,6 +315,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, private-id = id; private-count = count; private-psi = psi; + memcpy(private-time, time, sizeof(struct timespec)); switch (type) { case PSTORE_TYPE_DMESG: -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] pstore: Differentiating names by adding count and timestamp
-Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Wednesday, October 30, 2013 5:45 AM To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; an...@enomsg.org; Seiji Aguchi Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; bbbo...@gmail.com; Madper Xie Subject: [PATCH 2/2] pstore: Differentiating names by adding count and timestamp From: Madper Xie bbbo...@gmail.com pstore denominates dumped file as type-psname-id. it makes many file have the same name if there are many entries in backend have the same id. So adding count and timestamp to file name for differentiating. Signed-off-by: Madper Xie c...@redhat.com It should be tested by other drivers as well.. But, looks good to me. Acked-by: Seiji Aguchi seiji.agu...@hds.com --- fs/pstore/inode.c | 29 ++--- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 0ae994c..36b502f 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -285,7 +285,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, int rc = 0; charname[PSTORE_NAMELEN]; struct pstore_private *private, *pos; - unsigned long flags; + unsigned long flags, timestamp; spin_lock_irqsave(allpstore_lock, flags); list_for_each_entry(pos, allpstore, list) { @@ -316,35 +316,42 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, private-count = count; private-psi = psi; memcpy(private-time, time, sizeof(struct timespec)); + timestamp = time.tv_sec; switch (type) { case PSTORE_TYPE_DMESG: - sprintf(name, dmesg-%s-%lld%s, psname, id, - compressed ? .enc.z : ); + sprintf(name, dmesg-%s-%lld-%d-%ld%s, psname, id, count, + timestamp, compressed ? .enc.z : ); break; case PSTORE_TYPE_CONSOLE: - sprintf(name, console-%s, psname); + sprintf(name, console-%s-%d-%ld, psname, count, timestamp); break; case PSTORE_TYPE_FTRACE: - sprintf(name, ftrace-%s, psname); + sprintf(name, ftrace-%s-%d-%ld, psname, count, timestamp); break; case PSTORE_TYPE_MCE: - sprintf(name, mce-%s-%lld, psname, id); + sprintf(name, mce-%s-%lld-%d-%ld, psname, id, count, + timestamp); break; case PSTORE_TYPE_PPC_RTAS: - sprintf(name, rtas-%s-%lld, psname, id); + sprintf(name, rtas-%s-%lld-%d-%ld, psname, id, count, + timestamp); break; case PSTORE_TYPE_PPC_OF: - sprintf(name, powerpc-ofw-%s-%lld, psname, id); + sprintf(name, powerpc-ofw-%s-%lld-%d-%ld, psname, id, count, + timestamp); break; case PSTORE_TYPE_PPC_COMMON: - sprintf(name, powerpc-common-%s-%lld, psname, id); + sprintf(name, powerpc-common-%s-%lld-%d-%ld, psname, id, + count, timestamp); break; case PSTORE_TYPE_UNKNOWN: - sprintf(name, unknown-%s-%lld, psname, id); + sprintf(name, unknown-%s-%lld-%d-%ld, psname, id, count, + timestamp); break; default: - sprintf(name, type%d-%s-%lld, type, psname, id); + sprintf(name, type%d-%s-%lld-%d-%ld, type, psname, id, count, + timestamp); break; } -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
On Fri, 18 Oct, at 10:30:58PM, Seiji Aguchi wrote: The scanning and deleting logic is still needed. In case an entry(A) is found, the pointer is saved to psi-data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars-lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi-data, is used for re-scanning a sysfs-list. (That is why list_for_each_entry_safe_from () is used in efi_pstore_sysfs_entry_iter().) So, to protect the entry(A), the logic is needed because, in pstore filesystem, __efivars-lock Is released. Very good point. Thanks, I will add the description in the next patch. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Change from v3: - Introduce EFI_VARS_DATA_SIZE_MAX macro. - Add some description why a scanning/deleting logic is needed. Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. On the code basis, it seems that all the scanning and deleting logic is not needed because __efivars-lock are not dropped when reading from the EFI variable store. But, the scanning and deleting logic is still needed because an efi-pstore and a pstore filesystem works as follows. In case an entry(A) is found, the pointer is saved to psi-data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars-lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi-data, is used for resuming to scan a sysfs-list. So, to protect the entry(A), the logic is needed. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 143 +++--- drivers/firmware/efi/efivars.c| 12 ++-- drivers/firmware/efi/vars.c | 12 +++- include/linux/efi.h | 4 ++ 4 files changed, 155 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..6ce31e9 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi-data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi-data = NULL
[PATCH v4 0/4] Introduce page fault tracepoints
Change from v3: - Separate modifications to make review easy. - Refactor implementations registering exception/irq_vector handers. (Patch 1, 2, 3) This series introduce page fault tracepoints. Detailed descriptions are explained in each patch. Any comments are welcome. Seiji Aguchi (4): Move set_intr_gate() into macro Register exception handler to trace IDT Delete __trace_alloc_intr_gate() Add page fault tracepoints arch/x86/include/asm/desc.h | 57 ++--- arch/x86/include/asm/hw_irq.h | 3 ++ arch/x86/include/asm/segment.h | 3 ++ arch/x86/include/asm/trace/exceptions.h | 52 ++ arch/x86/include/asm/traps.h| 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 +++- arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 23 + 12 files changed, 165 insertions(+), 50 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 2/4] Register exception handler to trace IDT
This patch registers exception handlers for tracing to a trace IDT. To implemented it in set_intr_gate(), this patch does followings. - Register the exception handlers to the trace IDT by prepending trace_ to the handler's names. - Also, newly introduce trace_page_fault() to add tracepoints in a subsequent patch. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h| 28 +++- arch/x86/include/asm/hw_irq.h | 3 +++ arch/x86/include/asm/segment.h | 3 +++ arch/x86/include/asm/traps.h | 20 arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c| 28 ++-- arch/x86/mm/fault.c| 10 ++ 10 files changed, 97 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index d939567..3d73437 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +#define _trace_set_gate(gate, type, addr, dpl, ist, seg) #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) -{ - BUG_ON((unsigned)n 0xFF); - _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); -} +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\ + __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\ + 0, 0, __KERNEL_CS); \ + } while (0) extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..cba45d9 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct pt_regs *); #endif extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void); +#ifdef CONFIG_TRACING +#define trace_interrupt interrupt +#endif typedef int vector_irq_t[NR_VECTORS]; DECLARE_PER_CPU(vector_irq_t, vector_irq); diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index c48a950..6f1c3a8 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -214,6 +214,9 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5]; +#ifdef CONFIG_TRACING +#define trace_early_idt_handlers early_idt_handlers +#endif /* * Load a segment. Fall back on loading the zero diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 7036cb6..58d66fe 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -37,6 +37,23 @@ asmlinkage void machine_check(void); #endif /* CONFIG_X86_MCE */ asmlinkage void simd_coprocessor_error(void); +#ifdef CONFIG_TRACING +asmlinkage void trace_page_fault(void); +#define trace_divide_error divide_error +#define trace_bounds bounds +#define trace_invalid_op invalid_op +#define trace_device_not_available device_not_available +#define trace_coprocessor_segment_overrun coprocessor_segment_overrun +#define trace_invalid_TSS invalid_TSS +#define trace_segment_not_present segment_not_present +#define trace_general_protection general_protection +#define trace_spurious_interrupt_bug spurious_interrupt_bug +#define trace_coprocessor_error coprocessor_error +#define trace_alignment_check alignment_check +#define trace_simd_coprocessor_error simd_coprocessor_error +#define trace_async_page_fault async_page_fault +#endif + dotraplinkage void do_divide_error(struct pt_regs *, long); dotraplinkage void do_debug(struct pt_regs *, long); dotraplinkage
[PATCH v4 1/4] Move set_intr_gate() into macro
Move set_intr_gate() into a macro by removing __alloc_intr_gate(). The purpose is to avoid failing a kernel build after applying a subsequent patch which changes set_intr_gate() to macro. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..d939567 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ - __alloc_intr_gate(n, addr); \ + set_intr_gate(n, addr); \ __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 3/4] Delete __trace_alloc_intr_gate()
Currently irq vector handlers for tracing are registered in both set_intr_gate() and __trace_alloc_intr_gate() in alloc_intr_gate(). But, we don't need to do that twice. So, let's delete __trace_alloc_intr_gate(). Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 22 -- 1 file changed, 22 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 3d73437..50d033a 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector) } } -#ifdef CONFIG_TRACING -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ - gate_desc s; - - pack_gate(s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS); - write_idt_entry(trace_idt_table, gate, s); -} - -static inline void __trace_alloc_intr_gate(unsigned int n, void *addr) -{ - trace_set_intr_gate(n, addr); -} -#else -static inline void trace_set_intr_gate(unsigned int gate, void *addr) -{ -} - -#define __trace_alloc_intr_gate(n, addr) -#endif - #define alloc_intr_gate(n, addr) \ do {\ alloc_system_vector(n); \ set_intr_gate(n, addr); \ - __trace_alloc_intr_gate(n, trace_##addr); \ } while (0) /* -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4 4/4] Add page fault tracepoints
This patch introduces page fault tracepoints to x86 architecture by switching IDT. Two events, for user and kernel spaces, are introduced at the beginning of page fault handler for tracing. - User space event There is a request of page fault event for user space as below. https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com - Kernel space event: When we meature an overhead in kernel space for investigating performance issues, we can check if it comes from the page fault events. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 13 + 3 files changed, 67 insertions(+) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/trace/exceptions.h b/arch/x86/include/asm/trace/exceptions.h new file mode 100644 index 000..86540c0 --- /dev/null +++ b/arch/x86/include/asm/trace/exceptions.h @@ -0,0 +1,52 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM exceptions + +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_PAGE_FAULT_H + +#include linux/tracepoint.h + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_exceptions, + + TP_PROTO(unsigned long address, struct pt_regs *regs, +unsigned long error_code), + + TP_ARGS(address, regs, error_code), + + TP_STRUCT__entry( + __field(unsigned long, address ) + __field(unsigned long, ip ) + __field(unsigned long, error_code ) + ), + + TP_fast_assign( + __entry-address = address; + __entry-ip = regs-ip; + __entry-error_code = error_code; + ), + + TP_printk(address=%pf ip=%pf error_code=0x%lx, + (void *)__entry-address, (void *)__entry-ip, + __entry-error_code) ); + +#define DEFINE_PAGE_FAULT_EVENT(name) \ +DEFINE_EVENT_FN(x86_exceptions, name, \ + TP_PROTO(unsigned long address, struct pt_regs *regs, \ +unsigned long error_code), \ + TP_ARGS(address, regs, error_code), \ + trace_irq_vector_regfunc, \ + trace_irq_vector_unregfunc); + +DEFINE_PAGE_FAULT_EVENT(user_page_fault); +DEFINE_PAGE_FAULT_EVENT(kernel_page_fault); + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH . +#define TRACE_INCLUDE_FILE exceptions +#endif /* _TRACE_PAGE_FAULT_H */ + +/* This part must be outside protection */ +#include trace/define_trace.h diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 23d8e5f..6a19ad9 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_physaddr.o := $(nostackp) CFLAGS_setup_nx.o := $(nostackp) +CFLAGS_fault.o := -I$(src)/../include/asm/trace + obj-$(CONFIG_X86_PAT) += pat_rbtree.o obj-$(CONFIG_SMP) += tlb.o diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fd3e281..f2730cbc 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -20,6 +20,9 @@ #include asm/kmemcheck.h /* kmemcheck_*(), ... */ #include asm/fixmap.h/* VSYSCALL_START */ +#define CREATE_TRACE_POINTS +#include asm/trace/exceptions.h + /* * Page fault error code bits: * @@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code) exception_exit(prev_state); } +static void trace_page_fault_entries(struct pt_regs *regs, +unsigned long error_code) +{ + if (user_mode(regs)) + trace_user_page_fault(read_cr2(), regs, error_code); + else + trace_kernel_page_fault(read_cr2(), regs, error_code); +} + dotraplinkage void __kprobes trace_do_page_fault(struct pt_regs *regs, unsigned long error_code) { enum ctx_state prev_state; prev_state = exception_enter(); + trace_page_fault_entries(regs, error_code); __do_page_fault(regs, error_code); exception_exit(prev_state); } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 0/2] make all stored entries accessible.
Ah - I was expecting that the backend driver would have a unique id for each record stored ... but is seems that this isn't true for efivars. So, do you mean efivars should fix to use the id in a proper way? I acked Madper's patch 2/2 earlier today, but when I look at your test result, I'm not sure if it is reasonable for users to make multiple numbers visible to the file name. -r--r--r-- 1 root root 17499 Oct 30 13:41 dmesg-erst-5940651313304961029--2129078373-1383165669 Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] Differentiating names by adding a timestamp.
To distinguish all events more precisely, "count" is needed to add to the name. Please see the commit below. 755d4fe46529018ae45bc7c86df682de45ace764 Seiji > -Original Message- > From: Madper Xie [mailto:c...@redhat.com] > Sent: Monday, October 28, 2013 9:22 AM > To: Tony Luck; Seiji Aguchi; Anton Vorontsov; Colin Cross; Kees Cook > Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: [PATCH] Differentiating names by adding a timestamp. > > > pstore denominate dumped file as type-psname-id. it makes many file have > the same name if there are many entries in backend have the same id. > So adding a timestamp to file name. > > Signed-off-by: Madper Xie > --- > fs/pstore/inode.c | 26 -- > 1 file changed, 16 insertions(+), 10 deletions(-) > > diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c > index 1282384..bf6ed3a 100644 > --- a/fs/pstore/inode.c > +++ b/fs/pstore/inode.c > @@ -315,32 +315,38 @@ int pstore_mkfile(enum pstore_type_id type, char > *psname, u64 id, int count, > > switch (type) { > case PSTORE_TYPE_DMESG: > - sprintf(name, "dmesg-%s-%lld%s", psname, id, > - compressed ? ".enc.z" : ""); > + sprintf(name, "dmesg-%s-%lld-%lld%s", psname, id, > + timespec_to_ns(), compressed ? ".enc.z" : ""); > break; > case PSTORE_TYPE_CONSOLE: > - sprintf(name, "console-%s", psname); > + sprintf(name, "console-%s-%lld", psname, timespec_to_ns()); > break; > case PSTORE_TYPE_FTRACE: > - sprintf(name, "ftrace-%s", psname); > + sprintf(name, "ftrace-%s-%lld", psname, timespec_to_ns()); > break; > case PSTORE_TYPE_MCE: > - sprintf(name, "mce-%s-%lld", psname, id); > + sprintf(name, "mce-%s-%lld-%lld", psname, id, > + timespec_to_ns()); > break; > case PSTORE_TYPE_PPC_RTAS: > - sprintf(name, "rtas-%s-%lld", psname, id); > + sprintf(name, "rtas-%s-%lld-%lld", psname, id, > + timespec_to_ns()); > break; > case PSTORE_TYPE_PPC_OF: > - sprintf(name, "powerpc-ofw-%s-%lld", psname, id); > + sprintf(name, "powerpc-ofw-%s-%lld-%lld", psname, id, > + timespec_to_ns()); > break; > case PSTORE_TYPE_PPC_COMMON: > - sprintf(name, "powerpc-common-%s-%lld", psname, id); > + sprintf(name, "powerpc-common-%s-%lld-%lld", psname, id, > + timespec_to_ns()); > break; > case PSTORE_TYPE_UNKNOWN: > - sprintf(name, "unknown-%s-%lld", psname, id); > + sprintf(name, "unknown-%s-%lld-%lld", psname, id, > + timespec_to_ns()); > break; > default: > - sprintf(name, "type%d-%s-%lld", type, psname, id); > + sprintf(name, "type%d-%s-%lld-%lld", type, psname, id, > + timespec_to_ns()); > break; > } > > -- > 1.8.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] Differentiating names by adding a timestamp.
To distinguish all events more precisely, count is needed to add to the name. Please see the commit below. 755d4fe46529018ae45bc7c86df682de45ace764 Seiji -Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Monday, October 28, 2013 9:22 AM To: Tony Luck; Seiji Aguchi; Anton Vorontsov; Colin Cross; Kees Cook Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: [PATCH] Differentiating names by adding a timestamp. pstore denominate dumped file as type-psname-id. it makes many file have the same name if there are many entries in backend have the same id. So adding a timestamp to file name. Signed-off-by: Madper Xie c...@redhat.com --- fs/pstore/inode.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 1282384..bf6ed3a 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -315,32 +315,38 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count, switch (type) { case PSTORE_TYPE_DMESG: - sprintf(name, dmesg-%s-%lld%s, psname, id, - compressed ? .enc.z : ); + sprintf(name, dmesg-%s-%lld-%lld%s, psname, id, + timespec_to_ns(time), compressed ? .enc.z : ); break; case PSTORE_TYPE_CONSOLE: - sprintf(name, console-%s, psname); + sprintf(name, console-%s-%lld, psname, timespec_to_ns(time)); break; case PSTORE_TYPE_FTRACE: - sprintf(name, ftrace-%s, psname); + sprintf(name, ftrace-%s-%lld, psname, timespec_to_ns(time)); break; case PSTORE_TYPE_MCE: - sprintf(name, mce-%s-%lld, psname, id); + sprintf(name, mce-%s-%lld-%lld, psname, id, + timespec_to_ns(time)); break; case PSTORE_TYPE_PPC_RTAS: - sprintf(name, rtas-%s-%lld, psname, id); + sprintf(name, rtas-%s-%lld-%lld, psname, id, + timespec_to_ns(time)); break; case PSTORE_TYPE_PPC_OF: - sprintf(name, powerpc-ofw-%s-%lld, psname, id); + sprintf(name, powerpc-ofw-%s-%lld-%lld, psname, id, + timespec_to_ns(time)); break; case PSTORE_TYPE_PPC_COMMON: - sprintf(name, powerpc-common-%s-%lld, psname, id); + sprintf(name, powerpc-common-%s-%lld-%lld, psname, id, + timespec_to_ns(time)); break; case PSTORE_TYPE_UNKNOWN: - sprintf(name, unknown-%s-%lld, psname, id); + sprintf(name, unknown-%s-%lld-%lld, psname, id, + timespec_to_ns(time)); break; default: - sprintf(name, type%d-%s-%lld, type, psname, id); + sprintf(name, type%d-%s-%lld-%lld, type, psname, id, + timespec_to_ns(time)); break; } -- 1.8.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Matt, > It seems to me that because you're no longer dropping __efivars->lock > when reading from the EFI variable store, you actually don't need all > the ->scanning and ->deleting logic because anything that sets those > flags runs to completion while holding the lock. The scanning and deleting logic is still needed. In case an entry(A) is found, the pointer is saved to psi->data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars->lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi->data, is used for re-scanning a sysfs-list. (That is why list_for_each_entry_safe_from () is used in efi_pstore_sysfs_entry_iter().) So, to protect the entry(A), the logic is needed because, in pstore filesystem, __efivars->lock Is released. > Can't the patch be simplified to just allocating data.buf at the > beginning of efi_pstore_read()? I think data.buf is simply allocated at the beginning of efi_pstore_read() with this patch v3. If you are not satisfied with it, could you please show me your pseudo code? >Also, it would be a good idea to > introduce a #define for the 1024 magic number, e.g. > > #define EFIVARS_DATA_SIZE_MAX 1024 It is good idea. I can fix it. Seiji +/** + * efi_pstore_read + * + * This function returns a size of NVRAM entry logged via efi_pstore_write(). + * The meaning and behavior of efi_pstore/pstore are as below. + * + * size > 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size < 0: Failed to get data of entry logging via efi_pstore_write(), + * and pstore will stop reading entry. + */ static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, int *count, struct timespec *timespec, char **buf, bool *compressed, struct pstore_info *psi) { struct pstore_read_data data; + ssize_t size; data.id = id; data.type = type; @@ -112,8 +216,17 @@ static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, data.compressed = compressed; data.buf = buf; - return __efivar_entry_iter(efi_pstore_read_func, _sysfs_list, , - (struct efivar_entry **)>data); + *data.buf = kzalloc(1024, GFP_KERNEL); + if (!*data.buf) + return -ENOMEM; + + efivar_entry_iter_begin(); + size = efi_pstore_sysfs_entry_iter(, + (struct efivar_entry **)>data); + efivar_entry_iter_end(); + if (size <= 0) + kfree(*data.buf); + return size; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Matt, It seems to me that because you're no longer dropping __efivars-lock when reading from the EFI variable store, you actually don't need all the -scanning and -deleting logic because anything that sets those flags runs to completion while holding the lock. The scanning and deleting logic is still needed. In case an entry(A) is found, the pointer is saved to psi-data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars-lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi-data, is used for re-scanning a sysfs-list. (That is why list_for_each_entry_safe_from () is used in efi_pstore_sysfs_entry_iter().) So, to protect the entry(A), the logic is needed because, in pstore filesystem, __efivars-lock Is released. Can't the patch be simplified to just allocating data.buf at the beginning of efi_pstore_read()? I think data.buf is simply allocated at the beginning of efi_pstore_read() with this patch v3. If you are not satisfied with it, could you please show me your pseudo code? Also, it would be a good idea to introduce a #define for the 1024 magic number, e.g. #define EFIVARS_DATA_SIZE_MAX 1024 It is good idea. I can fix it. Seiji snip +/** + * efi_pstore_read + * + * This function returns a size of NVRAM entry logged via efi_pstore_write(). + * The meaning and behavior of efi_pstore/pstore are as below. + * + * size 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size 0: Failed to get data of entry logging via efi_pstore_write(), + * and pstore will stop reading entry. + */ static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, int *count, struct timespec *timespec, char **buf, bool *compressed, struct pstore_info *psi) { struct pstore_read_data data; + ssize_t size; data.id = id; data.type = type; @@ -112,8 +216,17 @@ static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, data.compressed = compressed; data.buf = buf; - return __efivar_entry_iter(efi_pstore_read_func, efivar_sysfs_list, data, - (struct efivar_entry **)psi-data); + *data.buf = kzalloc(1024, GFP_KERNEL); + if (!*data.buf) + return -ENOMEM; + + efivar_entry_iter_begin(); + size = efi_pstore_sysfs_entry_iter(data, + (struct efivar_entry **)psi-data); + efivar_entry_iter_end(); + if (size = 0) + kfree(*data.buf); + return size; } snip -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Hi Madper, Thank you for assisting me. But, I need to discuss the implementation with Matt more. After the discussion, I will post v4 and ask you to test it. Please wait for a while. Seiji > -Original Message- > From: Madper Xie [mailto:c...@redhat.com] > Sent: Thursday, October 17, 2013 11:07 AM > To: Madper Xie > Cc: Seiji Aguchi; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; > matt.flem...@intel.com; tony.l...@intel.com; Tomoki > Sekiyama; dle-deve...@lists.sourceforge.net > Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry > until, the scan is completed > > Hi folks, >I tested it on my DELL XPS desktop. And it won't show any warnings >when I mounting pstore and deleting pstore items after this patch >applied. > >Tested-by: Madper Xie > c...@redhat.com writes: > > > Oops, It seems my mu4e(a email client for emacs)'s auto-indent breaks the > > patch... I apologize for this... > > > > seiji.agu...@hds.com writes: > > > >> Hi Madper, > >> > >> I tested this patch on 3.12-rc4. > >> Could you please send me the log when you failed to apply? > >> > >> Seiji > >> > >>> -Original Message- > >>> From: Madper Xie [mailto:c...@redhat.com] > >>> Sent: Thursday, October 17, 2013 1:54 AM > >>> To: Seiji Aguchi > >>> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; > >>> matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; > dle- > >>> deve...@lists.sourceforge.net > >>> Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs > >>> entry until, the scan is completed > >>> > >>> Howdy Seiji, > >>> I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you > >>> please let me know which is the right tree for this patch? > >>> > >>> Thanks, > >>> Madper. > >>> seiji.agu...@hds.com writes: > >>> > >>> > Change from v2: > >>> > - Move dynamic memory allocation to efi_pstore_read() before holding > >>> > efivars->lock to protect entry->var.Data. > >>> > - Access to entry->scanning while holding efivars->lock. > >>> > - Move a comment about a returned value from efi_pstore_read_func() to > >>> > efi_pstore_read() because "size < 0" case may happen in > >>> > efi_pstore_read(). > >>> > > >>> > Currently, when mounting pstore file system, a read callback of > >>> > efi_pstore > >>> > driver runs mutiple times as below. > >>> > > >>> > - In the first read callback, scan efivar_sysfs_list from head and pass > >>> > a kmsg buffer of a entry to an upper pstore layer. > >>> > - In the second read callback, rescan efivar_sysfs_list from the entry > >>> > and pass > >>> > another kmsg buffer to it. > >>> > - Repeat the scan and pass until the end of efivar_sysfs_list. > >>> > > >>> > In this process, an entry is read across the multiple read function > >>> > calls. > >>> > To avoid race between the read and erasion, the whole process above is > >>> > protected by a spinlock, holding in open() and releasing in close(). > >>> > > >>> > At the same time, kmemdup() is called to pass the buffer to pstore > >>> > filesystem > >>> > during it. > >>> > And then, it causes a following lockdep warning. > >>> > > >>> > To make the dynamic memory allocation runnable without taking spinlock, > >>> > holding off a deletion of sysfs entry if it happens while scanning it > >>> > via efi_pstore, and deleting it after the scan is completed. > >>> > > >>> > To implement it, this patch introduces two flags, scanning and deleting, > >>> > to efivar_entry. > >>> > > >>> > [1.143710] [ cut here ] > >>> > [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 > >>> > lockdep_trace_alloc+0x104/0x110() > >>> > [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) > >>> > [1.144058] Modules linked in: > >>> > > >>> > [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 > >>> > [1.144058] 0009 88
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Hi Madper, I tested this patch on 3.12-rc4. Could you please send me the log when you failed to apply? Seiji > -Original Message- > From: Madper Xie [mailto:c...@redhat.com] > Sent: Thursday, October 17, 2013 1:54 AM > To: Seiji Aguchi > Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; > matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle- > deve...@lists.sourceforge.net > Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry > until, the scan is completed > > Howdy Seiji, > I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you > please let me know which is the right tree for this patch? > > Thanks, > Madper. > seiji.agu...@hds.com writes: > > > Change from v2: > > - Move dynamic memory allocation to efi_pstore_read() before holding > > efivars->lock to protect entry->var.Data. > > - Access to entry->scanning while holding efivars->lock. > > - Move a comment about a returned value from efi_pstore_read_func() to > > efi_pstore_read() because "size < 0" case may happen in efi_pstore_read(). > > > > Currently, when mounting pstore file system, a read callback of efi_pstore > > driver runs mutiple times as below. > > > > - In the first read callback, scan efivar_sysfs_list from head and pass > > a kmsg buffer of a entry to an upper pstore layer. > > - In the second read callback, rescan efivar_sysfs_list from the entry and > > pass > > another kmsg buffer to it. > > - Repeat the scan and pass until the end of efivar_sysfs_list. > > > > In this process, an entry is read across the multiple read function calls. > > To avoid race between the read and erasion, the whole process above is > > protected by a spinlock, holding in open() and releasing in close(). > > > > At the same time, kmemdup() is called to pass the buffer to pstore > > filesystem > > during it. > > And then, it causes a following lockdep warning. > > > > To make the dynamic memory allocation runnable without taking spinlock, > > holding off a deletion of sysfs entry if it happens while scanning it > > via efi_pstore, and deleting it after the scan is completed. > > > > To implement it, this patch introduces two flags, scanning and deleting, > > to efivar_entry. > > > > [1.143710] [ cut here ] > > [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 > > lockdep_trace_alloc+0x104/0x110() > > [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) > > [1.144058] Modules linked in: > > > > [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 > > [1.144058] 0009 8800797e9ae0 816614a5 > > 8800797e9b28 > > [1.144058] 8800797e9b18 8105510d 0080 > > 0046 > > [1.144058] 00d0 03af 81ccd0c0 > > 8800797e9b78 > > [1.144058] Call Trace: > > [1.144058] [] dump_stack+0x54/0x74 > > [1.144058] [] warn_slowpath_common+0x7d/0xa0 > > [1.144058] [] warn_slowpath_fmt+0x4c/0x50 > > [1.144058] [] ? vsscanf+0x57f/0x7b0 > > [1.144058] [] lockdep_trace_alloc+0x104/0x110 > > [1.144058] [] __kmalloc_track_caller+0x50/0x280 > > [1.144058] [] ? > > efi_pstore_read_func.part.1+0x12b/0x170 > > [1.144058] [] kmemdup+0x20/0x50 > > [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 > > [1.144058] [] ? > > efi_pstore_read_func.part.1+0x170/0x170 > > [1.144058] [] efi_pstore_read_func+0xb4/0xe0 > > [1.144058] [] __efivar_entry_iter+0xfb/0x120 > > [1.144058] [] efi_pstore_read+0x3f/0x50 > > [1.144058] [] pstore_get_records+0x9a/0x150 > > [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 > > [1.158207] [] ? parse_options+0x80/0x80 > > [1.158207] [] pstore_fill_super+0xa5/0xc0 > > [1.158207] [] mount_single+0xa2/0xd0 > > [1.158207] [] pstore_mount+0x18/0x20 > > [1.158207] [] mount_fs+0x39/0x1b0 > > [1.158207] [] ? __alloc_percpu+0x10/0x20 > > [1.158207] [] vfs_kern_mount+0x63/0xf0 > > [1.158207] [] do_mount+0x23e/0xa20 > > [1.158207] [] ? strndup_user+0x4b/0xf0 > > [1.158207] [] SyS_mount+0x83/0xc0 > > [1.158207] [] system_call_fastpath+0x16/0x1b > > [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- > > > > Signed-off-by: Seiji Aguchi > > --- > > drivers/firmware/efi/efi-pstore.c | 143 > >
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Hi Madper, I tested this patch on 3.12-rc4. Could you please send me the log when you failed to apply? Seiji -Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Thursday, October 17, 2013 1:54 AM To: Seiji Aguchi Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle- deve...@lists.sourceforge.net Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed Howdy Seiji, I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you please let me know which is the right tree for this patch? Thanks, Madper. seiji.agu...@hds.com writes: Change from v2: - Move dynamic memory allocation to efi_pstore_read() before holding efivars-lock to protect entry-var.Data. - Access to entry-scanning while holding efivars-lock. - Move a comment about a returned value from efi_pstore_read_func() to efi_pstore_read() because size 0 case may happen in efi_pstore_read(). Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 143 +++--- drivers/firmware/efi/efivars.c| 12 ++-- drivers/firmware/efi/vars.c | 12 +++- include/linux/efi.h | 2 + 4 files changed, 153 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..ebd5dbc
RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Hi Madper, Thank you for assisting me. But, I need to discuss the implementation with Matt more. After the discussion, I will post v4 and ask you to test it. Please wait for a while. Seiji -Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Thursday, October 17, 2013 11:07 AM To: Madper Xie Cc: Seiji Aguchi; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle-deve...@lists.sourceforge.net Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed Hi folks, I tested it on my DELL XPS desktop. And it won't show any warnings when I mounting pstore and deleting pstore items after this patch applied. Tested-by: Madper Xie c...@redhat.com writes: Oops, It seems my mu4e(a email client for emacs)'s auto-indent breaks the patch... I apologize for this... seiji.agu...@hds.com writes: Hi Madper, I tested this patch on 3.12-rc4. Could you please send me the log when you failed to apply? Seiji -Original Message- From: Madper Xie [mailto:c...@redhat.com] Sent: Thursday, October 17, 2013 1:54 AM To: Seiji Aguchi Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle- deve...@lists.sourceforge.net Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed Howdy Seiji, I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you please let me know which is the right tree for this patch? Thanks, Madper. seiji.agu...@hds.com writes: Change from v2: - Move dynamic memory allocation to efi_pstore_read() before holding efivars-lock to protect entry-var.Data. - Access to entry-scanning while holding efivars-lock. - Move a comment about a returned value from efi_pstore_read_func() to efi_pstore_read() because size 0 case may happen in efi_pstore_read(). Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5
RE: [PATCH 5/5] efi: Capsule update support and pstore backend
> There's also an "either/or" choice between using efi-capsule with pstore, and > the > traditional kexec/kdump method for getting a memory dump from a crash. We > have to go through a reset to save the capsule - but we don't want a reset for > kexec. Perhaps we can pass the reset parameters through the kexec path to > the new kernel to make it do the right kind of reset ... but the value of the > capsule > dump is minimal if kdump managed to dump everything. Tony, I tried to log kmsg into the kexec path months ago. It was rejected due to the impelementation problem. But, as Eric said, it should be OK if it is implemented in the kdump kenel. http://marc.info/?l=linux-kernel=136917686732183=2 The only problem with kdump here is the implementation in the initial ram disk. Fixing the initial ramdisk so it logs to kmsg before it touches scarier hardware should be the solution. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/5] efi: Introduce a Runtime Services lock
> +#define efi_call_reset_virt(f, args...) > \ > +({ \ > + unsigned long __flags; \ > + bool __nmi = in_nmi(); \ > + \ > + if (__nmi) \ > + spin_lock_irqsave(_runtime_lock, __flags); \ If the lock is not held in the nmi context, runtime service may run concurrently in non-nmi context as follows. - In nmi context, cpu0 calls a runtime service (no lock is held.) - In non-nmi context, cpu1 call can take the lock and call the runtime service. To avoid this, using try_lock in nmi context is better.. If(in_nmi()) try_spin_lock_irqsave(); else spin_lock_irqsave(); Please see the commit abd4d5587be911f63592537284dad78766d97d62, which is introduced to pstore by Don Zickus. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] Introduce page fault tracepoint
Thank you for reviewing. > > http://marc.info/?l=linux-mm=136807959830182=2 > > http://marc.info/?l=linux-mm=136807959130175=2 > > > > For permanence, please use links of the form: > > http://lkml.kernel.org/r/message-id > > (Yes, they currently point to marc.info, but can be redirected to point > to any archive.) I will fix it. > This is needlessly confusing, which is apart of why reviewing this patch > took a lot more time than it should. > > Please break this patch into two: one which sets up the tracing IDT and > one to create the #PF tracepoint. The assumption is, I am assuming, > there will be more. OK. I will divide the patch into two or more to make the review smooth. Seiji > > -hpa
RE: [PATCH v3] Introduce page fault tracepoint
Thank you for reviewing. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 For permanence, please use links of the form: http://lkml.kernel.org/r/message-id (Yes, they currently point to marc.info, but can be redirected to point to any archive.) I will fix it. This is needlessly confusing, which is apart of why reviewing this patch took a lot more time than it should. Please break this patch into two: one which sets up the tracing IDT and one to create the #PF tracepoint. The assumption is, I am assuming, there will be more. OK. I will divide the patch into two or more to make the review smooth. Seiji -hpa
RE: [PATCH 2/5] efi: Introduce a Runtime Services lock
+#define efi_call_reset_virt(f, args...) \ +({ \ + unsigned long __flags; \ + bool __nmi = in_nmi(); \ + \ + if (__nmi) \ + spin_lock_irqsave(efi_runtime_lock, __flags); \ If the lock is not held in the nmi context, runtime service may run concurrently in non-nmi context as follows. - In nmi context, cpu0 calls a runtime service (no lock is held.) - In non-nmi context, cpu1 call can take the lock and call the runtime service. To avoid this, using try_lock in nmi context is better.. If(in_nmi()) try_spin_lock_irqsave(); else spin_lock_irqsave(); Please see the commit abd4d5587be911f63592537284dad78766d97d62, which is introduced to pstore by Don Zickus. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 5/5] efi: Capsule update support and pstore backend
There's also an either/or choice between using efi-capsule with pstore, and the traditional kexec/kdump method for getting a memory dump from a crash. We have to go through a reset to save the capsule - but we don't want a reset for kexec. Perhaps we can pass the reset parameters through the kexec path to the new kernel to make it do the right kind of reset ... but the value of the capsule dump is minimal if kdump managed to dump everything. Tony, I tried to log kmsg into the kexec path months ago. It was rejected due to the impelementation problem. But, as Eric said, it should be OK if it is implemented in the kdump kenel. http://marc.info/?l=linux-kernelm=136917686732183w=2 The only problem with kdump here is the implementation in the initial ram disk. Fixing the initial ramdisk so it logs to kmsg before it touches scarier hardware should be the solution. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] Introduce page fault tracepoint
Peter, Any comment? Seiji > -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi > Sent: Monday, September 09, 2013 5:56 PM > To: linux-kernel@vger.kernel.org; x...@kernel.org > Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; > t...@linutronix.de; fdesl...@gmail.com; > raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: [PATCH v3] Introduce page fault tracepoint > > Change from v2 > - Print entry->ip instead of entry->regs->ip to avoid kernel crash. > - Use %pf instead of 0x%lx to print address and ip. > > This patch introduces page fault tracepoints to x86 architecture > by switching IDT. > > [Use case of page fault events] > > Two events, for user and kernel spaces, are introduced at the beginning of > page fault handler. > > - User space event > There is a request of page fault event for user space as below. > > http://marc.info/?l=linux-mm=136807959830182=2 > http://marc.info/?l=linux-mm=136807959130175=2 > > - Kernel space event: > Overhead in kernel space is measurable by enabling it. > > [Creating IDT] > > A way to create IDT is as below. > > - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. >This is used at boot time which tracing is disabled. > - Make set_intr_gate() macro so that it can register trace handler to >trace IDT and non-trace handler to normal IDT. > > Signed-off-by: Seiji Aguchi > --- > arch/x86/include/asm/desc.h | 33 + > arch/x86/include/asm/hw_irq.h | 14 - > arch/x86/include/asm/trace/exceptions.h | 52 > + > arch/x86/include/asm/traps.h| 22 ++ > arch/x86/kernel/entry_32.S | 10 +++ > arch/x86/kernel/entry_64.S | 13 - > arch/x86/kernel/head64.c| 2 +- > arch/x86/kernel/irqinit.c | 2 +- > arch/x86/kernel/kvm.c | 2 +- > arch/x86/kernel/traps.c | 28 +- > arch/x86/mm/Makefile| 2 ++ > arch/x86/mm/fault.c | 22 ++ > 12 files changed, 178 insertions(+), 24 deletions(-) > create mode 100644 arch/x86/include/asm/trace/exceptions.h > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index b90e5df..c04302b 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, > const gate_desc *gate) > { > write_idt_entry(trace_idt_table, entry, gate); > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > + gate_desc s; > + > + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); > + /* > + * does not need to be atomic because it is only done once at > + * setup time > + */ > + write_trace_idt_entry(gate, ); > +} > #else > static inline void write_trace_idt_entry(int entry, const gate_desc *gate) > { > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > +} > #endif > > static inline void _set_gate(int gate, unsigned type, void *addr, > @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, > void *addr, > * Pentium F0 0F bugfix can have resulted in the mapped > * IDT being write-protected. > */ > -static inline void set_intr_gate(unsigned int n, void *addr) > +static inline void set_intr_gate_raw(unsigned int n, void *addr) > { > BUG_ON((unsigned)n > 0xFF); > _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); > } > > +#define set_intr_gate(n, addr) > \ > + do {\ > + BUG_ON((unsigned)n > 0xFF); \ > + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ > + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ > + __KERNEL_CS); \ > + } while (0) > + > extern int first_system_vector; > /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ > extern unsigned long used_vectors[]; > @@ -395,10 +421,7 @@ static inline
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Matt, I submitted a v3 patch based on my comment below.. Seiji > -Original Message- > From: linux-efi-ow...@vger.kernel.org > [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi > Sent: Wednesday, October 09, 2013 12:37 PM > To: Matt Fleming > Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; > tony.l...@intel.com; matt.flem...@intel.com; dle- > deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs > entry until the scan is completed > > Thank you for reviewing. > In my understanding, your point is that all accesses to efivar_entry should > be done while holding __efivars->lock. > > > > @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry > > > *entry, void *data) > > > return 0; > > > > > > entry->var.DataSize = 1024; > > > - __efivar_entry_get(entry, >var.Attributes, > > > ->var.DataSize, entry->var.Data); > > > + efivar_entry_get(entry, >var.Attributes, > > > + >var.DataSize, entry->var.Data); > > > + > > > size = entry->var.DataSize; > > > > > > *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL); > > > > This isn't safe to do without holding the __efivars->lock, because > > there's the potential for someone else to update entry->var.Data and > > entry->var.DataSize while you're in the middle of copying the data in > > kmemdup(). This could leak to an information leak, though I think you're > > safe from an out-of-bounds access because DataSize is never > 1024. > > > > I see... > Bu, kmemdup() cannot be called while holding the spinlock. > > So, for protecting efivar_entry, I will call kzalloc() before holding the > lock in efi_pstore_read(). > and use memcpy() in efi_pstore_read_func(). > > The pseudo code is as below. > > static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, > struct pstore_info *psi) > { > *data.buf = kzalloc(1024, GFP_KERNEL); > if (!*data.buf) > return -ENOMEM; > > efivar_entry_iter_begin(); > size = efi_pstore_sysfs_entry_iter(, > (struct efivar_entry > **)>data); > efivar_entry_iter_end(); > if (size <= 0) > kfree(*data.buf); > return size; > } > > static int efi_pstore_read_func(struct efivar_entry *entry, void *data) > { > [...] > entry->var.DataSize = 1024; > __efivar_entry_get(entry, >var.Attributes, >>var.DataSize, entry->var.Data); > > size = entry->var.DataSize; > memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned > long, >1024, > size)); > return size; > } > > > > This doesn't look correct to me. You can't access 'entry' outside of the > > *_iter_begin() and *_iter_end() blocks. You can't do, > > > > efivar_entry_iter_end(): > > > > if (!entry->scanning) > > efivar_unregister(entry); > > > > because 'entry' may have already been freed by another CPU. > > I will fix it as follows. > > if (!entry->scanning) { > efivar_entry_iter_end(); > efivar_unregister(entry); > } else > efivar_entry_iter_end(); > > (efivar_unregister(entry) still runs concurrently. > But, it cannot move inside spinlock because kzalloc() may run while freeing > kobject.) > > Is it your expectation? > > Seiji > > -- > To unsubscribe from this list: send the line "unsubscribe linux-efi" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Change from v2: - Move dynamic memory allocation to efi_pstore_read() before holding efivars->lock to protect entry->var.Data. - Access to entry->scanning while holding efivars->lock. - Move a comment about a returned value from efi_pstore_read_func() to efi_pstore_read() because "size < 0" case may happen in efi_pstore_read(). Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [] dump_stack+0x54/0x74 [1.144058] [] warn_slowpath_common+0x7d/0xa0 [1.144058] [] warn_slowpath_fmt+0x4c/0x50 [1.144058] [] ? vsscanf+0x57f/0x7b0 [1.144058] [] lockdep_trace_alloc+0x104/0x110 [1.144058] [] __kmalloc_track_caller+0x50/0x280 [1.144058] [] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] kmemdup+0x20/0x50 [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [] efi_pstore_read_func+0xb4/0xe0 [1.144058] [] __efivar_entry_iter+0xfb/0x120 [1.144058] [] efi_pstore_read+0x3f/0x50 [1.144058] [] pstore_get_records+0x9a/0x150 [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [] ? parse_options+0x80/0x80 [1.158207] [] pstore_fill_super+0xa5/0xc0 [1.158207] [] mount_single+0xa2/0xd0 [1.158207] [] pstore_mount+0x18/0x20 [1.158207] [] mount_fs+0x39/0x1b0 [1.158207] [] ? __alloc_percpu+0x10/0x20 [1.158207] [] vfs_kern_mount+0x63/0xf0 [1.158207] [] do_mount+0x23e/0xa20 [1.158207] [] ? strndup_user+0x4b/0xf0 [1.158207] [] SyS_mount+0x83/0xc0 [1.158207] [] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi --- drivers/firmware/efi/efi-pstore.c | 143 +++--- drivers/firmware/efi/efivars.c| 12 ++-- drivers/firmware/efi/vars.c | 12 +++- include/linux/efi.h | 2 + 4 files changed, 153 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..ebd5dbc 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi->data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi->data = NULL; return 0; } @@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) __efivar_entry_get(entry, >var.Attributes, >var.DataSize, entry->var.Data); size = entry->var.DataSize; + memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned long, +1024, size)); - *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL); - if (*cb_data->buf == NULL) - return -ENOMEM; return size; } +/** + * efi_pstore_scan_sysfs_enter + * @entry: scanning entry + * @next: next entry + * @head: list head + */ +static void efi_pstore_scan_sysfs_enter(struct efivar_entry *pos, + struct efivar_entry *next
[PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed
Change from v2: - Move dynamic memory allocation to efi_pstore_read() before holding efivars-lock to protect entry-var.Data. - Access to entry-scanning while holding efivars-lock. - Move a comment about a returned value from efi_pstore_read_func() to efi_pstore_read() because size 0 case may happen in efi_pstore_read(). Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 143 +++--- drivers/firmware/efi/efivars.c| 12 ++-- drivers/firmware/efi/vars.c | 12 +++- include/linux/efi.h | 2 + 4 files changed, 153 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..ebd5dbc 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi-data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi-data = NULL; return 0; } @@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) __efivar_entry_get(entry, entry-var.Attributes, entry-var.DataSize, entry-var.Data); size = entry-var.DataSize; + memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned long, +1024, size)); - *cb_data-buf = kmemdup
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Matt, I submitted a v3 patch based on my comment below.. Seiji -Original Message- From: linux-efi-ow...@vger.kernel.org [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi Sent: Wednesday, October 09, 2013 12:37 PM To: Matt Fleming Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; tony.l...@intel.com; matt.flem...@intel.com; dle- deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed Thank you for reviewing. In my understanding, your point is that all accesses to efivar_entry should be done while holding __efivars-lock. @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) return 0; entry-var.DataSize = 1024; - __efivar_entry_get(entry, entry-var.Attributes, -entry-var.DataSize, entry-var.Data); + efivar_entry_get(entry, entry-var.Attributes, + entry-var.DataSize, entry-var.Data); + size = entry-var.DataSize; *cb_data-buf = kmemdup(entry-var.Data, size, GFP_KERNEL); This isn't safe to do without holding the __efivars-lock, because there's the potential for someone else to update entry-var.Data and entry-var.DataSize while you're in the middle of copying the data in kmemdup(). This could leak to an information leak, though I think you're safe from an out-of-bounds access because DataSize is never 1024. I see... Bu, kmemdup() cannot be called while holding the spinlock. So, for protecting efivar_entry, I will call kzalloc() before holding the lock in efi_pstore_read(). and use memcpy() in efi_pstore_read_func(). The pseudo code is as below. static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, struct pstore_info *psi) { *data.buf = kzalloc(1024, GFP_KERNEL); if (!*data.buf) return -ENOMEM; efivar_entry_iter_begin(); size = efi_pstore_sysfs_entry_iter(data, (struct efivar_entry **)psi-data); efivar_entry_iter_end(); if (size = 0) kfree(*data.buf); return size; } static int efi_pstore_read_func(struct efivar_entry *entry, void *data) { [...] entry-var.DataSize = 1024; __efivar_entry_get(entry, entry-var.Attributes, entry-var.DataSize, entry-var.Data); size = entry-var.DataSize; memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned long, 1024, size)); return size; } This doesn't look correct to me. You can't access 'entry' outside of the *_iter_begin() and *_iter_end() blocks. You can't do, efivar_entry_iter_end(): if (!entry-scanning) efivar_unregister(entry); because 'entry' may have already been freed by another CPU. I will fix it as follows. if (!entry-scanning) { efivar_entry_iter_end(); efivar_unregister(entry); } else efivar_entry_iter_end(); (efivar_unregister(entry) still runs concurrently. But, it cannot move inside spinlock because kzalloc() may run while freeing kobject.) Is it your expectation? Seiji -- To unsubscribe from this list: send the line unsubscribe linux-efi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v3] Introduce page fault tracepoint
Peter, Any comment? Seiji -Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi Sent: Monday, September 09, 2013 5:56 PM To: linux-kernel@vger.kernel.org; x...@kernel.org Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; t...@linutronix.de; fdesl...@gmail.com; raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: [PATCH v3] Introduce page fault tracepoint Change from v2 - Print entry-ip instead of entry-regs-ip to avoid kernel crash. - Use %pf instead of 0x%lx to print address and ip. This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 178 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* + * does not need to be atomic because it is only done once at + * setup time + */ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Thank you for reviewing. In my understanding, your point is that all accesses to efivar_entry should be done while holding __efivars->lock. > > @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry > > *entry, void *data) > > return 0; > > > > entry->var.DataSize = 1024; > > - __efivar_entry_get(entry, >var.Attributes, > > - >var.DataSize, entry->var.Data); > > + efivar_entry_get(entry, >var.Attributes, > > +>var.DataSize, entry->var.Data); > > + > > size = entry->var.DataSize; > > > > *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL); > > This isn't safe to do without holding the __efivars->lock, because > there's the potential for someone else to update entry->var.Data and > entry->var.DataSize while you're in the middle of copying the data in > kmemdup(). This could leak to an information leak, though I think you're > safe from an out-of-bounds access because DataSize is never > 1024. > I see... Bu, kmemdup() cannot be called while holding the spinlock. So, for protecting efivar_entry, I will call kzalloc() before holding the lock in efi_pstore_read(). and use memcpy() in efi_pstore_read_func(). The pseudo code is as below. static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, struct pstore_info *psi) { *data.buf = kzalloc(1024, GFP_KERNEL); if (!*data.buf) return -ENOMEM; efivar_entry_iter_begin(); size = efi_pstore_sysfs_entry_iter(, (struct efivar_entry **)>data); efivar_entry_iter_end(); if (size <= 0) kfree(*data.buf); return size; } static int efi_pstore_read_func(struct efivar_entry *entry, void *data) { [...] entry->var.DataSize = 1024; __efivar_entry_get(entry, >var.Attributes, >var.DataSize, entry->var.Data); size = entry->var.DataSize; memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned long, 1024, size)); return size; } > This doesn't look correct to me. You can't access 'entry' outside of the > *_iter_begin() and *_iter_end() blocks. You can't do, > > efivar_entry_iter_end(): > > if (!entry->scanning) > efivar_unregister(entry); > > because 'entry' may have already been freed by another CPU. I will fix it as follows. if (!entry->scanning) { efivar_entry_iter_end(); efivar_unregister(entry); } else efivar_entry_iter_end(); (efivar_unregister(entry) still runs concurrently. But, it cannot move inside spinlock because kzalloc() may run while freeing kobject.) Is it your expectation? Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Thank you for reviewing. In my understanding, your point is that all accesses to efivar_entry should be done while holding __efivars-lock. @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) return 0; entry-var.DataSize = 1024; - __efivar_entry_get(entry, entry-var.Attributes, - entry-var.DataSize, entry-var.Data); + efivar_entry_get(entry, entry-var.Attributes, +entry-var.DataSize, entry-var.Data); + size = entry-var.DataSize; *cb_data-buf = kmemdup(entry-var.Data, size, GFP_KERNEL); This isn't safe to do without holding the __efivars-lock, because there's the potential for someone else to update entry-var.Data and entry-var.DataSize while you're in the middle of copying the data in kmemdup(). This could leak to an information leak, though I think you're safe from an out-of-bounds access because DataSize is never 1024. I see... Bu, kmemdup() cannot be called while holding the spinlock. So, for protecting efivar_entry, I will call kzalloc() before holding the lock in efi_pstore_read(). and use memcpy() in efi_pstore_read_func(). The pseudo code is as below. static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type, struct pstore_info *psi) { *data.buf = kzalloc(1024, GFP_KERNEL); if (!*data.buf) return -ENOMEM; efivar_entry_iter_begin(); size = efi_pstore_sysfs_entry_iter(data, (struct efivar_entry **)psi-data); efivar_entry_iter_end(); if (size = 0) kfree(*data.buf); return size; } static int efi_pstore_read_func(struct efivar_entry *entry, void *data) { [...] entry-var.DataSize = 1024; __efivar_entry_get(entry, entry-var.Attributes, entry-var.DataSize, entry-var.Data); size = entry-var.DataSize; memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned long, 1024, size)); return size; } This doesn't look correct to me. You can't access 'entry' outside of the *_iter_begin() and *_iter_end() blocks. You can't do, efivar_entry_iter_end(): if (!entry-scanning) efivar_unregister(entry); because 'entry' may have already been freed by another CPU. I will fix it as follows. if (!entry-scanning) { efivar_entry_iter_end(); efivar_unregister(entry); } else efivar_entry_iter_end(); (efivar_unregister(entry) still runs concurrently. But, it cannot move inside spinlock because kzalloc() may run while freeing kobject.) Is it your expectation? Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Are there anyone who can review this bugfix? Seiji > -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi > Sent: Friday, September 27, 2013 4:24 PM > To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; > tony.l...@intel.com; matt.flem...@intel.com > Cc: dle-deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry > until the scan is completed > > Change form v1 > - Rebase to 3.12-rc2 > > Currently, when mounting pstore file system, a read callback of efi_pstore > driver runs mutiple times as below. > > - In the first read callback, scan efivar_sysfs_list from head and pass > a kmsg buffer of a entry to an upper pstore layer. > - In the second read callback, rescan efivar_sysfs_list from the entry and > pass > another kmsg buffer to it. > - Repeat the scan and pass until the end of efivar_sysfs_list. > > In this process, an entry is read across the multiple read function calls. > To avoid race between the read and erasion, the whole process above is > protected by a spinlock, holding in open() and releasing in close(). > > At the same time, kmemdup() is called to pass the buffer to pstore filesystem > during it. > And then, it causes a following lockdep warning. > > To make the read callback runnable without taking spinlok, > holding off a deletion of sysfs entry if it happens while scanning it > via efi_pstore, and deleting it after the scan is completed. > > To implement it, this patch introduces two flags, scanning and deleting, > to efivar_entry. > Also, __efivar_entry_get() is removed because it was used in efi_pstore only. > > [1.143710] [ cut here ] > [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 > lockdep_trace_alloc+0x104/0x110() > [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) > [1.144058] Modules linked in: > > [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 > [1.144058] 0009 8800797e9ae0 816614a5 > 8800797e9b28 > [1.144058] 8800797e9b18 8105510d 0080 > 0046 > [1.144058] 00d0 03af 81ccd0c0 > 8800797e9b78 > [1.144058] Call Trace: > [1.144058] [] dump_stack+0x54/0x74 > [1.144058] [] warn_slowpath_common+0x7d/0xa0 > [1.144058] [] warn_slowpath_fmt+0x4c/0x50 > [1.144058] [] ? vsscanf+0x57f/0x7b0 > [1.144058] [] lockdep_trace_alloc+0x104/0x110 > [1.144058] [] __kmalloc_track_caller+0x50/0x280 > [1.144058] [] ? > efi_pstore_read_func.part.1+0x12b/0x170 > [1.144058] [] kmemdup+0x20/0x50 > [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 > [1.144058] [] ? > efi_pstore_read_func.part.1+0x170/0x170 > [1.144058] [] efi_pstore_read_func+0xb4/0xe0 > [1.144058] [] __efivar_entry_iter+0xfb/0x120 > [1.144058] [] efi_pstore_read+0x3f/0x50 > [1.144058] [] pstore_get_records+0x9a/0x150 > [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 > [1.158207] [] ? parse_options+0x80/0x80 > [1.158207] [] pstore_fill_super+0xa5/0xc0 > [1.158207] [] mount_single+0xa2/0xd0 > [1.158207] [] pstore_mount+0x18/0x20 > [1.158207] [] mount_fs+0x39/0x1b0 > [1.158207] [] ? __alloc_percpu+0x10/0x20 > [1.158207] [] vfs_kern_mount+0x63/0xf0 > [1.158207] [] do_mount+0x23e/0xa20 > [1.158207] [] ? strndup_user+0x4b/0xf0 > [1.158207] [] SyS_mount+0x83/0xc0 > [1.158207] [] system_call_fastpath+0x16/0x1b > [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- > > Signed-off-by: Seiji Aguchi > --- > drivers/firmware/efi/efi-pstore.c | 145 > +++--- > drivers/firmware/efi/efivars.c| 3 +- > drivers/firmware/efi/vars.c | 39 +++--- > include/linux/efi.h | 4 +- > 4 files changed, 151 insertions(+), 40 deletions(-) > > diff --git a/drivers/firmware/efi/efi-pstore.c > b/drivers/firmware/efi/efi-pstore.c > index 5002d50..53001a5 100644 > --- a/drivers/firmware/efi/efi-pstore.c > +++ b/drivers/firmware/efi/efi-pstore.c > @@ -18,14 +18,12 @@ module_param_named(pstore_disable, > efivars_pstore_disable, bool, 0644); > > static int efi_pstore_open(struct pstore_info *psi) > { > - efivar_entry_iter_begin(); > psi->data = NULL; > return 0; > } > > static int efi_pstore_close(struct pstore_info *psi) > { > - efivar_entry_iter_end(); > psi->data = NULL; > return 0; > } > @@ -39,6 +37,23
RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Are there anyone who can review this bugfix? Seiji -Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi Sent: Friday, September 27, 2013 4:24 PM To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; tony.l...@intel.com; matt.flem...@intel.com Cc: dle-deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed Change form v1 - Rebase to 3.12-rc2 Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the read callback runnable without taking spinlok, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. Also, __efivar_entry_get() is removed because it was used in efi_pstore only. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 145 +++--- drivers/firmware/efi/efivars.c| 3 +- drivers/firmware/efi/vars.c | 39 +++--- include/linux/efi.h | 4 +- 4 files changed, 151 insertions(+), 40 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..53001a5 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi-data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi-data = NULL; return 0; } @@ -39,6 +37,23 @@ struct pstore_read_data
[RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Change form v1 - Rebase to 3.12-rc2 Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the read callback runnable without taking spinlok, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. Also, __efivar_entry_get() is removed because it was used in efi_pstore only. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [] dump_stack+0x54/0x74 [1.144058] [] warn_slowpath_common+0x7d/0xa0 [1.144058] [] warn_slowpath_fmt+0x4c/0x50 [1.144058] [] ? vsscanf+0x57f/0x7b0 [1.144058] [] lockdep_trace_alloc+0x104/0x110 [1.144058] [] __kmalloc_track_caller+0x50/0x280 [1.144058] [] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] kmemdup+0x20/0x50 [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [] efi_pstore_read_func+0xb4/0xe0 [1.144058] [] __efivar_entry_iter+0xfb/0x120 [1.144058] [] efi_pstore_read+0x3f/0x50 [1.144058] [] pstore_get_records+0x9a/0x150 [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [] ? parse_options+0x80/0x80 [1.158207] [] pstore_fill_super+0xa5/0xc0 [1.158207] [] mount_single+0xa2/0xd0 [1.158207] [] pstore_mount+0x18/0x20 [1.158207] [] mount_fs+0x39/0x1b0 [1.158207] [] ? __alloc_percpu+0x10/0x20 [1.158207] [] vfs_kern_mount+0x63/0xf0 [1.158207] [] do_mount+0x23e/0xa20 [1.158207] [] ? strndup_user+0x4b/0xf0 [1.158207] [] SyS_mount+0x83/0xc0 [1.158207] [] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi --- drivers/firmware/efi/efi-pstore.c | 145 +++--- drivers/firmware/efi/efivars.c| 3 +- drivers/firmware/efi/vars.c | 39 +++--- include/linux/efi.h | 4 +- 4 files changed, 151 insertions(+), 40 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..53001a5 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi->data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi->data = NULL; return 0; } @@ -39,6 +37,23 @@ struct pstore_read_data { char **buf; }; +/** + * efi_pstore_read_func + * @entry: reading entry + * @data: data of the entry + * + * This function runs in non-atomic context. + * + * Also, it returns a size of NVRAM entry logged via efi_pstore_write(). + * pstore in accordance with the returned value as below. + * + * size > 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size < 0: Failed to get data of entry logging via efi_pstore_write(), + * and pstore will stop reading entry. + */ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) { efi_guid_t vendor = LINUX_EFI_CRASH_GUID; @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) return 0; entry->var.DataSize = 1024; - __efivar_
[RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Change form v1 - Rebase to 3.12-rc2 Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the read callback runnable without taking spinlok, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. Also, __efivar_entry_get() is removed because it was used in efi_pstore only. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 145 +++--- drivers/firmware/efi/efivars.c| 3 +- drivers/firmware/efi/vars.c | 39 +++--- include/linux/efi.h | 4 +- 4 files changed, 151 insertions(+), 40 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 5002d50..53001a5 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi-data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi-data = NULL; return 0; } @@ -39,6 +37,23 @@ struct pstore_read_data { char **buf; }; +/** + * efi_pstore_read_func + * @entry: reading entry + * @data: data of the entry + * + * This function runs in non-atomic context. + * + * Also, it returns a size of NVRAM entry logged via efi_pstore_write(). + * pstore in accordance with the returned value as below. + * + * size 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size 0
RE: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers
efivars works fine with this v2 patch. Tested-by: Seiji Aguchi > -Original Message- > From: Aruna Balakrishnaiah [mailto:ar...@linux.vnet.ibm.com] > Sent: Thursday, September 12, 2013 2:51 AM > To: linuxppc-...@ozlabs.org; tony.l...@intel.com; Seiji Aguchi; > linux-kernel@vger.kernel.org; keesc...@chromium.org > Cc: jkeni...@linux.vnet.ibm.com; mah...@linux.vnet.ibm.com; > cbouatmai...@gmail.com; ana...@in.ibm.com; ccr...@android.com > Subject: [PATCH v2] pstore: Adjust buffer size for compression for smaller > registered buffers > > When backends (ex: efivars) have smaller registered buffers, the big_oops_buf > is quite too big for them as number of repeated occurences in the text > captured > will be less. Patch takes care of adjusting the buffer size based on the > registered buffer size. cmpr values has been arrived after doing experiments > with > plain text for buffers of size 1k - 4k (Smaller the buffer size repeated > occurence > will be less) and with sample crash log for buffers ranging from 4k - 10k. > > Reported-by: Seiji Aguchi > Signed-off-by: Aruna Balakrishnaiah > --- > Changes from v1: > Retain the cmpr = 45 for buffers ranging of size 4k - 10k. 45 seems to > work. > I added an additional headroom of 3%. Revert it back to 45. > > fs/pstore/platform.c | 23 ++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > index 4ffb7ab..57b4219 100644 > --- a/fs/pstore/platform.c > +++ b/fs/pstore/platform.c > @@ -195,8 +195,29 @@ error: > static void allocate_buf_for_compression(void) > { > size_t size; > + size_t cmpr; > + > + switch (psinfo->bufsize) { > + /* buffer range for efivars */ > + case 1000 ... 2000: > + cmpr = 56; > + break; > + case 2001 ... 3000: > + cmpr = 54; > + break; > + case 3001 ... 3999: > + cmpr = 52; > + break; > + /* buffer range for nvram, erst */ > + case 4000 ... 1: > + cmpr = 45; > + break; > + default: > + cmpr = 60; > + break; > + } > > - big_oops_buf_sz = (psinfo->bufsize * 100) / 45; > + big_oops_buf_sz = (psinfo->bufsize * 100) / cmpr; > big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); > if (big_oops_buf) { > size = max(zlib_deflate_workspacesize(WINDOW_BITS, MEM_LEVEL), N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers
efivars works fine with this v2 patch. Tested-by: Seiji Aguchi seiji.agu...@hds.com -Original Message- From: Aruna Balakrishnaiah [mailto:ar...@linux.vnet.ibm.com] Sent: Thursday, September 12, 2013 2:51 AM To: linuxppc-...@ozlabs.org; tony.l...@intel.com; Seiji Aguchi; linux-kernel@vger.kernel.org; keesc...@chromium.org Cc: jkeni...@linux.vnet.ibm.com; mah...@linux.vnet.ibm.com; cbouatmai...@gmail.com; ana...@in.ibm.com; ccr...@android.com Subject: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers When backends (ex: efivars) have smaller registered buffers, the big_oops_buf is quite too big for them as number of repeated occurences in the text captured will be less. Patch takes care of adjusting the buffer size based on the registered buffer size. cmpr values has been arrived after doing experiments with plain text for buffers of size 1k - 4k (Smaller the buffer size repeated occurence will be less) and with sample crash log for buffers ranging from 4k - 10k. Reported-by: Seiji Aguchi seiji.agu...@hds.com Signed-off-by: Aruna Balakrishnaiah ar...@linux.vnet.ibm.com --- Changes from v1: Retain the cmpr = 45 for buffers ranging of size 4k - 10k. 45 seems to work. I added an additional headroom of 3%. Revert it back to 45. fs/pstore/platform.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 4ffb7ab..57b4219 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -195,8 +195,29 @@ error: static void allocate_buf_for_compression(void) { size_t size; + size_t cmpr; + + switch (psinfo-bufsize) { + /* buffer range for efivars */ + case 1000 ... 2000: + cmpr = 56; + break; + case 2001 ... 3000: + cmpr = 54; + break; + case 3001 ... 3999: + cmpr = 52; + break; + /* buffer range for nvram, erst */ + case 4000 ... 1: + cmpr = 45; + break; + default: + cmpr = 60; + break; + } - big_oops_buf_sz = (psinfo-bufsize * 100) / 45; + big_oops_buf_sz = (psinfo-bufsize * 100) / cmpr; big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL); if (big_oops_buf) { size = max(zlib_deflate_workspacesize(WINDOW_BITS, MEM_LEVEL), N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH 1/3] pstore: Adjust buffer size for compression for smaller registered buffers
>+ /* buffer range for efivars */ >+ case 1000 ... 2000: >+ cmpr = 56; >+ break; > Seiji: let me know how the efivars tests go. efivars works fine. Uncompressed size about 1800 bytes. It matches the value of cmpr, 56. Please feel free to add my "Tested-by" to all three patches. Tested-by: Seiji Aguchi N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
RE: [PATCH 1/3] pstore: Adjust buffer size for compression for smaller registered buffers
+ /* buffer range for efivars */ + case 1000 ... 2000: + cmpr = 56; + break; Seiji: let me know how the efivars tests go. efivars works fine. Uncompressed size about 1800 bytes. It matches the value of cmpr, 56. Please feel free to add my Tested-by to all three patches. Tested-by: Seiji Aguchi seiji.agu...@hds.com N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
[PATCH v3] Introduce page fault tracepoint
Change from v2 - Print entry->ip instead of entry->regs->ip to avoid kernel crash. - Use %pf instead of 0x%lx to print address and ip. This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mm=136807959830182=2 http://marc.info/?l=linux-mm=136807959130175=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 178 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, ); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n > 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n > 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..c856e69 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt spurious_int
[PATCH v3] Introduce page fault tracepoint
Change from v2 - Print entry-ip instead of entry-regs-ip to avoid kernel crash. - Use %pf instead of 0x%lx to print address and ip. This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 52 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 178 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 92b3bae..c856e69 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt
RE: [PATCH v2] Introduce page fault tracepoint
> -Original Message- > From: Steven Rostedt [mailto:rost...@goodmis.org] > Sent: Friday, September 06, 2013 12:50 PM > To: Seiji Aguchi > Cc: linux-kernel@vger.kernel.org; x...@kernel.org; h...@zytor.com; > mi...@elte.hu; b...@alien8.de; t...@linutronix.de; > fdesl...@gmail.com; raphael.beamo...@gmail.com; > dle-deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: Re: [PATCH v2] Introduce page fault tracepoint > > On Fri, 23 Aug 2013 11:37:43 -0400 > Seiji Aguchi wrote: > > > > +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) > > +#define _TRACE_PAGE_FAULT_H > > + > > +#include > > + > > +extern void trace_irq_vector_regfunc(void); > > +extern void trace_irq_vector_unregfunc(void); > > + > > +DECLARE_EVENT_CLASS(x86_exceptions, > > + > > + TP_PROTO(unsigned long address, struct pt_regs *regs, > > +unsigned long error_code), > > + > > + TP_ARGS(address, regs, error_code), > > + > > + TP_STRUCT__entry( > > + __field(unsigned long, address ) > > + __field(struct pt_regs *, regs ) > > + __field(unsigned long, error_code ) > > + ), > > + > > + TP_fast_assign( > > + __entry->address = address; > > + __entry->regs = regs; > > + __entry->error_code = error_code; > > + ), > > + > > + TP_printk("address=0x%lx ip=0x%lx error_code=0x%lx", > > + __entry->address, __entry->regs->ip, __entry->error_code) ); > > This is sure to crash the kernel. > > You just saved the address of a pointer to some task's stack in the > ring buffer. And then on output (which can happen a long time from when > it was recorded), you are dereferencing that same address! > > That __entry->regs->ip *will* crash the kernel! > > What you want is to save ip in the fast_assign: > > __entry->ip = regs->ip > > And then print that. Never dereference a pointer directly from the ring > buffer unless it's a constant value (like a global string). I see.. Thank you for reviewing. I will fix it. Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] Introduce page fault tracepoint
Any comment? > -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi > Sent: Friday, August 23, 2013 11:38 AM > To: linux-kernel@vger.kernel.org; x...@kernel.org > Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; > t...@linutronix.de; fdesl...@gmail.com; > raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: [PATCH v2] Introduce page fault tracepoint > > Change from v1 > - Print regs->ip instead of regs to know where the page fault event happened >(Suggested by Steven Rostedt) > > > This patch introduces page fault tracepoints to x86 architecture > by switching IDT. > > [Use case of page fault events] > > Two events, for user and kernel spaces, are introduced at the beginning of > page fault handler. > > - User space event > There is a request of page fault event for user space as below. > > http://marc.info/?l=linux-mm=136807959830182=2 > http://marc.info/?l=linux-mm=136807959130175=2 > > - Kernel space event: > Overhead in kernel space is measurable by enabling it. > > [Creating IDT] > > A way to create IDT is as below. > > - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. >This is used at boot time which tracing is disabled. > - Make set_intr_gate() macro so that it can register trace handler to >trace IDT and non-trace handler to normal IDT. > > Signed-off-by: Seiji Aguchi > --- > arch/x86/include/asm/desc.h | 33 + > arch/x86/include/asm/hw_irq.h | 14 - > arch/x86/include/asm/trace/exceptions.h | 51 > + > arch/x86/include/asm/traps.h| 22 ++ > arch/x86/kernel/entry_32.S | 10 +++ > arch/x86/kernel/entry_64.S | 13 - > arch/x86/kernel/head64.c| 2 +- > arch/x86/kernel/irqinit.c | 2 +- > arch/x86/kernel/kvm.c | 2 +- > arch/x86/kernel/traps.c | 28 +- > arch/x86/mm/Makefile| 2 ++ > arch/x86/mm/fault.c | 22 ++ > 12 files changed, 177 insertions(+), 24 deletions(-) > create mode 100644 arch/x86/include/asm/trace/exceptions.h > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index b90e5df..c04302b 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, > const gate_desc *gate) > { > write_idt_entry(trace_idt_table, entry, gate); > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > + gate_desc s; > + > + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); > + /* > + * does not need to be atomic because it is only done once at > + * setup time > + */ > + write_trace_idt_entry(gate, ); > +} > #else > static inline void write_trace_idt_entry(int entry, const gate_desc *gate) > { > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > +} > #endif > > static inline void _set_gate(int gate, unsigned type, void *addr, > @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, > void *addr, > * Pentium F0 0F bugfix can have resulted in the mapped > * IDT being write-protected. > */ > -static inline void set_intr_gate(unsigned int n, void *addr) > +static inline void set_intr_gate_raw(unsigned int n, void *addr) > { > BUG_ON((unsigned)n > 0xFF); > _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); > } > > +#define set_intr_gate(n, addr) > \ > + do {\ > + BUG_ON((unsigned)n > 0xFF); \ > + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ > + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ > + __KERNEL_CS); \ > + } while (0) > + > extern int first_system_vector; > /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ > extern unsigned long used_vectors[]; > @@ -395,10 +421,7 @@ static inline voi
RE: [PATCH v2] Introduce page fault tracepoint
Any comment? -Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi Sent: Friday, August 23, 2013 11:38 AM To: linux-kernel@vger.kernel.org; x...@kernel.org Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; t...@linutronix.de; fdesl...@gmail.com; raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: [PATCH v2] Introduce page fault tracepoint Change from v1 - Print regs-ip instead of regs to know where the page fault event happened (Suggested by Steven Rostedt) This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* + * does not need to be atomic because it is only done once at + * setup time + */ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7
RE: [PATCH v2] Introduce page fault tracepoint
-Original Message- From: Steven Rostedt [mailto:rost...@goodmis.org] Sent: Friday, September 06, 2013 12:50 PM To: Seiji Aguchi Cc: linux-kernel@vger.kernel.org; x...@kernel.org; h...@zytor.com; mi...@elte.hu; b...@alien8.de; t...@linutronix.de; fdesl...@gmail.com; raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: Re: [PATCH v2] Introduce page fault tracepoint On Fri, 23 Aug 2013 11:37:43 -0400 Seiji Aguchi seiji.agu...@hds.com wrote: +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_PAGE_FAULT_H + +#include linux/tracepoint.h + +extern void trace_irq_vector_regfunc(void); +extern void trace_irq_vector_unregfunc(void); + +DECLARE_EVENT_CLASS(x86_exceptions, + + TP_PROTO(unsigned long address, struct pt_regs *regs, +unsigned long error_code), + + TP_ARGS(address, regs, error_code), + + TP_STRUCT__entry( + __field(unsigned long, address ) + __field(struct pt_regs *, regs ) + __field(unsigned long, error_code ) + ), + + TP_fast_assign( + __entry-address = address; + __entry-regs = regs; + __entry-error_code = error_code; + ), + + TP_printk(address=0x%lx ip=0x%lx error_code=0x%lx, + __entry-address, __entry-regs-ip, __entry-error_code) ); This is sure to crash the kernel. You just saved the address of a pointer to some task's stack in the ring buffer. And then on output (which can happen a long time from when it was recorded), you are dereferencing that same address! That __entry-regs-ip *will* crash the kernel! What you want is to save ip in the fast_assign: __entry-ip = regs-ip And then print that. Never dereference a pointer directly from the ring buffer unless it's a constant value (like a global string). I see.. Thank you for reviewing. I will fix it. Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> But let's make sure that efivars, erst, > etc. are all happy with the changes we make before I ask Linus to pull another > pstore piece. I will test efivars when Aruna posts the bugfix patches. Seiji > -Original Message- > From: Luck, Tony [mailto:tony.l...@intel.com] > Sent: Wednesday, September 04, 2013 12:11 PM > To: Aruna Balakrishnaiah; Seiji Aguchi > Cc: jkeni...@linux.vnet.ibm.com; keesc...@chromium.org; > mah...@linux.vnet.ibm.com; ccr...@android.com; linux- > ker...@vger.kernel.org; linuxppc-...@ozlabs.org; cbouatmai...@gmail.com > Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore > > > The reason behind compression failure is the size of big_oops_buf which is > > too > > big for efivars case. I will do some experiments with different kind of > > texts > > for buffer size 1024 to check if 100/53 suits for all the cases. > ... > > > Yes this can be changed to zlib_inflateInit2(). > > Original patch series was just pulled by Linus ... so we'll need a patch on > top > of current Linus git tree to fix these issues. But let's make sure that > efivars, erst, > etc. are all happy with the changes we make before I ask Linus to pull another > pstore piece. > > Thanks > > -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
But let's make sure that efivars, erst, etc. are all happy with the changes we make before I ask Linus to pull another pstore piece. I will test efivars when Aruna posts the bugfix patches. Seiji -Original Message- From: Luck, Tony [mailto:tony.l...@intel.com] Sent: Wednesday, September 04, 2013 12:11 PM To: Aruna Balakrishnaiah; Seiji Aguchi Cc: jkeni...@linux.vnet.ibm.com; keesc...@chromium.org; mah...@linux.vnet.ibm.com; ccr...@android.com; linux- ker...@vger.kernel.org; linuxppc-...@ozlabs.org; cbouatmai...@gmail.com Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore The reason behind compression failure is the size of big_oops_buf which is too big for efivars case. I will do some experiments with different kind of texts for buffer size 1024 to check if 100/53 suits for all the cases. ... Yes this can be changed to zlib_inflateInit2(). Original patch series was just pulled by Linus ... so we'll need a patch on top of current Linus git tree to fix these issues. But let's make sure that efivars, erst, etc. are all happy with the changes we make before I ask Linus to pull another pstore piece. Thanks -Tony -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
Aruna, Sorry for the late response. > Seiji, > > Could you let us know the efivars buffer size with which the pstore is > registered when > the failure occurred. I looked into the issue today. I added some debug message just before pstore_compress(). As you can see below, the buffer size is a fixed value(1024). Therefore, the size doesn't seem to be related to the failure directly. Also, in the failure case, zlib_deflate() returns Z_OK and stream.avail_out is zero. So, I thought big_oops_buf_sz is too big in my environment. And then, I tuned big_oops_buf_sz to (psinfo->bufsize * 100) / 53. At the same time, while looking into this issue, I had two concerns about current cording. 1) In pstore_decompress(), why not use zlib_inflateInit2() instead of zlib_inflateInit()? If you use zlib_deflateInit2() for specifying window_bit at compressing time, zlib_inflateInit2() seems to be appropriate for decompressing. (Please see a comment about inflateInit2() in include/linux/zlib.h and source code of crypto/deflate.c) 2) As looking at crypto/deflate.c, stream->workspace is kzalloced at the beginning of compressing/decompressing time. So, in pstore case, stream.workspace must be initialized to 0 with memset() in pstore_compress()/decompress(). I did three things above, tuning big_oops_buf_sz, using zlib_inflateInit2() and initializing stream.workspace , in my environment. As a result, compressions/decmpressions of all entries succeeded with efivars driver. Panic#2 Part1 <4>[ 75.665020] [] SyS_write+0x4c/0xa0 <4>[ 75.665020] [] system_call_fastpath+0x16/0x1b <4>[ 75.665020] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e <1>[ 75.665020] RIP [] sysrq_handle_crash+0x16/0x20 <4>[ 75.665020] RSP <4>[ 75.665020] CR2: <4>[ 75.665020] ---[ end trace 97bb4a1f8d3fe7b2 ]--- <3>[ 75.665020] pstore_dump hsize=13 len=2155 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len=2246 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore: compression failed for Part 2 returned -5 <3>[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 2 <3>[ 75.665020] pstore_dump hsize=13 len=2239 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len=2231 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len=2185 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore: compression failed for Part 5 returned -5 <3>[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 5 <3>[ 75.665020] pstore_dump hsize=13 len=2234 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len=2208 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len=2218 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=13 len= big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore: compression failed for Part 9 returned -5 <3>[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 9 <3>[ 75.665020] pstore_dump hsize=14 len=2256 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=14 len=2219 big_oops_buf_sz=2275 psinfo->bufsize=1024 <3>[ 75.665020] pstore_dump hsize=14 len=2226 big_oops_buf_sz=2275 psinfo->bufsize=1024 <0>[ 75.665020] Kernel panic - not syncing: Fatal exception <3>[ 75.665020] drm_kms_helper: panic occurred, switching back to text console Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
Aruna, Sorry for the late response. Seiji, Could you let us know the efivars buffer size with which the pstore is registered when the failure occurred. I looked into the issue today. I added some debug message just before pstore_compress(). As you can see below, the buffer size is a fixed value(1024). Therefore, the size doesn't seem to be related to the failure directly. Also, in the failure case, zlib_deflate() returns Z_OK and stream.avail_out is zero. So, I thought big_oops_buf_sz is too big in my environment. And then, I tuned big_oops_buf_sz to (psinfo-bufsize * 100) / 53. At the same time, while looking into this issue, I had two concerns about current cording. 1) In pstore_decompress(), why not use zlib_inflateInit2() instead of zlib_inflateInit()? If you use zlib_deflateInit2() for specifying window_bit at compressing time, zlib_inflateInit2() seems to be appropriate for decompressing. (Please see a comment about inflateInit2() in include/linux/zlib.h and source code of crypto/deflate.c) 2) As looking at crypto/deflate.c, stream-workspace is kzalloced at the beginning of compressing/decompressing time. So, in pstore case, stream.workspace must be initialized to 0 with memset() in pstore_compress()/decompress(). I did three things above, tuning big_oops_buf_sz, using zlib_inflateInit2() and initializing stream.workspace , in my environment. As a result, compressions/decmpressions of all entries succeeded with efivars driver. snip Panic#2 Part1 4[ 75.665020] [811af59c] SyS_write+0x4c/0xa0 4[ 75.665020] [8168be82] system_call_fastpath+0x16/0x1b 4[ 75.665020] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 c6 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 1[ 75.665020] RIP [813d3946] sysrq_handle_crash+0x16/0x20 4[ 75.665020] RSP 88007852de80 4[ 75.665020] CR2: 4[ 75.665020] ---[ end trace 97bb4a1f8d3fe7b2 ]--- 3[ 75.665020] pstore_dump hsize=13 len=2155 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len=2246 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore: compression failed for Part 2 returned -5 3[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 2 3[ 75.665020] pstore_dump hsize=13 len=2239 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len=2231 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len=2185 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore: compression failed for Part 5 returned -5 3[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 5 3[ 75.665020] pstore_dump hsize=13 len=2234 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len=2208 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len=2218 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=13 len= big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore: compression failed for Part 9 returned -5 3[ 75.665020] pstore: Capture uncompressed oops/panic report of Part 9 3[ 75.665020] pstore_dump hsize=14 len=2256 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=14 len=2219 big_oops_buf_sz=2275 psinfo-bufsize=1024 3[ 75.665020] pstore_dump hsize=14 len=2226 big_oops_buf_sz=2275 psinfo-bufsize=1024 0[ 75.665020] Kernel panic - not syncing: Fatal exception 3[ 75.665020] drm_kms_helper: panic occurred, switching back to text console snip Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] Introduce page fault tracepoint
Change from v1 - Print regs->ip instead of regs to know where the page fault event happened (Suggested by Steven Rostedt) This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mm=136807959830182=2 http://marc.info/?l=linux-mm=136807959130175=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, ); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n > 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n > 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt spurious_interrupt
[PATCH v2] Introduce page fault tracepoint
Change from v1 - Print regs-ip instead of regs to know where the page fault event happened (Suggested by Steven Rostedt) This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 + arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 + arch/x86/include/asm/traps.h| 22 ++ arch/x86/kernel/entry_32.S | 10 +++ arch/x86/kernel/entry_64.S | 13 - arch/x86/kernel/head64.c| 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/traps.c | 28 +- arch/x86/mm/Makefile| 2 ++ arch/x86/mm/fault.c | 22 ++ 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt spurious_interrupt
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> -Original Message- > From: Luck, Tony [mailto:tony.l...@intel.com] > Sent: Thursday, August 22, 2013 7:17 PM > To: Seiji Aguchi; Aruna Balakrishnaiah; linuxppc-...@ozlabs.org; > linux-kernel@vger.kernel.org; keesc...@chromium.org > Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; > cbouatmai...@gmail.com; > mah...@linux.vnet.ibm.com; ccr...@android.com > Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore > > <1>[ 383.209057] RIP [] sysrq_handle_crash+0x16/0x20 > <4>[ 383.209057] RSP > <4>[ 383.209057] CR2: > <4>[ 383.209057] ---[ end trace 04a1cddad37b4b33 ]--- > <3>[ 383.209057] pstore: compression failed for Part 2 returned -5 > <3>[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 2 > <3>[ 383.209057] pstore: compression failed for Part 5 returned -5 > > Interesting. With ERST backend I didn't see these messages. Traces in > pstore recovered files go as far as the line before the "---[ end trace > 04a1cddad37b4b33 ]---" > > Why the difference depending on which back end is in use? I think the difference doesn't depend on the back end. Rather it depends on the environment. I tested on a kvm guest with OVMF. Seiji > > But I agree that we shouldn't have these messages. They use up space > in the persistent store that could be better used saving some more lines > from earlier in the console log. > > -Tony
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> * callback from kmsg_dump. (s2,l2) has the most recently > * written bytes, older bytes are in (s1,l1). Save as much > @@ -148,23 +243,56 @@ static void pstore_dump(struct kmsg_dumper *dumper, > char *dst; > unsigned long size; > int hsize; > + int zipped_len = -1; > size_t len; > - bool compressed = false; > + bool compressed; > + size_t total_len; > > - dst = psinfo->buf; > - hsize = sprintf(dst, "%s#%d Part%d\n", why, oopscount, part); > - size = psinfo->bufsize - hsize; > - dst += hsize; > + if (big_oops_buf) { > + dst = big_oops_buf; > + hsize = sprintf(dst, "%s#%d Part%d\n", why, > + oopscount, part); > + size = big_oops_buf_sz - hsize; > > - if (!kmsg_dump_get_buffer(dumper, true, dst, size, )) > - break; > + if (!kmsg_dump_get_buffer(dumper, true, dst + hsize, > + size, )) > + break; > + > + zipped_len = pstore_compress(dst, psinfo->buf, > + hsize + len, psinfo->bufsize); > + > + if (zipped_len > 0) { > + compressed = true; > + total_len = zipped_len; > + } else { > + pr_err("pstore: compression failed for Part %d" > + " returned %d\n", part, zipped_len); > + pr_err("pstore: Capture uncompressed" > + " oops/panic report of Part %d\n", > part); Why did you add these messages in pstore_dump()? In my test case, they are not needed # cat dmesg-efi-1 Panic#2 Part1 <4>[ 383.209057] RBP: 88006f551e80 R08: 0002 R09: <4>[ 383.209057] R10: 0382 R11: R12: 0063 <4>[ 383.209057] R13: 0286 R14: 000f R15: <4>[ 383.209057] FS: 7f53317cc740() GS:88007c40() knlGS: <4>[ 383.209057] CS: 0010 DS: ES: CR0: 80050033 <4>[ 383.209057] CR2: CR3: 78a73000 CR4: 06f0 <4>[ 383.209057] Stack: <4>[ 383.209057] 88006f551eb8 813d40a2 0002 7f53317db000 <4>[ 383.209057] 88006f551f50 0002 88006f551ed8 <4>[ 383.209057] 813d45aa 7f53317db000 88003f8c2b00 88006f551ef8 <4>[ 383.209057] Call Trace: <4>[ 383.209057] [] __handle_sysrq+0xa2/0x170 <4>[ 383.209057] [] write_sysrq_trigger+0x4a/0x50 <4>[ 383.209057] [] proc_reg_write+0x3d/0x80 <4>[ 383.209057] [] vfs_write+0xc0/0x1f0 <4>[ 383.209057] [] SyS_write+0x4c/0xa0 <4>[ 383.209057] [] system_call_fastpath+0x16/0x1b <4>[ 383.209057] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e <1>[ 383.209057] RIP [] sysrq_handle_crash+0x16/0x20 <4>[ 383.209057] RSP <4>[ 383.209057] CR2: <4>[ 383.209057] ---[ end trace 04a1cddad37b4b33 ]--- <3>[ 383.209057] pstore: compression failed for Part 2 returned -5 <3>[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 2 <3>[ 383.209057] pstore: compression failed for Part 5 returned -5 <3>[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 5 <3>[ 383.209057] pstore: compression failed for Part 12 returned -5 <3>[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 12 <0>[ 383.209057] Kernel panic - not syncing: Fatal exception <3>[ 383.209057] drm_kms_helper: panic occurred, switching back to text console In efi-pstore case, after rebooting a system, we are able to know which entry failed to compress with 'C' or 'D' as below. #ls /sys/firmware/efi/vars/ |grep dump dump-type0-10-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-10-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-11-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-1-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-11-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-12-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-1-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-12-2-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-13-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-13-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-2-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
RE: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Aruna Balakrishnaiah > Sent: Friday, August 16, 2013 9:18 AM > To: linuxppc-...@ozlabs.org; tony.l...@intel.com; > linux-kernel@vger.kernel.org; keesc...@chromium.org > Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; > cbouatmai...@gmail.com; > mah...@linux.vnet.ibm.com; ccr...@android.com > Subject: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore > > Based on the flag 'compressed' set or not, pstore will decompress the > data returning a plain text file. If decompression fails for a particular > record it will have the compressed data in the file which can be > decompressed with 'openssl' command line tool. If the decompression fails and openssl doesn't work, the worst case is that users can't read the entry. In that case, pstore is meaningless at all. Also, for users who want to get a single panic message, a compression is not needed. So, I think we still have to support non-compression mode. (IMO, pstore can take kdump as a model. Kdump supports both compression and non-compression mode.) But, if you think my comment is outside this patchset, it's OK. We can make it with a separate patch. Seiji
RE: [RFC][PATCH] Introduce page fault tracepoint
> Printing the regs pointer is rather useless. This is specific for x86, > why not print the ip of where it happened and the faulting address > itself? Thank you for reviewing. I will change the regs pointer to ip. > Note, you only need to change the TP_printk() to do that. For > efficiency reasons, only pass in regs. OK. Will change the TP_printk(). Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC][PATCH] Introduce page fault tracepoint
Printing the regs pointer is rather useless. This is specific for x86, why not print the ip of where it happened and the faulting address itself? Thank you for reviewing. I will change the regs pointer to ip. Note, you only need to change the TP_printk() to do that. For efficiency reasons, only pass in regs. OK. Will change the TP_printk(). Seiji -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore
-Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Aruna Balakrishnaiah Sent: Friday, August 16, 2013 9:18 AM To: linuxppc-...@ozlabs.org; tony.l...@intel.com; linux-kernel@vger.kernel.org; keesc...@chromium.org Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; cbouatmai...@gmail.com; mah...@linux.vnet.ibm.com; ccr...@android.com Subject: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore Based on the flag 'compressed' set or not, pstore will decompress the data returning a plain text file. If decompression fails for a particular record it will have the compressed data in the file which can be decompressed with 'openssl' command line tool. If the decompression fails and openssl doesn't work, the worst case is that users can't read the entry. In that case, pstore is meaningless at all. Also, for users who want to get a single panic message, a compression is not needed. So, I think we still have to support non-compression mode. (IMO, pstore can take kdump as a model. Kdump supports both compression and non-compression mode.) But, if you think my comment is outside this patchset, it's OK. We can make it with a separate patch. Seiji
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
* callback from kmsg_dump. (s2,l2) has the most recently * written bytes, older bytes are in (s1,l1). Save as much @@ -148,23 +243,56 @@ static void pstore_dump(struct kmsg_dumper *dumper, char *dst; unsigned long size; int hsize; + int zipped_len = -1; size_t len; - bool compressed = false; + bool compressed; + size_t total_len; - dst = psinfo-buf; - hsize = sprintf(dst, %s#%d Part%d\n, why, oopscount, part); - size = psinfo-bufsize - hsize; - dst += hsize; + if (big_oops_buf) { + dst = big_oops_buf; + hsize = sprintf(dst, %s#%d Part%d\n, why, + oopscount, part); + size = big_oops_buf_sz - hsize; - if (!kmsg_dump_get_buffer(dumper, true, dst, size, len)) - break; + if (!kmsg_dump_get_buffer(dumper, true, dst + hsize, + size, len)) + break; + + zipped_len = pstore_compress(dst, psinfo-buf, + hsize + len, psinfo-bufsize); + + if (zipped_len 0) { + compressed = true; + total_len = zipped_len; + } else { + pr_err(pstore: compression failed for Part %d + returned %d\n, part, zipped_len); + pr_err(pstore: Capture uncompressed + oops/panic report of Part %d\n, part); Why did you add these messages in pstore_dump()? In my test case, they are not needed snip # cat dmesg-efi-1 Panic#2 Part1 4[ 383.209057] RBP: 88006f551e80 R08: 0002 R09: 4[ 383.209057] R10: 0382 R11: R12: 0063 4[ 383.209057] R13: 0286 R14: 000f R15: 4[ 383.209057] FS: 7f53317cc740() GS:88007c40() knlGS: 4[ 383.209057] CS: 0010 DS: ES: CR0: 80050033 4[ 383.209057] CR2: CR3: 78a73000 CR4: 06f0 4[ 383.209057] Stack: 4[ 383.209057] 88006f551eb8 813d40a2 0002 7f53317db000 4[ 383.209057] 88006f551f50 0002 88006f551ed8 4[ 383.209057] 813d45aa 7f53317db000 88003f8c2b00 88006f551ef8 4[ 383.209057] Call Trace: 4[ 383.209057] [813d40a2] __handle_sysrq+0xa2/0x170 4[ 383.209057] [813d45aa] write_sysrq_trigger+0x4a/0x50 4[ 383.209057] [8121981d] proc_reg_write+0x3d/0x80 4[ 383.209057] [811aeb20] vfs_write+0xc0/0x1f0 4[ 383.209057] [811af59c] SyS_write+0x4c/0xa0 4[ 383.209057] [8168be82] system_call_fastpath+0x16/0x1b 4[ 383.209057] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 c6 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 1[ 383.209057] RIP [813d3946] sysrq_handle_crash+0x16/0x20 4[ 383.209057] RSP 88006f551e80 4[ 383.209057] CR2: 4[ 383.209057] ---[ end trace 04a1cddad37b4b33 ]--- 3[ 383.209057] pstore: compression failed for Part 2 returned -5 3[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 2 3[ 383.209057] pstore: compression failed for Part 5 returned -5 3[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 5 3[ 383.209057] pstore: compression failed for Part 12 returned -5 3[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 12 0[ 383.209057] Kernel panic - not syncing: Fatal exception 3[ 383.209057] drm_kms_helper: panic occurred, switching back to text console snip In efi-pstore case, after rebooting a system, we are able to know which entry failed to compress with 'C' or 'D' as below. #ls /sys/firmware/efi/vars/ |grep dump dump-type0-10-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-10-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-11-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-1-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-11-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-12-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-1-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-12-2-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-13-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-13-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0 dump-type0-2-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
-Original Message- From: Luck, Tony [mailto:tony.l...@intel.com] Sent: Thursday, August 22, 2013 7:17 PM To: Seiji Aguchi; Aruna Balakrishnaiah; linuxppc-...@ozlabs.org; linux-kernel@vger.kernel.org; keesc...@chromium.org Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; cbouatmai...@gmail.com; mah...@linux.vnet.ibm.com; ccr...@android.com Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore 1[ 383.209057] RIP [813d3946] sysrq_handle_crash+0x16/0x20 4[ 383.209057] RSP 88006f551e80 4[ 383.209057] CR2: 4[ 383.209057] ---[ end trace 04a1cddad37b4b33 ]--- 3[ 383.209057] pstore: compression failed for Part 2 returned -5 3[ 383.209057] pstore: Capture uncompressed oops/panic report of Part 2 3[ 383.209057] pstore: compression failed for Part 5 returned -5 Interesting. With ERST backend I didn't see these messages. Traces in pstore recovered files go as far as the line before the ---[ end trace 04a1cddad37b4b33 ]--- Why the difference depending on which back end is in use? I think the difference doesn't depend on the back end. Rather it depends on the environment. I tested on a kvm guest with OVMF. Seiji But I agree that we shouldn't have these messages. They use up space in the persistent store that could be better used saving some more lines from earlier in the console log. -Tony
[RFC][PATCH] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the read callback runnable without taking spinlok, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. Also, __efivar_entry_get() is removed because it was used in efi_pstore only. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [] dump_stack+0x54/0x74 [1.144058] [] warn_slowpath_common+0x7d/0xa0 [1.144058] [] warn_slowpath_fmt+0x4c/0x50 [1.144058] [] ? vsscanf+0x57f/0x7b0 [1.144058] [] lockdep_trace_alloc+0x104/0x110 [1.144058] [] __kmalloc_track_caller+0x50/0x280 [1.144058] [] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] kmemdup+0x20/0x50 [1.144058] [] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [] efi_pstore_read_func+0xb4/0xe0 [1.144058] [] __efivar_entry_iter+0xfb/0x120 [1.144058] [] efi_pstore_read+0x3f/0x50 [1.144058] [] pstore_get_records+0x9a/0x150 [1.158207] [] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [] ? parse_options+0x80/0x80 [1.158207] [] pstore_fill_super+0xa5/0xc0 [1.158207] [] mount_single+0xa2/0xd0 [1.158207] [] pstore_mount+0x18/0x20 [1.158207] [] mount_fs+0x39/0x1b0 [1.158207] [] ? __alloc_percpu+0x10/0x20 [1.158207] [] vfs_kern_mount+0x63/0xf0 [1.158207] [] do_mount+0x23e/0xa20 [1.158207] [] ? strndup_user+0x4b/0xf0 [1.158207] [] SyS_mount+0x83/0xc0 [1.158207] [] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi --- drivers/firmware/efi/efi-pstore.c | 145 +++--- drivers/firmware/efi/efivars.c| 3 +- drivers/firmware/efi/vars.c | 39 +++--- include/linux/efi.h | 4 +- 4 files changed, 151 insertions(+), 40 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 73de5a9..db96b0d 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi->data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi->data = NULL; return 0; } @@ -38,6 +36,23 @@ struct pstore_read_data { char **buf; }; +/** + * efi_pstore_read_func + * @entry: reading entry + * @data: data of the entry + * + * This function runs in non-atomic context. + * + * Also, it returns a size of NVRAM entry logged via efi_pstore_write(). + * pstore in accordance with the returned value as below. + * + * size > 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size < 0: Failed to get data of entry logging via efi_pstore_write(), + * and pstore will stop reading entry. + */ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) { efi_guid_t vendor = LINUX_EFI_CRASH_GUID; @@ -75,8 +90,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data) return 0; entry->var.DataSize = 1024; - __efivar_entry_get(entry,
[RFC][PATCH] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed
Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the read callback runnable without taking spinlok, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. Also, __efivar_entry_get() is removed because it was used in efi_pstore only. [1.143710] [ cut here ] [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [1.144058] Modules linked in: [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [1.144058] 0009 8800797e9ae0 816614a5 8800797e9b28 [1.144058] 8800797e9b18 8105510d 0080 0046 [1.144058] 00d0 03af 81ccd0c0 8800797e9b78 [1.144058] Call Trace: [1.144058] [816614a5] dump_stack+0x54/0x74 [1.144058] [8105510d] warn_slowpath_common+0x7d/0xa0 [1.144058] [8105517c] warn_slowpath_fmt+0x4c/0x50 [1.144058] [8131290f] ? vsscanf+0x57f/0x7b0 [1.144058] [810bbd74] lockdep_trace_alloc+0x104/0x110 [1.144058] [81192da0] __kmalloc_track_caller+0x50/0x280 [1.144058] [815147bb] ? efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [8115b260] kmemdup+0x20/0x50 [1.144058] [815147bb] efi_pstore_read_func.part.1+0x12b/0x170 [1.144058] [81514800] ? efi_pstore_read_func.part.1+0x170/0x170 [1.144058] [815148b4] efi_pstore_read_func+0xb4/0xe0 [1.144058] [81512b7b] __efivar_entry_iter+0xfb/0x120 [1.144058] [8151428f] efi_pstore_read+0x3f/0x50 [1.144058] [8128d7ba] pstore_get_records+0x9a/0x150 [1.158207] [812af25c] ? selinux_d_instantiate+0x1c/0x20 [1.158207] [8128ce30] ? parse_options+0x80/0x80 [1.158207] [8128ced5] pstore_fill_super+0xa5/0xc0 [1.158207] [811ae7d2] mount_single+0xa2/0xd0 [1.158207] [8128ccf8] pstore_mount+0x18/0x20 [1.158207] [811ae8b9] mount_fs+0x39/0x1b0 [1.158207] [81160550] ? __alloc_percpu+0x10/0x20 [1.158207] [811c9493] vfs_kern_mount+0x63/0xf0 [1.158207] [811cbb0e] do_mount+0x23e/0xa20 [1.158207] [8115b51b] ? strndup_user+0x4b/0xf0 [1.158207] [811cc373] SyS_mount+0x83/0xc0 [1.158207] [81673cc2] system_call_fastpath+0x16/0x1b [1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- drivers/firmware/efi/efi-pstore.c | 145 +++--- drivers/firmware/efi/efivars.c| 3 +- drivers/firmware/efi/vars.c | 39 +++--- include/linux/efi.h | 4 +- 4 files changed, 151 insertions(+), 40 deletions(-) diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 73de5a9..db96b0d 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644); static int efi_pstore_open(struct pstore_info *psi) { - efivar_entry_iter_begin(); psi-data = NULL; return 0; } static int efi_pstore_close(struct pstore_info *psi) { - efivar_entry_iter_end(); psi-data = NULL; return 0; } @@ -38,6 +36,23 @@ struct pstore_read_data { char **buf; }; +/** + * efi_pstore_read_func + * @entry: reading entry + * @data: data of the entry + * + * This function runs in non-atomic context. + * + * Also, it returns a size of NVRAM entry logged via efi_pstore_write(). + * pstore in accordance with the returned value as below. + * + * size 0: Got data of an entry logged via efi_pstore_write() successfully, + * and pstore filesystem will continue reading subsequent entries. + * size == 0: Entry was not logged via efi_pstore_write(), + *and efi_pstore driver will continue reading subsequent entries. + * size 0: Failed to get data of entry logging
RE: [RFC][PATCH] Introduce page fault tracepoint
Any comment? > -Original Message- > From: Seiji Aguchi [mailto:seiji.agu...@hds.com] > Sent: Tuesday, July 30, 2013 6:53 PM > To: linux-kernel@vger.kernel.org; x...@kernel.org > Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; > t...@linutronix.de; fdesl...@gmail.com; > raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama > Subject: [RFC][PATCH] Introduce page fault tracepoint > > This patch introduces page fault tracepoints to x86 architecture > by switching IDT. > > [Use case of page fault events] > > Two events, for user and kernel spaces, are introduced at the beginning of > page fault handler. > > - User space event > There is a request of page fault event for user space as below. > > http://marc.info/?l=linux-mm=136807959830182=2 > http://marc.info/?l=linux-mm=136807959130175=2 > > - Kernel space event: > Overhead in kernel space is measurable by enabling it. > > [Creating IDT] > > A way to create IDT is as below. > > - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. >This is used at boot time which tracing is disabled. > - Make set_intr_gate() macro so that it can register trace handler to >trace IDT and non-trace handler to normal IDT. > > Signed-off-by: Seiji Aguchi > --- > arch/x86/include/asm/desc.h | 33 +--- > arch/x86/include/asm/hw_irq.h | 14 - > arch/x86/include/asm/trace/exceptions.h | 51 > +++ > arch/x86/include/asm/traps.h| 22 + > arch/x86/kernel/entry_32.S | 10 ++ > arch/x86/kernel/entry_64.S | 13 +++- > arch/x86/kernel/head64.c|2 +- > arch/x86/kernel/irqinit.c |2 +- > arch/x86/kernel/kvm.c |2 +- > arch/x86/kernel/traps.c | 28 > arch/x86/mm/Makefile|2 + > arch/x86/mm/fault.c | 22 + > 12 files changed, 177 insertions(+), 24 deletions(-) > create mode 100644 arch/x86/include/asm/trace/exceptions.h > > diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h > index b90e5df..c04302b 100644 > --- a/arch/x86/include/asm/desc.h > +++ b/arch/x86/include/asm/desc.h > @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, > const gate_desc *gate) > { > write_idt_entry(trace_idt_table, entry, gate); > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > + gate_desc s; > + > + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); > + /* > + * does not need to be atomic because it is only done once at > + * setup time > + */ > + write_trace_idt_entry(gate, ); > +} > #else > static inline void write_trace_idt_entry(int entry, const gate_desc *gate) > { > } > + > +static inline void _trace_set_gate(int gate, unsigned type, void *addr, > +unsigned dpl, unsigned ist, unsigned seg) > +{ > +} > #endif > > static inline void _set_gate(int gate, unsigned type, void *addr, > @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, > void *addr, > * Pentium F0 0F bugfix can have resulted in the mapped > * IDT being write-protected. > */ > -static inline void set_intr_gate(unsigned int n, void *addr) > +static inline void set_intr_gate_raw(unsigned int n, void *addr) > { > BUG_ON((unsigned)n > 0xFF); > _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); > } > > +#define set_intr_gate(n, addr) > \ > + do {\ > + BUG_ON((unsigned)n > 0xFF); \ > + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ > + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ > + __KERNEL_CS); \ > + } while (0) > + > extern int first_system_vector; > /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ > extern unsigned long used_vectors[]; > @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int > gate, void *addr) > #define __trace_alloc_intr_gate(n, addr) > #endif > > -static inline void __alloc_intr_gate(unsigned int n, void *addr) > -{ > - set_intr_gate(n, addr); > -} > +#define
RE: [RFC][PATCH] Introduce page fault tracepoint
Any comment? -Original Message- From: Seiji Aguchi [mailto:seiji.agu...@hds.com] Sent: Tuesday, July 30, 2013 6:53 PM To: linux-kernel@vger.kernel.org; x...@kernel.org Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; t...@linutronix.de; fdesl...@gmail.com; raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama Subject: [RFC][PATCH] Introduce page fault tracepoint This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 +--- arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 +++ arch/x86/include/asm/traps.h| 22 + arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 +++- arch/x86/kernel/head64.c|2 +- arch/x86/kernel/irqinit.c |2 +- arch/x86/kernel/kvm.c |2 +- arch/x86/kernel/traps.c | 28 arch/x86/mm/Makefile|2 + arch/x86/mm/fault.c | 22 + 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* + * does not need to be atomic because it is only done once at + * setup time + */ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, +unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt
[RFC][PATCH] Introduce page fault tracepoint
This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mm=136807959830182=2 http://marc.info/?l=linux-mm=136807959130175=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi --- arch/x86/include/asm/desc.h | 33 +--- arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 +++ arch/x86/include/asm/traps.h| 22 + arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 +++- arch/x86/kernel/head64.c|2 +- arch/x86/kernel/irqinit.c |2 +- arch/x86/kernel/kvm.c |2 +- arch/x86/kernel/traps.c | 28 arch/x86/mm/Makefile|2 + arch/x86/mm/fault.c | 22 + 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, ); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n > 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n > 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt spurious_interrupt +#define trace_thermal_interrupt thermal_interrupt +#define trace_reschedule_interrupt reschedule_interrupt +#
[RFC][PATCH] Introduce page fault tracepoint
This patch introduces page fault tracepoints to x86 architecture by switching IDT. [Use case of page fault events] Two events, for user and kernel spaces, are introduced at the beginning of page fault handler. - User space event There is a request of page fault event for user space as below. http://marc.info/?l=linux-mmm=136807959830182w=2 http://marc.info/?l=linux-mmm=136807959130175w=2 - Kernel space event: Overhead in kernel space is measurable by enabling it. [Creating IDT] A way to create IDT is as below. - Introduce set_intr_gate_raw() to register just non-trace handler to IDT. This is used at boot time which tracing is disabled. - Make set_intr_gate() macro so that it can register trace handler to trace IDT and non-trace handler to normal IDT. Signed-off-by: Seiji Aguchi seiji.agu...@hds.com --- arch/x86/include/asm/desc.h | 33 +--- arch/x86/include/asm/hw_irq.h | 14 - arch/x86/include/asm/trace/exceptions.h | 51 +++ arch/x86/include/asm/traps.h| 22 + arch/x86/kernel/entry_32.S | 10 ++ arch/x86/kernel/entry_64.S | 13 +++- arch/x86/kernel/head64.c|2 +- arch/x86/kernel/irqinit.c |2 +- arch/x86/kernel/kvm.c |2 +- arch/x86/kernel/traps.c | 28 arch/x86/mm/Makefile|2 + arch/x86/mm/fault.c | 22 + 12 files changed, 177 insertions(+), 24 deletions(-) create mode 100644 arch/x86/include/asm/trace/exceptions.h diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index b90e5df..c04302b 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { write_idt_entry(trace_idt_table, entry, gate); } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ + gate_desc s; + + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg); + /* +* does not need to be atomic because it is only done once at +* setup time +*/ + write_trace_idt_entry(gate, s); +} #else static inline void write_trace_idt_entry(int entry, const gate_desc *gate) { } + +static inline void _trace_set_gate(int gate, unsigned type, void *addr, + unsigned dpl, unsigned ist, unsigned seg) +{ +} #endif static inline void _set_gate(int gate, unsigned type, void *addr, @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, void *addr, * Pentium F0 0F bugfix can have resulted in the mapped * IDT being write-protected. */ -static inline void set_intr_gate(unsigned int n, void *addr) +static inline void set_intr_gate_raw(unsigned int n, void *addr) { BUG_ON((unsigned)n 0xFF); _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); } +#define set_intr_gate(n, addr) \ + do {\ + BUG_ON((unsigned)n 0xFF); \ + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS); \ + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0, \ + __KERNEL_CS); \ + } while (0) + extern int first_system_vector; /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */ extern unsigned long used_vectors[]; @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, void *addr) #define __trace_alloc_intr_gate(n, addr) #endif -static inline void __alloc_intr_gate(unsigned int n, void *addr) -{ - set_intr_gate(n, addr); -} +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr) #define alloc_intr_gate(n, addr) \ do {\ diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index e4ac559..fbd73b7 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void); extern void trace_threshold_interrupt(void); extern void trace_call_function_interrupt(void); extern void trace_call_function_single_interrupt(void); +#else /* CONFIG_TRACING */ +#define trace_apic_timer_interrupt apic_timer_interrupt +#define trace_x86_platform_ipi x86_platform_ipi +#define trace_error_interrupt error_interrupt +#define trace_irq_work_interrupt irq_work_interrupt +#define trace_spurious_interrupt spurious_interrupt +#define trace_thermal_interrupt thermal_interrupt +#define trace_reschedule_interrupt reschedule_interrupt
RE: Yet more softlockups.
> -Original Message- > From: H. Peter Anvin [mailto:h...@zytor.com] > Sent: Friday, July 05, 2013 12:41 PM > To: Thomas Gleixner > Cc: Dave Jones; Linus Torvalds; Linux Kernel; Ingo Molnar; Peter Zijlstra; > Seiji Aguchi > Subject: Re: Yet more softlockups. > > On 07/05/2013 09:02 AM, Thomas Gleixner wrote: > > On Fri, 5 Jul 2013, Dave Jones wrote: > >> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote: > >> > On Fri, 5 Jul 2013, Dave Jones wrote: > >> > > >> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565] > >> > > perf samples too long (2519 > 2500), lowering > >> kernel.perf_event_max_sample_rate to 5 > >> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: > >> 238147.002 msecs > >> > > >> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler > >> > claims it did run 23.8 seconds. > >> > > >> > Are there more instances of NMI handler messages ? > >> > >> [ 2552.006181] perf samples too long (2511 > 2500), lowering > >> kernel.perf_event_max_sample_rate to 5 > >> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to > >> run: 500392.002 msecs > > > > Yuck. Spending 50 seconds in NMI context surely explains a softlockup :) > > > > Hmmm... this makes me wonder if the interrupt tracepoint stuff is at > fault here, as it changes the IDT handling for NMI context. This softlockup happens while disabling the interrupt tracepoints, Because if it is enabled, "smp_trace_apic_timer_interrupt" is displayed instead of "smp_apic_timer_interrupt" in the call trace below. But I can't say anything how this issue is related to the tracepoint stuff, I need to reproduce it on my machine first. Call Trace: [] __do_softirq+0xff/0x440 [] irq_exit+0xcd/0xe0 [] smp_apic_timer_interrupt+0x6b/0x9b [] apic_timer_interrupt+0x6f/0x80 Seiji -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/urgent] x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()
Commit-ID: 4787c368a9bca39e173d702389ee2eaf0520abc1 Gitweb: http://git.kernel.org/tip/4787c368a9bca39e173d702389ee2eaf0520abc1 Author: Seiji Aguchi AuthorDate: Fri, 28 Jun 2013 14:02:11 -0400 Committer: Ingo Molnar CommitDate: Tue, 2 Jul 2013 09:52:31 +0200 x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt() Reschedule vector tracepoints may be called in cpu idle state. This causes lockdep check warning below. The tracepoint requires rcu but for accuracy it also requires irq_enter() (tracepoints record the irq context), thus, the tracepoint interrupt handler should be calling irq_enter() and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()). So, add irq_enter/exit() to smp_trace_reschedule_interrupt() with common pre/post processing functions, smp_entering_irq() and exiting_irq() (exiting_irq() calls just irq_exit() in arch/x86/include/asm/apic.h), because these can be shared among reschedule, call_function, and call_function_single vectors. [ 50.720557] Testing event reschedule_exit: [ 50.721349] [ 50.721502] === [ 50.721835] [ INFO: suspicious RCU usage. ] [ 50.722169] 3.10.0-rc6-4-gcf910e8 #190 Not tainted [ 50.722582] --- [ 50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage! [ 50.723770] [ 50.723770] other info that might help us debug this: [ 50.723770] [ 50.724385] [ 50.724385] RCU used illegally from idle CPU! [ 50.724385] rcu_scheduler_active = 1, debug_locks = 0 [ 50.725232] RCU used illegally from extended quiescent state! [ 50.725690] no locks held by swapper/0/0. [ 50.726010] [ 50.726010] stack backtrace: [...] Signed-off-by: Seiji Aguchi Reviewed-by: Steven Rostedt Link: http://lkml.kernel.org/r/51cdcfa3.9080...@hds.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/smp.c | 29 ++--- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index f4fe0b8..cdaa347 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -265,23 +265,30 @@ void smp_reschedule_interrupt(struct pt_regs *regs) */ } -void smp_trace_reschedule_interrupt(struct pt_regs *regs) +static inline void smp_entering_irq(void) { ack_APIC_irq(); + irq_enter(); +} + +void smp_trace_reschedule_interrupt(struct pt_regs *regs) +{ + /* +* Need to call irq_enter() before calling the trace point. +* __smp_reschedule_interrupt() calls irq_enter/exit() too (in +* scheduler_ipi(). This is OK, since those functions are allowed +* to nest. +*/ + smp_entering_irq(); trace_reschedule_entry(RESCHEDULE_VECTOR); __smp_reschedule_interrupt(); trace_reschedule_exit(RESCHEDULE_VECTOR); + exiting_irq(); /* * KVM uses this interrupt to force a cpu out of guest mode */ } -static inline void call_function_entering_irq(void) -{ - ack_APIC_irq(); - irq_enter(); -} - static inline void __smp_call_function_interrupt(void) { generic_smp_call_function_interrupt(); @@ -290,14 +297,14 @@ static inline void __smp_call_function_interrupt(void) void smp_call_function_interrupt(struct pt_regs *regs) { - call_function_entering_irq(); + smp_entering_irq(); __smp_call_function_interrupt(); exiting_irq(); } void smp_trace_call_function_interrupt(struct pt_regs *regs) { - call_function_entering_irq(); + smp_entering_irq(); trace_call_function_entry(CALL_FUNCTION_VECTOR); __smp_call_function_interrupt(); trace_call_function_exit(CALL_FUNCTION_VECTOR); @@ -312,14 +319,14 @@ static inline void __smp_call_function_single_interrupt(void) void smp_call_function_single_interrupt(struct pt_regs *regs) { - call_function_entering_irq(); + smp_entering_irq(); __smp_call_function_single_interrupt(); exiting_irq(); } void smp_trace_call_function_single_interrupt(struct pt_regs *regs) { - call_function_entering_irq(); + smp_entering_irq(); trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR); __smp_call_function_single_interrupt(); trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/