RE: [PATCH v2] efi-pstore: Make efi-pstore return a unique id

2013-11-27 Thread Seiji Aguchi


> -Original Message-
> From: Matt Fleming [mailto:m...@console-pimps.org]
> Sent: Wednesday, November 27, 2013 6:45 AM
> To: Madper Xie
> Cc: 'linux-kernel@vger.kernel.org'; linux-...@vger.kernel.org; 
> matt.flem...@intel.com; Seiji Aguchi; rich...@nod.at; Tony Luck
> Subject: Re: [PATCH v2] efi-pstore: Make efi-pstore return a unique id
> 
> On Sat, 23 Nov, at 07:55:37PM, Madper Xie wrote:
> >
> > Pstore fs expects that backends provide a uniqued id which could avoid
> > pstore making entries as duplication or denominating entries the same
> > name. So I combine the timestamp, part and count into id.
> >
> > Signed-off-by: Madper Xie 
> > ---
> >  drivers/firmware/efi/efi-pstore.c | 19 +--
> >  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> Thanks unless anyone complains, I'm going to apply this.
> 

I'm OK with this patch based on Madper's comment below.


So i'd like to using fixed length. My dell xps has 128kb nvram
space. and it at most store 126 pstore entries. So I think 1000 will be
a safe value for count. For part, 100 should be okay. I don't think
there will be a large kmsg and we must split it into more than 100 parts.


> Does this need to be tagged for stable?
> 

I think so.

Seiji

> --
> Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] efi-pstore: Make efi-pstore return a unique id

2013-11-27 Thread Seiji Aguchi


 -Original Message-
 From: Matt Fleming [mailto:m...@console-pimps.org]
 Sent: Wednesday, November 27, 2013 6:45 AM
 To: Madper Xie
 Cc: 'linux-kernel@vger.kernel.org'; linux-...@vger.kernel.org; 
 matt.flem...@intel.com; Seiji Aguchi; rich...@nod.at; Tony Luck
 Subject: Re: [PATCH v2] efi-pstore: Make efi-pstore return a unique id
 
 On Sat, 23 Nov, at 07:55:37PM, Madper Xie wrote:
 
  Pstore fs expects that backends provide a uniqued id which could avoid
  pstore making entries as duplication or denominating entries the same
  name. So I combine the timestamp, part and count into id.
 
  Signed-off-by: Madper Xie c...@redhat.com
  ---
   drivers/firmware/efi/efi-pstore.c | 19 +--
   1 file changed, 13 insertions(+), 6 deletions(-)
 
 Thanks unless anyone complains, I'm going to apply this.
 

I'm OK with this patch based on Madper's comment below.

snip
So i'd like to using fixed length. My dell xps has 128kb nvram
space. and it at most store 126 pstore entries. So I think 1000 will be
a safe value for count. For part, 100 should be okay. I don't think
there will be a large kmsg and we must split it into more than 100 parts.
snip

 Does this need to be tagged for stable?
 

I think so.

Seiji

 --
 Matt Fleming, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [tip:x86/trace] x86, trace: Add page fault tracepoints

2013-11-11 Thread Seiji Aguchi

> Seiji - are you okay with this?  If so I'll just make a quick patch now.
>

I'm OK with it.

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [tip:x86/trace] x86, trace: Add page fault tracepoints

2013-11-11 Thread Seiji Aguchi

 Seiji - are you okay with this?  If so I'll just make a quick patch now.


I'm OK with it.

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/trace] x86, trace: Add page fault tracepoints

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  d34603b07c4255b2b00a546d34f297ccd50ae4c6
Gitweb: http://git.kernel.org/tip/d34603b07c4255b2b00a546d34f297ccd50ae4c6
Author: Seiji Aguchi 
AuthorDate: Wed, 30 Oct 2013 16:39:03 -0400
Committer:  H. Peter Anvin 
CommitDate: Fri, 8 Nov 2013 14:15:49 -0800

x86, trace: Add page fault tracepoints

This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

  Two events, for user and kernel spaces, are introduced at the beginning
  of page fault handler for tracing.

  - User space event
There is a request of page fault event for user space as below.


https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com

https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com

  - Kernel space event:
When we measure an overhead in kernel space for investigating performance
issues, we can check if it comes from the page fault events.

Signed-off-by: Seiji Aguchi 
Link: http://lkml.kernel.org/r/52716e67.6090...@hds.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 13 +
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/trace/exceptions.h 
b/arch/x86/include/asm/trace/exceptions.h
new file mode 100644
index 000..86540c0
--- /dev/null
+++ b/arch/x86/include/asm/trace/exceptions.h
@@ -0,0 +1,52 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM exceptions
+
+#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_FAULT_H
+
+#include 
+
+extern void trace_irq_vector_regfunc(void);
+extern void trace_irq_vector_unregfunc(void);
+
+DECLARE_EVENT_CLASS(x86_exceptions,
+
+   TP_PROTO(unsigned long address, struct pt_regs *regs,
+unsigned long error_code),
+
+   TP_ARGS(address, regs, error_code),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, address  )
+   __field(unsigned long, ip   )
+   __field(unsigned long, error_code )
+   ),
+
+   TP_fast_assign(
+   __entry->address = address;
+   __entry->ip = regs->ip;
+   __entry->error_code = error_code;
+   ),
+
+   TP_printk("address=%pf ip=%pf error_code=0x%lx",
+ (void *)__entry->address, (void *)__entry->ip,
+ __entry->error_code) );
+
+#define DEFINE_PAGE_FAULT_EVENT(name)  \
+DEFINE_EVENT_FN(x86_exceptions, name,  \
+   TP_PROTO(unsigned long address, struct pt_regs *regs,   \
+unsigned long error_code), \
+   TP_ARGS(address, regs, error_code), \
+   trace_irq_vector_regfunc,   \
+   trace_irq_vector_unregfunc);
+
+DEFINE_PAGE_FAULT_EVENT(user_page_fault);
+DEFINE_PAGE_FAULT_EVENT(kernel_page_fault);
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE exceptions
+#endif /*  _TRACE_PAGE_FAULT_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 23d8e5f..6a19ad9 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o  := $(nostackp)
 CFLAGS_setup_nx.o  := $(nostackp)
 
+CFLAGS_fault.o := -I$(src)/../include/asm/trace
+
 obj-$(CONFIG_X86_PAT)  += pat_rbtree.o
 obj-$(CONFIG_SMP)  += tlb.o
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd3e281..f2730cbc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,9 @@
 #include  /* kmemcheck_*(), ...   */
 #include /* VSYSCALL_START   
*/
 
+#define CREATE_TRACE_POINTS
+#include 
+
 /*
  * Page fault error code bits:
  *
@@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long 
error_code)
exception_exit(prev_state);
 }
 
+static void trace_page_fault_entries(struct pt_regs *regs,
+unsigned long error_code)
+{
+   if (user_mode(regs))
+   trace_user_page_fault(read_cr2(), regs, error_code);
+   else
+   trace_kernel_page_fault(read_cr2(), regs, error_code);
+}
+
 dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
 
prev_state = exception_enter();
+   trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More 

[tip:x86/trace] x86, trace: Register exception handler to trace IDT

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  25c74b10bacead867478480170083f69cfc0db48
Gitweb: http://git.kernel.org/tip/25c74b10bacead867478480170083f69cfc0db48
Author: Seiji Aguchi 
AuthorDate: Wed, 30 Oct 2013 16:37:00 -0400
Committer:  H. Peter Anvin 
CommitDate: Fri, 8 Nov 2013 14:15:45 -0800

x86, trace: Register exception handler to trace IDT

This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending "trace_" to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi 
Link: http://lkml.kernel.org/r/52716dec.5050...@hds.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/desc.h| 28 +++-
 arch/x86/include/asm/hw_irq.h  |  3 +++
 arch/x86/include/asm/segment.h |  3 +++
 arch/x86/include/asm/traps.h   | 20 
 arch/x86/kernel/entry_32.S | 10 ++
 arch/x86/kernel/entry_64.S | 13 -
 arch/x86/kernel/head64.c   |  2 +-
 arch/x86/kernel/kvm.c  |  2 +-
 arch/x86/kernel/traps.c| 28 ++--
 arch/x86/mm/fault.c| 10 ++
 10 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d939567..3d73437 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, );
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+#define _trace_set_gate(gate, type, addr, dpl, ist, seg)
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
-{
-   BUG_ON((unsigned)n > 0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
-}
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n > 0xFF); \
+   _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\
+ __KERNEL_CS); \
+   _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\
+   0, 0, __KERNEL_CS); \
+   } while (0)
 
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..cba45d9 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct 
pt_regs *);
 #endif
 
 extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void);
+#ifdef CONFIG_TRACING
+#define trace_interrupt interrupt
+#endif
 
 typedef int vector_irq_t[NR_VECTORS];
 DECLARE_PER_CPU(vector_irq_t, vector_irq);
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index c48a950..6f1c3a8 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -214,6 +214,9 @@
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5];
+#ifdef CONFIG_TRACING
+#define trace_early_idt_handlers early_idt_handlers
+#endif
 
 /*
  * Load a segment. Fall back on loading the zero
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7036cb6..58d66fe 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,6 +37,23 @@ asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
 
+#ifdef CONFIG_TRACING
+asmlinkage void trace_page_fault(void);
+#define trace_divide_error divide_error
+#define trace_bounds bounds
+#define trace_invalid_op invalid_op
+#define trace_device_not_available device_not_available
+#define trace_coprocessor_segment_overrun coprocessor_segment_overrun
+#define trace_invalid_TSS invalid_TSS
+#define trace_segment_not_present segment_not_present
+#define trace_general_protection general_pro

[tip:x86/trace] x86, trace: Delete __trace_alloc_intr_gate()

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  ac7956e2699380b8b10146ec2ba8cbe43a03ff7a
Gitweb: http://git.kernel.org/tip/ac7956e2699380b8b10146ec2ba8cbe43a03ff7a
Author: Seiji Aguchi 
AuthorDate: Wed, 30 Oct 2013 16:37:47 -0400
Committer:  H. Peter Anvin 
CommitDate: Fri, 8 Nov 2013 14:15:47 -0800

x86, trace: Delete __trace_alloc_intr_gate()

Currently irq vector handlers for tracing are registered in both set_intr_gate()
 and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().

Signed-off-by: Seiji Aguchi 
Link: http://lkml.kernel.org/r/52716e1b.7090...@hds.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/desc.h | 22 --
 1 file changed, 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 3d73437..50d033a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector)
}
 }
 
-#ifdef CONFIG_TRACING
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-   gate_desc s;
-
-   pack_gate(, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
-   write_idt_entry(trace_idt_table, gate, );
-}
-
-static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
-{
-   trace_set_intr_gate(n, addr);
-}
-#else
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-}
-
-#define __trace_alloc_intr_gate(n, addr)
-#endif
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
set_intr_gate(n, addr); \
-   __trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
 /*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/trace] x86, trace: Remove __alloc_intr_gate()

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  959c071f0974cda7702d7574647de7ad9259eb57
Gitweb: http://git.kernel.org/tip/959c071f0974cda7702d7574647de7ad9259eb57
Author: Seiji Aguchi 
AuthorDate: Wed, 30 Oct 2013 16:36:08 -0400
Committer:  H. Peter Anvin 
CommitDate: Fri, 8 Nov 2013 14:15:44 -0800

x86, trace: Remove __alloc_intr_gate()

Prepare to move set_intr_gate() into a macro by removing
__alloc_intr_gate().

The purpose is to avoid failing a kernel build after applying a
subsequent patch which changes set_intr_gate() into a macro.

Signed-off-by: Seiji Aguchi 
Link: http://lkml.kernel.org/r/52716db8.1080...@hds.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/desc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..d939567 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
-   __alloc_intr_gate(n, addr); \
+   set_intr_gate(n, addr); \
__trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/trace] x86, trace: Delete __trace_alloc_intr_gate()

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  ac7956e2699380b8b10146ec2ba8cbe43a03ff7a
Gitweb: http://git.kernel.org/tip/ac7956e2699380b8b10146ec2ba8cbe43a03ff7a
Author: Seiji Aguchi seiji.agu...@hds.com
AuthorDate: Wed, 30 Oct 2013 16:37:47 -0400
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 8 Nov 2013 14:15:47 -0800

x86, trace: Delete __trace_alloc_intr_gate()

Currently irq vector handlers for tracing are registered in both set_intr_gate()
 and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
Link: http://lkml.kernel.org/r/52716e1b.7090...@hds.com
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/desc.h | 22 --
 1 file changed, 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 3d73437..50d033a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector)
}
 }
 
-#ifdef CONFIG_TRACING
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-   gate_desc s;
-
-   pack_gate(s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
-   write_idt_entry(trace_idt_table, gate, s);
-}
-
-static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
-{
-   trace_set_intr_gate(n, addr);
-}
-#else
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-}
-
-#define __trace_alloc_intr_gate(n, addr)
-#endif
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
set_intr_gate(n, addr); \
-   __trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
 /*
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/trace] x86, trace: Add page fault tracepoints

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  d34603b07c4255b2b00a546d34f297ccd50ae4c6
Gitweb: http://git.kernel.org/tip/d34603b07c4255b2b00a546d34f297ccd50ae4c6
Author: Seiji Aguchi seiji.agu...@hds.com
AuthorDate: Wed, 30 Oct 2013 16:39:03 -0400
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 8 Nov 2013 14:15:49 -0800

x86, trace: Add page fault tracepoints

This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

  Two events, for user and kernel spaces, are introduced at the beginning
  of page fault handler for tracing.

  - User space event
There is a request of page fault event for user space as below.


https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com

https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com

  - Kernel space event:
When we measure an overhead in kernel space for investigating performance
issues, we can check if it comes from the page fault events.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
Link: http://lkml.kernel.org/r/52716e67.6090...@hds.com
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 13 +
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/trace/exceptions.h 
b/arch/x86/include/asm/trace/exceptions.h
new file mode 100644
index 000..86540c0
--- /dev/null
+++ b/arch/x86/include/asm/trace/exceptions.h
@@ -0,0 +1,52 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM exceptions
+
+#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_FAULT_H
+
+#include linux/tracepoint.h
+
+extern void trace_irq_vector_regfunc(void);
+extern void trace_irq_vector_unregfunc(void);
+
+DECLARE_EVENT_CLASS(x86_exceptions,
+
+   TP_PROTO(unsigned long address, struct pt_regs *regs,
+unsigned long error_code),
+
+   TP_ARGS(address, regs, error_code),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, address  )
+   __field(unsigned long, ip   )
+   __field(unsigned long, error_code )
+   ),
+
+   TP_fast_assign(
+   __entry-address = address;
+   __entry-ip = regs-ip;
+   __entry-error_code = error_code;
+   ),
+
+   TP_printk(address=%pf ip=%pf error_code=0x%lx,
+ (void *)__entry-address, (void *)__entry-ip,
+ __entry-error_code) );
+
+#define DEFINE_PAGE_FAULT_EVENT(name)  \
+DEFINE_EVENT_FN(x86_exceptions, name,  \
+   TP_PROTO(unsigned long address, struct pt_regs *regs,   \
+unsigned long error_code), \
+   TP_ARGS(address, regs, error_code), \
+   trace_irq_vector_regfunc,   \
+   trace_irq_vector_unregfunc);
+
+DEFINE_PAGE_FAULT_EVENT(user_page_fault);
+DEFINE_PAGE_FAULT_EVENT(kernel_page_fault);
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE exceptions
+#endif /*  _TRACE_PAGE_FAULT_H */
+
+/* This part must be outside protection */
+#include trace/define_trace.h
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 23d8e5f..6a19ad9 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o  := $(nostackp)
 CFLAGS_setup_nx.o  := $(nostackp)
 
+CFLAGS_fault.o := -I$(src)/../include/asm/trace
+
 obj-$(CONFIG_X86_PAT)  += pat_rbtree.o
 obj-$(CONFIG_SMP)  += tlb.o
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd3e281..f2730cbc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,9 @@
 #include asm/kmemcheck.h /* kmemcheck_*(), ...   */
 #include asm/fixmap.h/* VSYSCALL_START   
*/
 
+#define CREATE_TRACE_POINTS
+#include asm/trace/exceptions.h
+
 /*
  * Page fault error code bits:
  *
@@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long 
error_code)
exception_exit(prev_state);
 }
 
+static void trace_page_fault_entries(struct pt_regs *regs,
+unsigned long error_code)
+{
+   if (user_mode(regs))
+   trace_user_page_fault(read_cr2(), regs, error_code);
+   else
+   trace_kernel_page_fault(read_cr2(), regs, error_code);
+}
+
 dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
 
prev_state = exception_enter();
+   trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
--
To unsubscribe

[tip:x86/trace] x86, trace: Register exception handler to trace IDT

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  25c74b10bacead867478480170083f69cfc0db48
Gitweb: http://git.kernel.org/tip/25c74b10bacead867478480170083f69cfc0db48
Author: Seiji Aguchi seiji.agu...@hds.com
AuthorDate: Wed, 30 Oct 2013 16:37:00 -0400
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 8 Nov 2013 14:15:45 -0800

x86, trace: Register exception handler to trace IDT

This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending trace_ to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
Link: http://lkml.kernel.org/r/52716dec.5050...@hds.com
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/desc.h| 28 +++-
 arch/x86/include/asm/hw_irq.h  |  3 +++
 arch/x86/include/asm/segment.h |  3 +++
 arch/x86/include/asm/traps.h   | 20 
 arch/x86/kernel/entry_32.S | 10 ++
 arch/x86/kernel/entry_64.S | 13 -
 arch/x86/kernel/head64.c   |  2 +-
 arch/x86/kernel/kvm.c  |  2 +-
 arch/x86/kernel/traps.c| 28 ++--
 arch/x86/mm/fault.c| 10 ++
 10 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d939567..3d73437 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, s);
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+#define _trace_set_gate(gate, type, addr, dpl, ist, seg)
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
-{
-   BUG_ON((unsigned)n  0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
-}
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n  0xFF); \
+   _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\
+ __KERNEL_CS); \
+   _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\
+   0, 0, __KERNEL_CS); \
+   } while (0)
 
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..cba45d9 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct 
pt_regs *);
 #endif
 
 extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void);
+#ifdef CONFIG_TRACING
+#define trace_interrupt interrupt
+#endif
 
 typedef int vector_irq_t[NR_VECTORS];
 DECLARE_PER_CPU(vector_irq_t, vector_irq);
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index c48a950..6f1c3a8 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -214,6 +214,9 @@
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5];
+#ifdef CONFIG_TRACING
+#define trace_early_idt_handlers early_idt_handlers
+#endif
 
 /*
  * Load a segment. Fall back on loading the zero
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7036cb6..58d66fe 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,6 +37,23 @@ asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
 
+#ifdef CONFIG_TRACING
+asmlinkage void trace_page_fault(void);
+#define trace_divide_error divide_error
+#define trace_bounds bounds
+#define trace_invalid_op invalid_op
+#define trace_device_not_available device_not_available
+#define trace_coprocessor_segment_overrun coprocessor_segment_overrun
+#define trace_invalid_TSS invalid_TSS
+#define trace_segment_not_present

[tip:x86/trace] x86, trace: Remove __alloc_intr_gate()

2013-11-08 Thread tip-bot for Seiji Aguchi
Commit-ID:  959c071f0974cda7702d7574647de7ad9259eb57
Gitweb: http://git.kernel.org/tip/959c071f0974cda7702d7574647de7ad9259eb57
Author: Seiji Aguchi seiji.agu...@hds.com
AuthorDate: Wed, 30 Oct 2013 16:36:08 -0400
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 8 Nov 2013 14:15:44 -0800

x86, trace: Remove __alloc_intr_gate()

Prepare to move set_intr_gate() into a macro by removing
__alloc_intr_gate().

The purpose is to avoid failing a kernel build after applying a
subsequent patch which changes set_intr_gate() into a macro.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
Link: http://lkml.kernel.org/r/52716db8.1080...@hds.com
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/desc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..d939567 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
-   __alloc_intr_gate(n, addr); \
+   set_intr_gate(n, addr); \
__trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] Make efi-pstore return a unique id

2013-11-02 Thread Seiji Aguchi
> How does efivars backend handle "unlink(2)" in the pstore file system.
> pstore will call the backend->erase function passing the "id".  The
> backend should then erase the right record from persistent storage.
> 
> With the  ((timestamp * 100 + part) * 100 + count function - you can
> easily reverse it to find timestamp, part and count - would that make life
> easier for the backend to find the record to be erased?  If you use a
> hash function you will need to check each record and compute the
> hash to see if it matches (probably not a big deal because the backend
> will generally only hold a handful of records).

By generating the id in  efi_pstore_write(), and using it to a variable name,
It works at an erasing time as well.

The root cause of this problem is that efivars used "part" as id.
It was a wrong way. So, we should not keep it.

Seiji


N�Р骒r��yb�X�肚�v�^�)藓{.n�+�伐�{��赙zXФ�≤�}��财�z�:+v�����赙zZ+��+zf"�h���~i���z��wア�?�ㄨ��&�)撷f��^j谦y�m��@A�a囤�
0鹅h���i

RE: [PATCH] Make efi-pstore return a unique id

2013-11-02 Thread Seiji Aguchi
 How does efivars backend handle unlink(2) in the pstore file system.
 pstore will call the backend-erase function passing the id.  The
 backend should then erase the right record from persistent storage.
 
 With the  ((timestamp * 100 + part) * 100 + count function - you can
 easily reverse it to find timestamp, part and count - would that make life
 easier for the backend to find the record to be erased?  If you use a
 hash function you will need to check each record and compute the
 hash to see if it matches (probably not a big deal because the backend
 will generally only hold a handful of records).

By generating the id in  efi_pstore_write(), and using it to a variable name,
It works at an erasing time as well.

The root cause of this problem is that efivars used part as id.
It was a wrong way. So, we should not keep it.

Seiji


N�Р骒r��yb�X�肚�v�^�)藓{.n�+�伐�{��赙zXФ�≤�}��财�z�j:+v�����赙zZ+��+zf"�h���~i���z��wア�?�ㄨ���)撷f��^j谦y�m��@A�a囤�
0鹅h���i

RE: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Seiji Aguchi
> What about feeding the bytes of all three integers into a non-cryptographic 
> hash function?
> Using this way you get a cheap unique id.

It is reasonable to me.

Seiji


RE: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Seiji Aguchi
> >> +{
> >> +   char id_str[64];
> >> +   u64 id = 0;
> >> +
> >> +   sprintf(id_str, "%lu%u%d", timestamp, part, count);
> >> +   if (kstrtoull(id_str, 10, ))
> >> +   pr_warn("efi-pstore: failed to generate id\n");
> >> +   return id;
> >> +}
> >
> > This is just odd. You make a string from three ints and then a parse
> > it to a int again.
> 
> Agreed.  I liked your ((timestamp * 100 + part) * 100 + count function much
> more than this.

I was worried that the part and count could be more than 100.
If it happens, the id may not be unique...

But, currently, size of nvram storage is limited, so it is a corner case.
I respect your opinion.

> @@ -125,9 +137,11 @@ static int efi_pstore_write(enum pstore_type_id type,
>   efi_char16_t efi_name[DUMP_NAME_LEN];
>   efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
>   int i, ret = 0;
> + unsigned long timestamp;
> 
> + timestamp = get_seconds();
>   sprintf(name, "dump-type%u-%u-%d-%lu-%c", type, part, count,
> - get_seconds(), compressed ? 'C' : 'D');
> + timestamp, compressed ? 'C' : 'D');
> 

I don't think you need to change efi_pstore_write().
It is just add a local timestamp variable.

Seiji

>   for (i = 0; i < DUMP_NAME_LEN; i++)
>   efi_name[i] = name[i];
> --
> 1.8.4.2


RE: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Seiji Aguchi
  +{
  +   char id_str[64];
  +   u64 id = 0;
  +
  +   sprintf(id_str, %lu%u%d, timestamp, part, count);
  +   if (kstrtoull(id_str, 10, id))
  +   pr_warn(efi-pstore: failed to generate id\n);
  +   return id;
  +}
 
  This is just odd. You make a string from three ints and then a parse
  it to a int again.
 
 Agreed.  I liked your ((timestamp * 100 + part) * 100 + count function much
 more than this.

I was worried that the part and count could be more than 100.
If it happens, the id may not be unique...

But, currently, size of nvram storage is limited, so it is a corner case.
I respect your opinion.

 @@ -125,9 +137,11 @@ static int efi_pstore_write(enum pstore_type_id type,
   efi_char16_t efi_name[DUMP_NAME_LEN];
   efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
   int i, ret = 0;
 + unsigned long timestamp;
 
 + timestamp = get_seconds();
   sprintf(name, dump-type%u-%u-%d-%lu-%c, type, part, count,
 - get_seconds(), compressed ? 'C' : 'D');
 + timestamp, compressed ? 'C' : 'D');
 

I don't think you need to change efi_pstore_write().
It is just add a local timestamp variable.

Seiji

   for (i = 0; i  DUMP_NAME_LEN; i++)
   efi_name[i] = name[i];
 --
 1.8.4.2


RE: [PATCH] Make efi-pstore return a unique id

2013-11-01 Thread Seiji Aguchi
 What about feeding the bytes of all three integers into a non-cryptographic 
 hash function?
 Using this way you get a cheap unique id.

It is reasonable to me.

Seiji


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-31 Thread Seiji Aguchi
> I also like option 1 ... but I think the "id" should be a persistent value for
> a given saved record.  So some func(timestamp, part, count) would be a
> good idea.  If we try using "sequential" numbers - and don't manage to
> clear out /sys/fs/pstore each time - then we may have the same "dmesg"
> file show up with different names on each boot.
> 
> Right now I have a simple script to save & clear ... not much more
> complex than:
> 
>   cd /sys/fs/pstore
>   cp * /var/log/save-pstore
>   rm *
> 
> This depends on not re-using filenames (otherwise new files in pstore
> might overwrite older saved files in my /var/log/save-pstore area).

I see.. It is a persuasive use case.

(1) I agree that some func(timestamp, part, count) would be  good idea.
  This might work, although an overflow will happen some time...
  sprintf(id_str, "%lld%d%d", timestamp, part, count)
  simple_str_to_ull(id_str, , base)
 
(2) There is a GetNextHighMonotonicCount() runtime service in EFI specification
   to get a persistent number across the reboot, but I'm not sure if it is 
safe to use it..
  Also,  it would be good if we can create the id by ourselves, rather than 
using firmware.

 (3) Also, (it might not be good idea),  if a pstore filesystem expects all 
backend drivers
   to use the persistent id, the pstore should provide it by itself.
   (by using time stamp counter or something like that.)

   As I looked into the ramoops's code. It seems to use a non-persistent 
sequential counter,by initializing
   read_cnt to "0" in ramoops_pstore_open(), and incrementing it in 
ramoops_pstore_read().
   It doesn't seem to be the pstore's expectation. 
   And when someone introduces a new driver, they may misunderstand how to 
create the id as well..

As above, there are mutiple ideas, but (1) is reasonable to me.
 
Seiji




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-31 Thread Seiji Aguchi


> 1. combine timestamp, count and part into "id".
>for now, in efi-pstore.c, *id = part. and we could simply change it
>to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count.

My opinion close to 1.
But, the "*id" should not be the complex one like (timestamp * 100 + part) * 
100 + count.
Rather, it should be a simple sequential number beginning with 1.

 - Remove "id" member from pstore_read_info struct.
 -  Introduce a global sequential counter like "static u64 
efi_pstore_read_count" (or add the member to pstore_info structure?)
 - Initialize to "1" in efi_pstore_open().
 - Increment it in efi_pstore_read().

If we can do it, we don't need to touch a code of pstore filesystem and can 
avoid regressions of
other backend drivers.

Seiji

> -Original Message-
> From: linux-efi-ow...@vger.kernel.org 
> [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Madper Xie
> Sent: Wednesday, October 30, 2013 11:01 PM
> To: Luck, Tony
> Cc: Seiji Aguchi; Madper Xie; keesc...@chromium.org; ccr...@android.com; 
> an...@enomsg.org; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; bbbo...@gmail.com
> Subject: Re: [PATCH 0/2] make all stored entries accessible.
> 
> 
> tony.l...@intel.com writes:
> 
> >> So, do you mean efivars should fix to use the "id" in a proper way?
> >
> > It would avoid the need for all these tests, and additions to the filename 
> > to guarantee
> > uniqueness.
> >
> > Not sure what options efivars has to create a unique, persistent "id" for 
> > each
> > record.  It's a fundamental part of how ERST works (though the unique ID is 
> > just
> > based around a timestamp).
> >
> Okay, maybe there are three options here:
> 1. combine timestamp, count and part into "id".
>for now, in efi-pstore.c, *id = part. and we could simply change it
>to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count.
> 2. change the id's type. let id become a string.
>so every backend could write anything to id. then it will become a
>part of filename in pstore filesystem. (but we need fix all backends
>since we modified api.)
> 3. apply the patches I have sent... even if the filename will be ugly
>and gory...
> Which one do you prefer?
> >> I acked Madper's patch 2/2 earlier today, but when I look at your test 
> >> result, I'm not sure if
> >> it is reasonable for users to make multiple numbers visible to the file 
> >> name.
> >>
> >>> -r--r--r-- 1 root root 17499 Oct 30 13:41 
> >>> dmesg-erst-5940651313304961029--2129078373-1383165669
> >
> > after I added the "count = 0" initialization the filename gets a tiny bit 
> > less
> > scary:
> >
> > -r--r--r-- 1 root root 17499 Oct 30 13:41 
> > dmesg-erst-5940651313304961029-0-1383165669
> >
> > -Tony
> 
> 
> --
> Best,
> Madper Xie.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-31 Thread Seiji Aguchi


 1. combine timestamp, count and part into id.
for now, in efi-pstore.c, *id = part. and we could simply change it
to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count.

My opinion close to 1.
But, the *id should not be the complex one like (timestamp * 100 + part) * 
100 + count.
Rather, it should be a simple sequential number beginning with 1.

 - Remove id member from pstore_read_info struct.
 -  Introduce a global sequential counter like static u64 
efi_pstore_read_count (or add the member to pstore_info structure?)
 - Initialize to 1 in efi_pstore_open().
 - Increment it in efi_pstore_read().

If we can do it, we don't need to touch a code of pstore filesystem and can 
avoid regressions of
other backend drivers.

Seiji

 -Original Message-
 From: linux-efi-ow...@vger.kernel.org 
 [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Madper Xie
 Sent: Wednesday, October 30, 2013 11:01 PM
 To: Luck, Tony
 Cc: Seiji Aguchi; Madper Xie; keesc...@chromium.org; ccr...@android.com; 
 an...@enomsg.org; linux-...@vger.kernel.org; linux-
 ker...@vger.kernel.org; bbbo...@gmail.com
 Subject: Re: [PATCH 0/2] make all stored entries accessible.
 
 
 tony.l...@intel.com writes:
 
  So, do you mean efivars should fix to use the id in a proper way?
 
  It would avoid the need for all these tests, and additions to the filename 
  to guarantee
  uniqueness.
 
  Not sure what options efivars has to create a unique, persistent id for 
  each
  record.  It's a fundamental part of how ERST works (though the unique ID is 
  just
  based around a timestamp).
 
 Okay, maybe there are three options here:
 1. combine timestamp, count and part into id.
for now, in efi-pstore.c, *id = part. and we could simply change it
to unique one. F.E. *id = (timestamp * 100 + part) * 100 + count.
 2. change the id's type. let id become a string.
so every backend could write anything to id. then it will become a
part of filename in pstore filesystem. (but we need fix all backends
since we modified api.)
 3. apply the patches I have sent... even if the filename will be ugly
and gory...
 Which one do you prefer?
  I acked Madper's patch 2/2 earlier today, but when I look at your test 
  result, I'm not sure if
  it is reasonable for users to make multiple numbers visible to the file 
  name.
 
  -r--r--r-- 1 root root 17499 Oct 30 13:41 
  dmesg-erst-5940651313304961029--2129078373-1383165669
 
  after I added the count = 0 initialization the filename gets a tiny bit 
  less
  scary:
 
  -r--r--r-- 1 root root 17499 Oct 30 13:41 
  dmesg-erst-5940651313304961029-0-1383165669
 
  -Tony
 
 
 --
 Best,
 Madper Xie.
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-31 Thread Seiji Aguchi
 I also like option 1 ... but I think the id should be a persistent value for
 a given saved record.  So some func(timestamp, part, count) would be a
 good idea.  If we try using sequential numbers - and don't manage to
 clear out /sys/fs/pstore each time - then we may have the same dmesg
 file show up with different names on each boot.
 
 Right now I have a simple script to save  clear ... not much more
 complex than:
 
   cd /sys/fs/pstore
   cp * /var/log/save-pstore
   rm *
 
 This depends on not re-using filenames (otherwise new files in pstore
 might overwrite older saved files in my /var/log/save-pstore area).

I see.. It is a persuasive use case.

(1) I agree that some func(timestamp, part, count) would be  good idea.
  This might work, although an overflow will happen some time...
  sprintf(id_str, %lld%d%d, timestamp, part, count)
  simple_str_to_ull(id_str, id, base)
 
(2) There is a GetNextHighMonotonicCount() runtime service in EFI specification
   to get a persistent number across the reboot, but I'm not sure if it is 
safe to use it..
  Also,  it would be good if we can create the id by ourselves, rather than 
using firmware.

 (3) Also, (it might not be good idea),  if a pstore filesystem expects all 
backend drivers
   to use the persistent id, the pstore should provide it by itself.
   (by using time stamp counter or something like that.)

   As I looked into the ramoops's code. It seems to use a non-persistent 
sequential counter,by initializing
   read_cnt to 0 in ramoops_pstore_open(), and incrementing it in 
ramoops_pstore_read().
   It doesn't seem to be the pstore's expectation. 
   And when someone introduces a new driver, they may misunderstand how to 
create the id as well..

As above, there are mutiple ideas, but (1) is reasonable to me.
 
Seiji




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-30 Thread Seiji Aguchi
> Ah - I was expecting that the backend driver would have a unique "id" for
> each record stored ... but is seems that this isn't true for efivars.
> 

So, do you mean efivars should fix to use the "id" in a proper way?

I acked Madper's patch 2/2 earlier today, but when I look at your test result, 
I'm not sure if
it is reasonable for users to make multiple numbers visible to the file name.

> -r--r--r-- 1 root root 17499 Oct 30 13:41 
> dmesg-erst-5940651313304961029--2129078373-1383165669

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 3/4] Delete __trace_alloc_intr_gate()

2013-10-30 Thread Seiji Aguchi
Currently irq vector handlers for tracing are registered in both set_intr_gate()
 and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h | 22 --
 1 file changed, 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 3d73437..50d033a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector)
}
 }
 
-#ifdef CONFIG_TRACING
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-   gate_desc s;
-
-   pack_gate(, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
-   write_idt_entry(trace_idt_table, gate, );
-}
-
-static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
-{
-   trace_set_intr_gate(n, addr);
-}
-#else
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-}
-
-#define __trace_alloc_intr_gate(n, addr)
-#endif
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
set_intr_gate(n, addr); \
-   __trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
 /*
-- 
1.8.3.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 4/4] Add page fault tracepoints

2013-10-30 Thread Seiji Aguchi
This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

  Two events, for user and kernel spaces, are introduced at the beginning
  of page fault handler for tracing.

  - User space event
There is a request of page fault event for user space as below.


https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com

https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com

  - Kernel space event:
When we meature an overhead in kernel space for investigating performance
issues, we can check if it comes from the page fault events.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 13 +
 3 files changed, 67 insertions(+)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/trace/exceptions.h 
b/arch/x86/include/asm/trace/exceptions.h
new file mode 100644
index 000..86540c0
--- /dev/null
+++ b/arch/x86/include/asm/trace/exceptions.h
@@ -0,0 +1,52 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM exceptions
+
+#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_FAULT_H
+
+#include 
+
+extern void trace_irq_vector_regfunc(void);
+extern void trace_irq_vector_unregfunc(void);
+
+DECLARE_EVENT_CLASS(x86_exceptions,
+
+   TP_PROTO(unsigned long address, struct pt_regs *regs,
+unsigned long error_code),
+
+   TP_ARGS(address, regs, error_code),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, address  )
+   __field(unsigned long, ip   )
+   __field(unsigned long, error_code )
+   ),
+
+   TP_fast_assign(
+   __entry->address = address;
+   __entry->ip = regs->ip;
+   __entry->error_code = error_code;
+   ),
+
+   TP_printk("address=%pf ip=%pf error_code=0x%lx",
+ (void *)__entry->address, (void *)__entry->ip,
+ __entry->error_code) );
+
+#define DEFINE_PAGE_FAULT_EVENT(name)  \
+DEFINE_EVENT_FN(x86_exceptions, name,  \
+   TP_PROTO(unsigned long address, struct pt_regs *regs,   \
+unsigned long error_code), \
+   TP_ARGS(address, regs, error_code), \
+   trace_irq_vector_regfunc,   \
+   trace_irq_vector_unregfunc);
+
+DEFINE_PAGE_FAULT_EVENT(user_page_fault);
+DEFINE_PAGE_FAULT_EVENT(kernel_page_fault);
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE exceptions
+#endif /*  _TRACE_PAGE_FAULT_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 23d8e5f..6a19ad9 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o  := $(nostackp)
 CFLAGS_setup_nx.o  := $(nostackp)
 
+CFLAGS_fault.o := -I$(src)/../include/asm/trace
+
 obj-$(CONFIG_X86_PAT)  += pat_rbtree.o
 obj-$(CONFIG_SMP)  += tlb.o
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd3e281..f2730cbc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,9 @@
 #include  /* kmemcheck_*(), ...   */
 #include /* VSYSCALL_START   
*/
 
+#define CREATE_TRACE_POINTS
+#include 
+
 /*
  * Page fault error code bits:
  *
@@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long 
error_code)
exception_exit(prev_state);
 }
 
+static void trace_page_fault_entries(struct pt_regs *regs,
+unsigned long error_code)
+{
+   if (user_mode(regs))
+   trace_user_page_fault(read_cr2(), regs, error_code);
+   else
+   trace_kernel_page_fault(read_cr2(), regs, error_code);
+}
+
 dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
 
prev_state = exception_enter();
+   trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
-- 
1.8.3.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 2/4] Register exception handler to trace IDT

2013-10-30 Thread Seiji Aguchi
This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending "trace_" to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h| 28 +++-
 arch/x86/include/asm/hw_irq.h  |  3 +++
 arch/x86/include/asm/segment.h |  3 +++
 arch/x86/include/asm/traps.h   | 20 
 arch/x86/kernel/entry_32.S | 10 ++
 arch/x86/kernel/entry_64.S | 13 -
 arch/x86/kernel/head64.c   |  2 +-
 arch/x86/kernel/kvm.c  |  2 +-
 arch/x86/kernel/traps.c| 28 ++--
 arch/x86/mm/fault.c| 10 ++
 10 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d939567..3d73437 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, );
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+#define _trace_set_gate(gate, type, addr, dpl, ist, seg)
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
-{
-   BUG_ON((unsigned)n > 0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
-}
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n > 0xFF); \
+   _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\
+ __KERNEL_CS); \
+   _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\
+   0, 0, __KERNEL_CS); \
+   } while (0)
 
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..cba45d9 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct 
pt_regs *);
 #endif
 
 extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void);
+#ifdef CONFIG_TRACING
+#define trace_interrupt interrupt
+#endif
 
 typedef int vector_irq_t[NR_VECTORS];
 DECLARE_PER_CPU(vector_irq_t, vector_irq);
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index c48a950..6f1c3a8 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -214,6 +214,9 @@
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5];
+#ifdef CONFIG_TRACING
+#define trace_early_idt_handlers early_idt_handlers
+#endif
 
 /*
  * Load a segment. Fall back on loading the zero
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7036cb6..58d66fe 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,6 +37,23 @@ asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
 
+#ifdef CONFIG_TRACING
+asmlinkage void trace_page_fault(void);
+#define trace_divide_error divide_error
+#define trace_bounds bounds
+#define trace_invalid_op invalid_op
+#define trace_device_not_available device_not_available
+#define trace_coprocessor_segment_overrun coprocessor_segment_overrun
+#define trace_invalid_TSS invalid_TSS
+#define trace_segment_not_present segment_not_present
+#define trace_general_protection general_protection
+#define trace_spurious_interrupt_bug spurious_interrupt_bug
+#define trace_coprocessor_error coprocessor_error
+#define trace_alignment_check alignment_check
+#define trace_simd_coprocessor_error simd_coprocessor_error
+#define trace_async_page_fault async_page_fault
+#endif
+
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
 dotraplinkage vo

[PATCH v4 0/4] Introduce page fault tracepoints

2013-10-30 Thread Seiji Aguchi
Change from v3:
 - Separate modifications to make review easy.
 - Refactor implementations registering exception/irq_vector
   handers. (Patch 1, 2, 3)

This series introduce page fault tracepoints.

Detailed descriptions are explained in each patch.
Any comments are welcome.

Seiji Aguchi (4):
  Move set_intr_gate() into macro
  Register exception handler to trace IDT
  Delete __trace_alloc_intr_gate()
  Add page fault tracepoints

 arch/x86/include/asm/desc.h | 57 ++---
 arch/x86/include/asm/hw_irq.h   |  3 ++
 arch/x86/include/asm/segment.h  |  3 ++
 arch/x86/include/asm/trace/exceptions.h | 52 ++
 arch/x86/include/asm/traps.h| 20 
 arch/x86/kernel/entry_32.S  | 10 ++
 arch/x86/kernel/entry_64.S  | 13 +++-
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 23 +
 12 files changed, 165 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 1/4] Move set_intr_gate() into macro

2013-10-30 Thread Seiji Aguchi
Move set_intr_gate() into a macro by removing __alloc_intr_gate().

The purpose is to avoid failing a kernel build after applying
a subsequent patch which changes set_intr_gate() to macro.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..d939567 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
-   __alloc_intr_gate(n, addr); \
+   set_intr_gate(n, addr); \
__trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
-- 
1.8.3.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-30 Thread Seiji Aguchi
Change from v3:
- Introduce EFI_VARS_DATA_SIZE_MAX macro.
- Add some description why a scanning/deleting logic is needed.

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the dynamic memory allocation runnable without taking spinlock,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.

On the code basis, it seems that all the scanning and deleting logic is not
needed because __efivars->lock are not dropped when reading from the EFI
variable store.

But, the scanning and deleting logic is still needed because an efi-pstore and
a pstore filesystem works as follows.

In case an entry(A) is found, the pointer is saved to psi->data.
And efi_pstore_read() passes the entry(A) to a pstore filesystem by
releasing  __efivars->lock.

And then, the pstore filesystem calls efi_pstore_read() again and the same
entry(A), which is saved to psi->data, is used for resuming to scan
a sysfs-list.

So, to protect the entry(A), the logic is needed.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [] dump_stack+0x54/0x74
[1.144058]  [] warn_slowpath_common+0x7d/0xa0
[1.144058]  [] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [] ? vsscanf+0x57f/0x7b0
[1.144058]  [] lockdep_trace_alloc+0x104/0x110
[1.144058]  [] __kmalloc_track_caller+0x50/0x280
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] kmemdup+0x20/0x50
[1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [] __efivar_entry_iter+0xfb/0x120
[1.144058]  [] efi_pstore_read+0x3f/0x50
[1.144058]  [] pstore_get_records+0x9a/0x150
[1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [] ? parse_options+0x80/0x80
[1.158207]  [] pstore_fill_super+0xa5/0xc0
[1.158207]  [] mount_single+0xa2/0xd0
[1.158207]  [] pstore_mount+0x18/0x20
[1.158207]  [] mount_fs+0x39/0x1b0
[1.158207]  [] ? __alloc_percpu+0x10/0x20
[1.158207]  [] vfs_kern_mount+0x63/0xf0
[1.158207]  [] do_mount+0x23e/0xa20
[1.158207]  [] ? strndup_user+0x4b/0xf0
[1.158207]  [] SyS_mount+0x83/0xc0
[1.158207]  [] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efi/efi-pstore.c | 143 +++---
 drivers/firmware/efi/efivars.c|  12 ++--
 drivers/firmware/efi/vars.c   |  12 +++-
 include/linux/efi.h   |   4 ++
 4 files changed, 155 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..6ce31e9 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi->data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi->data = NULL;
return 0;
 }
@@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry 
*entry, void *data)
__efivar_entry_get(entry, >var.Attributes,
   >var.DataSize, entry->var.Data);
size = entry->var.DataSize;
+   memcpy(*cb_data->buf, entry->var.Data,
+  (size_t)min_t(unsigned long, EFIVARS_DATA_SIZE_MAX, size));
 
-   *cb_data-

RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-30 Thread Seiji Aguchi
> On Fri, 18 Oct, at 10:30:58PM, Seiji Aguchi wrote:
> > The scanning and deleting logic is still needed.  In case an entry(A)
> > is found, the pointer is saved to psi->data.  And efi_pstore_read()
> > passes the entry(A) to a pstore filesystem by releasing
> > __efivars->lock.
> >
> > And then, the pstore filesystem calls efi_pstore_read() again and the
> > same entry(A), which is saved to psi->data, is used for re-scanning a
> > sysfs-list.  (That is why list_for_each_entry_safe_from () is used in
> > efi_pstore_sysfs_entry_iter().)
> >
> > So, to protect the entry(A), the logic is needed because, in pstore
> > filesystem,  __efivars->lock Is released.
> 
> Very good point.

Thanks,

I will add the description in the next patch.

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] pstore: Differentiating names by adding count and timestamp

2013-10-30 Thread Seiji Aguchi


> -Original Message-
> From: Madper Xie [mailto:c...@redhat.com]
> Sent: Wednesday, October 30, 2013 5:45 AM
> To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; 
> an...@enomsg.org; Seiji Aguchi
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> bbbo...@gmail.com; Madper Xie
> Subject: [PATCH 2/2] pstore: Differentiating names by adding count and 
> timestamp
> 
> From: Madper Xie 
> 
> pstore denominates dumped file as type-psname-id. it makes many file have
> the same name if there are many entries in backend have the same id.
> So adding count and timestamp to file name for differentiating.
> 
> Signed-off-by: Madper Xie 

It should be tested by other drivers as well..
But, looks good to me.
Acked-by: Seiji Aguchi 

> ---
>  fs/pstore/inode.c | 29 ++---
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 0ae994c..36b502f 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -285,7 +285,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
> u64 id, int count,
>   int rc = 0;
>   charname[PSTORE_NAMELEN];
>   struct pstore_private   *private, *pos;
> - unsigned long   flags;
> + unsigned long   flags, timestamp;
> 
>   spin_lock_irqsave(_lock, flags);
>   list_for_each_entry(pos, , list) {
> @@ -316,35 +316,42 @@ int pstore_mkfile(enum pstore_type_id type, char 
> *psname, u64 id, int count,
>   private->count = count;
>   private->psi = psi;
>   memcpy(>time, , sizeof(struct timespec));
> + timestamp = time.tv_sec;
> 
>   switch (type) {
>   case PSTORE_TYPE_DMESG:
> - sprintf(name, "dmesg-%s-%lld%s", psname, id,
> - compressed ? ".enc.z" : "");
> + sprintf(name, "dmesg-%s-%lld-%d-%ld%s", psname, id, count,
> + timestamp, compressed ? ".enc.z" : "");
>   break;
>   case PSTORE_TYPE_CONSOLE:
> - sprintf(name, "console-%s", psname);
> + sprintf(name, "console-%s-%d-%ld", psname, count, timestamp);
>   break;
>   case PSTORE_TYPE_FTRACE:
> - sprintf(name, "ftrace-%s", psname);
> + sprintf(name, "ftrace-%s-%d-%ld", psname, count, timestamp);
>   break;
>   case PSTORE_TYPE_MCE:
> - sprintf(name, "mce-%s-%lld", psname, id);
> + sprintf(name, "mce-%s-%lld-%d-%ld", psname, id, count,
> + timestamp);
>   break;
>   case PSTORE_TYPE_PPC_RTAS:
> - sprintf(name, "rtas-%s-%lld", psname, id);
> + sprintf(name, "rtas-%s-%lld-%d-%ld", psname, id, count,
> + timestamp);
>   break;
>   case PSTORE_TYPE_PPC_OF:
> - sprintf(name, "powerpc-ofw-%s-%lld", psname, id);
> + sprintf(name, "powerpc-ofw-%s-%lld-%d-%ld", psname, id, count,
> + timestamp);
>   break;
>   case PSTORE_TYPE_PPC_COMMON:
> - sprintf(name, "powerpc-common-%s-%lld", psname, id);
> + sprintf(name, "powerpc-common-%s-%lld-%d-%ld", psname, id,
> + count, timestamp);
>   break;
>   case PSTORE_TYPE_UNKNOWN:
> - sprintf(name, "unknown-%s-%lld", psname, id);
> + sprintf(name, "unknown-%s-%lld-%d-%ld", psname, id, count,
> + timestamp);
>   break;
>   default:
> - sprintf(name, "type%d-%s-%lld", type, psname, id);
> + sprintf(name, "type%d-%s-%lld-%d-%ld", type, psname, id, count,
> + timestamp);
>   break;
>   }
> 
> --
> 1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate

2013-10-30 Thread Seiji Aguchi


> -Original Message-
> From: Madper Xie [mailto:c...@redhat.com]
> Sent: Wednesday, October 30, 2013 5:45 AM
> To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; 
> an...@enomsg.org; Seiji Aguchi
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> bbbo...@gmail.com; Madper Xie
> Subject: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate
> 
> From: Madper Xie 
> 
> pstore try to find duplicate entries by check both ID, type and psi.
> They are not really enough for efi backend. dumped vars always have
> the same type, psi and ID. like follows:
> dump-type0-9-1-1382511508-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
> dump-type0-9-1-1382515661-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
> 
> The "duplicate" entries won't appear in pstorefs. And a complain will be
> print -- pstore: failed to load 76 record(s) from 'efi'
> So I add two more checks: timespec and count.
> 
> Signed-off-by: Madper Xie 

Looks good to me.
Acked-by: Seiji Aguchi 

But, this patch has to be tested by other backend drivers like erst and ramoops.
In my understanding, erst has supported to log multiple events.
So, I'm interested why the same error hasn't happened in the other drivers.

Seiji

> ---
>  fs/pstore/inode.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 1282384..0ae994c 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -47,6 +47,7 @@ static LIST_HEAD(allpstore);
>  struct pstore_private {
>   struct list_head list;
>   struct pstore_info *psi;
> + struct timespec time;
>   enum pstore_type_id type;
>   u64 id;
>   int count;
> @@ -290,7 +291,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
> u64 id, int count,
>   list_for_each_entry(pos, , list) {
>   if (pos->type == type &&
>   pos->id == id &&
> - pos->psi == psi) {
> + pos->psi == psi &&
> + pos->count == count &&
> + !timespec_compare(>time, )) {
>   rc = -EEXIST;
>   break;
>   }
> @@ -312,6 +315,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
> u64 id, int count,
>   private->id = id;
>   private->count = count;
>   private->psi = psi;
> + memcpy(>time, , sizeof(struct timespec));
> 
>   switch (type) {
>   case PSTORE_TYPE_DMESG:
> --
> 1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate

2013-10-30 Thread Seiji Aguchi


 -Original Message-
 From: Madper Xie [mailto:c...@redhat.com]
 Sent: Wednesday, October 30, 2013 5:45 AM
 To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; 
 an...@enomsg.org; Seiji Aguchi
 Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 bbbo...@gmail.com; Madper Xie
 Subject: [PATCH 1/2] pstore: avoid incorrectly mark entry as duplicate
 
 From: Madper Xie bbbo...@gmail.com
 
 pstore try to find duplicate entries by check both ID, type and psi.
 They are not really enough for efi backend. dumped vars always have
 the same type, psi and ID. like follows:
 dump-type0-9-1-1382511508-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
 dump-type0-9-1-1382515661-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
 
 The duplicate entries won't appear in pstorefs. And a complain will be
 print -- pstore: failed to load 76 record(s) from 'efi'
 So I add two more checks: timespec and count.
 
 Signed-off-by: Madper Xie c...@redhat.com

Looks good to me.
Acked-by: Seiji Aguchi seiji.agu...@hds.com

But, this patch has to be tested by other backend drivers like erst and ramoops.
In my understanding, erst has supported to log multiple events.
So, I'm interested why the same error hasn't happened in the other drivers.

Seiji

 ---
  fs/pstore/inode.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
 index 1282384..0ae994c 100644
 --- a/fs/pstore/inode.c
 +++ b/fs/pstore/inode.c
 @@ -47,6 +47,7 @@ static LIST_HEAD(allpstore);
  struct pstore_private {
   struct list_head list;
   struct pstore_info *psi;
 + struct timespec time;
   enum pstore_type_id type;
   u64 id;
   int count;
 @@ -290,7 +291,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
 u64 id, int count,
   list_for_each_entry(pos, allpstore, list) {
   if (pos-type == type 
   pos-id == id 
 - pos-psi == psi) {
 + pos-psi == psi 
 + pos-count == count 
 + !timespec_compare(pos-time, time)) {
   rc = -EEXIST;
   break;
   }
 @@ -312,6 +315,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
 u64 id, int count,
   private-id = id;
   private-count = count;
   private-psi = psi;
 + memcpy(private-time, time, sizeof(struct timespec));
 
   switch (type) {
   case PSTORE_TYPE_DMESG:
 --
 1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/2] pstore: Differentiating names by adding count and timestamp

2013-10-30 Thread Seiji Aguchi


 -Original Message-
 From: Madper Xie [mailto:c...@redhat.com]
 Sent: Wednesday, October 30, 2013 5:45 AM
 To: tony.l...@intel.com; keesc...@chromium.org; ccr...@android.com; 
 an...@enomsg.org; Seiji Aguchi
 Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
 bbbo...@gmail.com; Madper Xie
 Subject: [PATCH 2/2] pstore: Differentiating names by adding count and 
 timestamp
 
 From: Madper Xie bbbo...@gmail.com
 
 pstore denominates dumped file as type-psname-id. it makes many file have
 the same name if there are many entries in backend have the same id.
 So adding count and timestamp to file name for differentiating.
 
 Signed-off-by: Madper Xie c...@redhat.com

It should be tested by other drivers as well..
But, looks good to me.
Acked-by: Seiji Aguchi seiji.agu...@hds.com

 ---
  fs/pstore/inode.c | 29 ++---
  1 file changed, 18 insertions(+), 11 deletions(-)
 
 diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
 index 0ae994c..36b502f 100644
 --- a/fs/pstore/inode.c
 +++ b/fs/pstore/inode.c
 @@ -285,7 +285,7 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, 
 u64 id, int count,
   int rc = 0;
   charname[PSTORE_NAMELEN];
   struct pstore_private   *private, *pos;
 - unsigned long   flags;
 + unsigned long   flags, timestamp;
 
   spin_lock_irqsave(allpstore_lock, flags);
   list_for_each_entry(pos, allpstore, list) {
 @@ -316,35 +316,42 @@ int pstore_mkfile(enum pstore_type_id type, char 
 *psname, u64 id, int count,
   private-count = count;
   private-psi = psi;
   memcpy(private-time, time, sizeof(struct timespec));
 + timestamp = time.tv_sec;
 
   switch (type) {
   case PSTORE_TYPE_DMESG:
 - sprintf(name, dmesg-%s-%lld%s, psname, id,
 - compressed ? .enc.z : );
 + sprintf(name, dmesg-%s-%lld-%d-%ld%s, psname, id, count,
 + timestamp, compressed ? .enc.z : );
   break;
   case PSTORE_TYPE_CONSOLE:
 - sprintf(name, console-%s, psname);
 + sprintf(name, console-%s-%d-%ld, psname, count, timestamp);
   break;
   case PSTORE_TYPE_FTRACE:
 - sprintf(name, ftrace-%s, psname);
 + sprintf(name, ftrace-%s-%d-%ld, psname, count, timestamp);
   break;
   case PSTORE_TYPE_MCE:
 - sprintf(name, mce-%s-%lld, psname, id);
 + sprintf(name, mce-%s-%lld-%d-%ld, psname, id, count,
 + timestamp);
   break;
   case PSTORE_TYPE_PPC_RTAS:
 - sprintf(name, rtas-%s-%lld, psname, id);
 + sprintf(name, rtas-%s-%lld-%d-%ld, psname, id, count,
 + timestamp);
   break;
   case PSTORE_TYPE_PPC_OF:
 - sprintf(name, powerpc-ofw-%s-%lld, psname, id);
 + sprintf(name, powerpc-ofw-%s-%lld-%d-%ld, psname, id, count,
 + timestamp);
   break;
   case PSTORE_TYPE_PPC_COMMON:
 - sprintf(name, powerpc-common-%s-%lld, psname, id);
 + sprintf(name, powerpc-common-%s-%lld-%d-%ld, psname, id,
 + count, timestamp);
   break;
   case PSTORE_TYPE_UNKNOWN:
 - sprintf(name, unknown-%s-%lld, psname, id);
 + sprintf(name, unknown-%s-%lld-%d-%ld, psname, id, count,
 + timestamp);
   break;
   default:
 - sprintf(name, type%d-%s-%lld, type, psname, id);
 + sprintf(name, type%d-%s-%lld-%d-%ld, type, psname, id, count,
 + timestamp);
   break;
   }
 
 --
 1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-30 Thread Seiji Aguchi
 On Fri, 18 Oct, at 10:30:58PM, Seiji Aguchi wrote:
  The scanning and deleting logic is still needed.  In case an entry(A)
  is found, the pointer is saved to psi-data.  And efi_pstore_read()
  passes the entry(A) to a pstore filesystem by releasing
  __efivars-lock.
 
  And then, the pstore filesystem calls efi_pstore_read() again and the
  same entry(A), which is saved to psi-data, is used for re-scanning a
  sysfs-list.  (That is why list_for_each_entry_safe_from () is used in
  efi_pstore_sysfs_entry_iter().)
 
  So, to protect the entry(A), the logic is needed because, in pstore
  filesystem,  __efivars-lock Is released.
 
 Very good point.

Thanks,

I will add the description in the next patch.

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-30 Thread Seiji Aguchi
Change from v3:
- Introduce EFI_VARS_DATA_SIZE_MAX macro.
- Add some description why a scanning/deleting logic is needed.

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the dynamic memory allocation runnable without taking spinlock,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.

On the code basis, it seems that all the scanning and deleting logic is not
needed because __efivars-lock are not dropped when reading from the EFI
variable store.

But, the scanning and deleting logic is still needed because an efi-pstore and
a pstore filesystem works as follows.

In case an entry(A) is found, the pointer is saved to psi-data.
And efi_pstore_read() passes the entry(A) to a pstore filesystem by
releasing  __efivars-lock.

And then, the pstore filesystem calls efi_pstore_read() again and the same
entry(A), which is saved to psi-data, is used for resuming to scan
a sysfs-list.

So, to protect the entry(A), the logic is needed.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [816614a5] dump_stack+0x54/0x74
[1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
[1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
[1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
[1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
[1.144058]  [815147bb] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [8115b260] kmemdup+0x20/0x50
[1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [81514800] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
[1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
[1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
[1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [8128ce30] ? parse_options+0x80/0x80
[1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
[1.158207]  [811ae7d2] mount_single+0xa2/0xd0
[1.158207]  [8128ccf8] pstore_mount+0x18/0x20
[1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
[1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
[1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
[1.158207]  [811cbb0e] do_mount+0x23e/0xa20
[1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
[1.158207]  [811cc373] SyS_mount+0x83/0xc0
[1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 drivers/firmware/efi/efi-pstore.c | 143 +++---
 drivers/firmware/efi/efivars.c|  12 ++--
 drivers/firmware/efi/vars.c   |  12 +++-
 include/linux/efi.h   |   4 ++
 4 files changed, 155 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..6ce31e9 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi-data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi-data = NULL

[PATCH v4 0/4] Introduce page fault tracepoints

2013-10-30 Thread Seiji Aguchi
Change from v3:
 - Separate modifications to make review easy.
 - Refactor implementations registering exception/irq_vector
   handers. (Patch 1, 2, 3)

This series introduce page fault tracepoints.

Detailed descriptions are explained in each patch.
Any comments are welcome.

Seiji Aguchi (4):
  Move set_intr_gate() into macro
  Register exception handler to trace IDT
  Delete __trace_alloc_intr_gate()
  Add page fault tracepoints

 arch/x86/include/asm/desc.h | 57 ++---
 arch/x86/include/asm/hw_irq.h   |  3 ++
 arch/x86/include/asm/segment.h  |  3 ++
 arch/x86/include/asm/trace/exceptions.h | 52 ++
 arch/x86/include/asm/traps.h| 20 
 arch/x86/kernel/entry_32.S  | 10 ++
 arch/x86/kernel/entry_64.S  | 13 +++-
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 23 +
 12 files changed, 165 insertions(+), 50 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 2/4] Register exception handler to trace IDT

2013-10-30 Thread Seiji Aguchi
This patch registers exception handlers for tracing to a trace IDT.

To implemented it in set_intr_gate(), this patch does followings.
 - Register the exception handlers to
   the trace IDT by prepending trace_ to the handler's names.
 - Also, newly introduce trace_page_fault() to add tracepoints
   in a subsequent patch.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h| 28 +++-
 arch/x86/include/asm/hw_irq.h  |  3 +++
 arch/x86/include/asm/segment.h |  3 +++
 arch/x86/include/asm/traps.h   | 20 
 arch/x86/kernel/entry_32.S | 10 ++
 arch/x86/kernel/entry_64.S | 13 -
 arch/x86/kernel/head64.c   |  2 +-
 arch/x86/kernel/kvm.c  |  2 +-
 arch/x86/kernel/traps.c| 28 ++--
 arch/x86/mm/fault.c| 10 ++
 10 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d939567..3d73437 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,25 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, s);
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+#define _trace_set_gate(gate, type, addr, dpl, ist, seg)
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,11 +368,14 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
-{
-   BUG_ON((unsigned)n  0xFF);
-   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
-}
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n  0xFF); \
+   _set_gate(n, GATE_INTERRUPT, (void *)addr, 0, 0,\
+ __KERNEL_CS); \
+   _trace_set_gate(n, GATE_INTERRUPT, (void *)trace_##addr,\
+   0, 0, __KERNEL_CS); \
+   } while (0)
 
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..cba45d9 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -187,6 +187,9 @@ extern __visible void smp_invalidate_interrupt(struct 
pt_regs *);
 #endif
 
 extern void (*__initconst interrupt[NR_VECTORS-FIRST_EXTERNAL_VECTOR])(void);
+#ifdef CONFIG_TRACING
+#define trace_interrupt interrupt
+#endif
 
 typedef int vector_irq_t[NR_VECTORS];
 DECLARE_PER_CPU(vector_irq_t, vector_irq);
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index c48a950..6f1c3a8 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -214,6 +214,9 @@
 #ifdef __KERNEL__
 #ifndef __ASSEMBLY__
 extern const char early_idt_handlers[NUM_EXCEPTION_VECTORS][2+2+5];
+#ifdef CONFIG_TRACING
+#define trace_early_idt_handlers early_idt_handlers
+#endif
 
 /*
  * Load a segment. Fall back on loading the zero
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 7036cb6..58d66fe 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,6 +37,23 @@ asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
 asmlinkage void simd_coprocessor_error(void);
 
+#ifdef CONFIG_TRACING
+asmlinkage void trace_page_fault(void);
+#define trace_divide_error divide_error
+#define trace_bounds bounds
+#define trace_invalid_op invalid_op
+#define trace_device_not_available device_not_available
+#define trace_coprocessor_segment_overrun coprocessor_segment_overrun
+#define trace_invalid_TSS invalid_TSS
+#define trace_segment_not_present segment_not_present
+#define trace_general_protection general_protection
+#define trace_spurious_interrupt_bug spurious_interrupt_bug
+#define trace_coprocessor_error coprocessor_error
+#define trace_alignment_check alignment_check
+#define trace_simd_coprocessor_error simd_coprocessor_error
+#define trace_async_page_fault async_page_fault
+#endif
+
 dotraplinkage void do_divide_error(struct pt_regs *, long);
 dotraplinkage void do_debug(struct pt_regs *, long);
 dotraplinkage

[PATCH v4 1/4] Move set_intr_gate() into macro

2013-10-30 Thread Seiji Aguchi
Move set_intr_gate() into a macro by removing __alloc_intr_gate().

The purpose is to avoid failing a kernel build after applying
a subsequent patch which changes set_intr_gate() to macro.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..d939567 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -395,15 +395,10 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
-   __alloc_intr_gate(n, addr); \
+   set_intr_gate(n, addr); \
__trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 3/4] Delete __trace_alloc_intr_gate()

2013-10-30 Thread Seiji Aguchi
Currently irq vector handlers for tracing are registered in both set_intr_gate()
 and __trace_alloc_intr_gate() in alloc_intr_gate().
But, we don't need to do that twice.
So, let's delete __trace_alloc_intr_gate().

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h | 22 --
 1 file changed, 22 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 3d73437..50d033a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -392,32 +392,10 @@ static inline void alloc_system_vector(int vector)
}
 }
 
-#ifdef CONFIG_TRACING
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-   gate_desc s;
-
-   pack_gate(s, GATE_INTERRUPT, (unsigned long)addr, 0, 0, __KERNEL_CS);
-   write_idt_entry(trace_idt_table, gate, s);
-}
-
-static inline void __trace_alloc_intr_gate(unsigned int n, void *addr)
-{
-   trace_set_intr_gate(n, addr);
-}
-#else
-static inline void trace_set_intr_gate(unsigned int gate, void *addr)
-{
-}
-
-#define __trace_alloc_intr_gate(n, addr)
-#endif
-
 #define alloc_intr_gate(n, addr)   \
do {\
alloc_system_vector(n); \
set_intr_gate(n, addr); \
-   __trace_alloc_intr_gate(n, trace_##addr);   \
} while (0)
 
 /*
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 4/4] Add page fault tracepoints

2013-10-30 Thread Seiji Aguchi
This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

  Two events, for user and kernel spaces, are introduced at the beginning
  of page fault handler for tracing.

  - User space event
There is a request of page fault event for user space as below.


https://lkml.kernel.org/r/1368079520-11015-2-git-send-email-fdeslaur+()+gmail+!+com

https://lkml.kernel.org/r/1368079520-11015-1-git-send-email-fdeslaur+()+gmail+!+com

  - Kernel space event:
When we meature an overhead in kernel space for investigating performance
issues, we can check if it comes from the page fault events.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 13 +
 3 files changed, 67 insertions(+)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/trace/exceptions.h 
b/arch/x86/include/asm/trace/exceptions.h
new file mode 100644
index 000..86540c0
--- /dev/null
+++ b/arch/x86/include/asm/trace/exceptions.h
@@ -0,0 +1,52 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM exceptions
+
+#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_FAULT_H
+
+#include linux/tracepoint.h
+
+extern void trace_irq_vector_regfunc(void);
+extern void trace_irq_vector_unregfunc(void);
+
+DECLARE_EVENT_CLASS(x86_exceptions,
+
+   TP_PROTO(unsigned long address, struct pt_regs *regs,
+unsigned long error_code),
+
+   TP_ARGS(address, regs, error_code),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, address  )
+   __field(unsigned long, ip   )
+   __field(unsigned long, error_code )
+   ),
+
+   TP_fast_assign(
+   __entry-address = address;
+   __entry-ip = regs-ip;
+   __entry-error_code = error_code;
+   ),
+
+   TP_printk(address=%pf ip=%pf error_code=0x%lx,
+ (void *)__entry-address, (void *)__entry-ip,
+ __entry-error_code) );
+
+#define DEFINE_PAGE_FAULT_EVENT(name)  \
+DEFINE_EVENT_FN(x86_exceptions, name,  \
+   TP_PROTO(unsigned long address, struct pt_regs *regs,   \
+unsigned long error_code), \
+   TP_ARGS(address, regs, error_code), \
+   trace_irq_vector_regfunc,   \
+   trace_irq_vector_unregfunc);
+
+DEFINE_PAGE_FAULT_EVENT(user_page_fault);
+DEFINE_PAGE_FAULT_EVENT(kernel_page_fault);
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE exceptions
+#endif /*  _TRACE_PAGE_FAULT_H */
+
+/* This part must be outside protection */
+#include trace/define_trace.h
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 23d8e5f..6a19ad9 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,8 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o  := $(nostackp)
 CFLAGS_setup_nx.o  := $(nostackp)
 
+CFLAGS_fault.o := -I$(src)/../include/asm/trace
+
 obj-$(CONFIG_X86_PAT)  += pat_rbtree.o
 obj-$(CONFIG_SMP)  += tlb.o
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fd3e281..f2730cbc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -20,6 +20,9 @@
 #include asm/kmemcheck.h /* kmemcheck_*(), ...   */
 #include asm/fixmap.h/* VSYSCALL_START   
*/
 
+#define CREATE_TRACE_POINTS
+#include asm/trace/exceptions.h
+
 /*
  * Page fault error code bits:
  *
@@ -1232,12 +1235,22 @@ do_page_fault(struct pt_regs *regs, unsigned long 
error_code)
exception_exit(prev_state);
 }
 
+static void trace_page_fault_entries(struct pt_regs *regs,
+unsigned long error_code)
+{
+   if (user_mode(regs))
+   trace_user_page_fault(read_cr2(), regs, error_code);
+   else
+   trace_kernel_page_fault(read_cr2(), regs, error_code);
+}
+
 dotraplinkage void __kprobes
 trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
 {
enum ctx_state prev_state;
 
prev_state = exception_enter();
+   trace_page_fault_entries(regs, error_code);
__do_page_fault(regs, error_code);
exception_exit(prev_state);
 }
-- 
1.8.3.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 0/2] make all stored entries accessible.

2013-10-30 Thread Seiji Aguchi
 Ah - I was expecting that the backend driver would have a unique id for
 each record stored ... but is seems that this isn't true for efivars.
 

So, do you mean efivars should fix to use the id in a proper way?

I acked Madper's patch 2/2 earlier today, but when I look at your test result, 
I'm not sure if
it is reasonable for users to make multiple numbers visible to the file name.

 -r--r--r-- 1 root root 17499 Oct 30 13:41 
 dmesg-erst-5940651313304961029--2129078373-1383165669

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] Differentiating names by adding a timestamp.

2013-10-28 Thread Seiji Aguchi
To distinguish all events more precisely, "count" is needed to add to the name.
Please see the commit below.
755d4fe46529018ae45bc7c86df682de45ace764

Seiji

> -Original Message-
> From: Madper Xie [mailto:c...@redhat.com]
> Sent: Monday, October 28, 2013 9:22 AM
> To: Tony Luck; Seiji Aguchi; Anton Vorontsov; Colin Cross; Kees Cook
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] Differentiating names by adding a timestamp.
> 
> 
> pstore denominate dumped file as type-psname-id. it makes many file have
> the same name if there are many entries in backend have the same id.
> So adding a timestamp to file name.
> 
> Signed-off-by: Madper Xie 
> ---
>  fs/pstore/inode.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index 1282384..bf6ed3a 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -315,32 +315,38 @@ int pstore_mkfile(enum pstore_type_id type, char 
> *psname, u64 id, int count,
> 
>   switch (type) {
>   case PSTORE_TYPE_DMESG:
> - sprintf(name, "dmesg-%s-%lld%s", psname, id,
> - compressed ? ".enc.z" : "");
> + sprintf(name, "dmesg-%s-%lld-%lld%s", psname, id,
> + timespec_to_ns(), compressed ? ".enc.z" : "");
>   break;
>   case PSTORE_TYPE_CONSOLE:
> - sprintf(name, "console-%s", psname);
> + sprintf(name, "console-%s-%lld", psname, timespec_to_ns());
>   break;
>   case PSTORE_TYPE_FTRACE:
> - sprintf(name, "ftrace-%s", psname);
> + sprintf(name, "ftrace-%s-%lld", psname, timespec_to_ns());
>   break;
>   case PSTORE_TYPE_MCE:
> - sprintf(name, "mce-%s-%lld", psname, id);
> + sprintf(name, "mce-%s-%lld-%lld", psname, id,
> + timespec_to_ns());
>   break;
>   case PSTORE_TYPE_PPC_RTAS:
> - sprintf(name, "rtas-%s-%lld", psname, id);
> + sprintf(name, "rtas-%s-%lld-%lld", psname, id,
> + timespec_to_ns());
>   break;
>   case PSTORE_TYPE_PPC_OF:
> - sprintf(name, "powerpc-ofw-%s-%lld", psname, id);
> + sprintf(name, "powerpc-ofw-%s-%lld-%lld", psname, id,
> + timespec_to_ns());
>   break;
>   case PSTORE_TYPE_PPC_COMMON:
> - sprintf(name, "powerpc-common-%s-%lld", psname, id);
> + sprintf(name, "powerpc-common-%s-%lld-%lld", psname, id,
> + timespec_to_ns());
>   break;
>   case PSTORE_TYPE_UNKNOWN:
> - sprintf(name, "unknown-%s-%lld", psname, id);
> + sprintf(name, "unknown-%s-%lld-%lld", psname, id,
> + timespec_to_ns());
>   break;
>   default:
> - sprintf(name, "type%d-%s-%lld", type, psname, id);
> + sprintf(name, "type%d-%s-%lld-%lld", type, psname, id,
> + timespec_to_ns());
>   break;
>   }
> 
> --
> 1.8.4.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] Differentiating names by adding a timestamp.

2013-10-28 Thread Seiji Aguchi
To distinguish all events more precisely, count is needed to add to the name.
Please see the commit below.
755d4fe46529018ae45bc7c86df682de45ace764

Seiji

 -Original Message-
 From: Madper Xie [mailto:c...@redhat.com]
 Sent: Monday, October 28, 2013 9:22 AM
 To: Tony Luck; Seiji Aguchi; Anton Vorontsov; Colin Cross; Kees Cook
 Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
 Subject: [PATCH] Differentiating names by adding a timestamp.
 
 
 pstore denominate dumped file as type-psname-id. it makes many file have
 the same name if there are many entries in backend have the same id.
 So adding a timestamp to file name.
 
 Signed-off-by: Madper Xie c...@redhat.com
 ---
  fs/pstore/inode.c | 26 --
  1 file changed, 16 insertions(+), 10 deletions(-)
 
 diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
 index 1282384..bf6ed3a 100644
 --- a/fs/pstore/inode.c
 +++ b/fs/pstore/inode.c
 @@ -315,32 +315,38 @@ int pstore_mkfile(enum pstore_type_id type, char 
 *psname, u64 id, int count,
 
   switch (type) {
   case PSTORE_TYPE_DMESG:
 - sprintf(name, dmesg-%s-%lld%s, psname, id,
 - compressed ? .enc.z : );
 + sprintf(name, dmesg-%s-%lld-%lld%s, psname, id,
 + timespec_to_ns(time), compressed ? .enc.z : );
   break;
   case PSTORE_TYPE_CONSOLE:
 - sprintf(name, console-%s, psname);
 + sprintf(name, console-%s-%lld, psname, timespec_to_ns(time));
   break;
   case PSTORE_TYPE_FTRACE:
 - sprintf(name, ftrace-%s, psname);
 + sprintf(name, ftrace-%s-%lld, psname, timespec_to_ns(time));
   break;
   case PSTORE_TYPE_MCE:
 - sprintf(name, mce-%s-%lld, psname, id);
 + sprintf(name, mce-%s-%lld-%lld, psname, id,
 + timespec_to_ns(time));
   break;
   case PSTORE_TYPE_PPC_RTAS:
 - sprintf(name, rtas-%s-%lld, psname, id);
 + sprintf(name, rtas-%s-%lld-%lld, psname, id,
 + timespec_to_ns(time));
   break;
   case PSTORE_TYPE_PPC_OF:
 - sprintf(name, powerpc-ofw-%s-%lld, psname, id);
 + sprintf(name, powerpc-ofw-%s-%lld-%lld, psname, id,
 + timespec_to_ns(time));
   break;
   case PSTORE_TYPE_PPC_COMMON:
 - sprintf(name, powerpc-common-%s-%lld, psname, id);
 + sprintf(name, powerpc-common-%s-%lld-%lld, psname, id,
 + timespec_to_ns(time));
   break;
   case PSTORE_TYPE_UNKNOWN:
 - sprintf(name, unknown-%s-%lld, psname, id);
 + sprintf(name, unknown-%s-%lld-%lld, psname, id,
 + timespec_to_ns(time));
   break;
   default:
 - sprintf(name, type%d-%s-%lld, type, psname, id);
 + sprintf(name, type%d-%s-%lld-%lld, type, psname, id,
 + timespec_to_ns(time));
   break;
   }
 
 --
 1.8.4.1
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-18 Thread Seiji Aguchi
Matt,

> It seems to me that because you're no longer dropping __efivars->lock
> when reading from the EFI variable store, you actually don't need all
> the ->scanning and ->deleting logic because anything that sets those
> flags runs to completion while holding the lock.

The scanning and deleting logic is still needed.
In case an entry(A) is found, the pointer is saved to psi->data.
And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing  
__efivars->lock.

And then, the pstore filesystem calls efi_pstore_read() again and the same 
entry(A), which is saved to
psi->data, is used for re-scanning a sysfs-list.
(That is why list_for_each_entry_safe_from () is used in 
efi_pstore_sysfs_entry_iter().)

So, to protect the entry(A), the logic is needed because, in pstore filesystem, 
 __efivars->lock
Is released. 

> Can't the patch be simplified to just allocating data.buf at the
> beginning of efi_pstore_read()? 

I think data.buf is simply allocated at the beginning of efi_pstore_read() with 
this patch v3.
If you are not satisfied with it, could you please show me your pseudo code?

>Also, it would be a good idea to
> introduce a #define for the 1024 magic number, e.g.
> 
> #define EFIVARS_DATA_SIZE_MAX 1024

It is good idea. I can fix it.

Seiji


+/**
+ * efi_pstore_read
+ *
+ * This function returns a size of NVRAM entry logged via efi_pstore_write().
+ * The meaning and behavior of efi_pstore/pstore are as below.
+ *
+ * size > 0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size < 0: Failed to get data of entry logging via efi_pstore_write(),
+ *   and pstore will stop reading entry.
+ */
 static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
   int *count, struct timespec *timespec,
   char **buf, bool *compressed,
   struct pstore_info *psi)
 {
struct pstore_read_data data;
+   ssize_t size;
 
data.id = id;
data.type = type;
@@ -112,8 +216,17 @@ static ssize_t efi_pstore_read(u64 *id, enum 
pstore_type_id *type,
data.compressed = compressed;
data.buf = buf;
 
-   return __efivar_entry_iter(efi_pstore_read_func, _sysfs_list, 
,
-  (struct efivar_entry **)>data);
+   *data.buf = kzalloc(1024, GFP_KERNEL);
+   if (!*data.buf)
+   return -ENOMEM;
+
+   efivar_entry_iter_begin();
+   size = efi_pstore_sysfs_entry_iter(,
+  (struct efivar_entry **)>data);
+   efivar_entry_iter_end();
+   if (size <= 0)
+   kfree(*data.buf);
+   return size;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-18 Thread Seiji Aguchi
Matt,

 It seems to me that because you're no longer dropping __efivars-lock
 when reading from the EFI variable store, you actually don't need all
 the -scanning and -deleting logic because anything that sets those
 flags runs to completion while holding the lock.

The scanning and deleting logic is still needed.
In case an entry(A) is found, the pointer is saved to psi-data.
And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing  
__efivars-lock.

And then, the pstore filesystem calls efi_pstore_read() again and the same 
entry(A), which is saved to
psi-data, is used for re-scanning a sysfs-list.
(That is why list_for_each_entry_safe_from () is used in 
efi_pstore_sysfs_entry_iter().)

So, to protect the entry(A), the logic is needed because, in pstore filesystem, 
 __efivars-lock
Is released. 

 Can't the patch be simplified to just allocating data.buf at the
 beginning of efi_pstore_read()? 

I think data.buf is simply allocated at the beginning of efi_pstore_read() with 
this patch v3.
If you are not satisfied with it, could you please show me your pseudo code?

Also, it would be a good idea to
 introduce a #define for the 1024 magic number, e.g.
 
 #define EFIVARS_DATA_SIZE_MAX 1024

It is good idea. I can fix it.

Seiji

snip
+/**
+ * efi_pstore_read
+ *
+ * This function returns a size of NVRAM entry logged via efi_pstore_write().
+ * The meaning and behavior of efi_pstore/pstore are as below.
+ *
+ * size  0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size  0: Failed to get data of entry logging via efi_pstore_write(),
+ *   and pstore will stop reading entry.
+ */
 static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
   int *count, struct timespec *timespec,
   char **buf, bool *compressed,
   struct pstore_info *psi)
 {
struct pstore_read_data data;
+   ssize_t size;
 
data.id = id;
data.type = type;
@@ -112,8 +216,17 @@ static ssize_t efi_pstore_read(u64 *id, enum 
pstore_type_id *type,
data.compressed = compressed;
data.buf = buf;
 
-   return __efivar_entry_iter(efi_pstore_read_func, efivar_sysfs_list, 
data,
-  (struct efivar_entry **)psi-data);
+   *data.buf = kzalloc(1024, GFP_KERNEL);
+   if (!*data.buf)
+   return -ENOMEM;
+
+   efivar_entry_iter_begin();
+   size = efi_pstore_sysfs_entry_iter(data,
+  (struct efivar_entry **)psi-data);
+   efivar_entry_iter_end();
+   if (size = 0)
+   kfree(*data.buf);
+   return size;
 }
snip
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-17 Thread Seiji Aguchi
Hi Madper,

Thank you for assisting me.

But, I need to discuss the implementation with Matt more.
After the discussion, I will post v4 and ask you to test it.

Please wait for a while.

Seiji

> -Original Message-
> From: Madper Xie [mailto:c...@redhat.com]
> Sent: Thursday, October 17, 2013 11:07 AM
> To: Madper Xie
> Cc: Seiji Aguchi; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
> matt.flem...@intel.com; tony.l...@intel.com; Tomoki
> Sekiyama; dle-deve...@lists.sourceforge.net
> Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry 
> until, the scan is completed
> 
> Hi folks,
>I tested it on my DELL XPS desktop. And it won't show any warnings
>when I mounting pstore and deleting pstore items after this patch
>applied.
> 
>Tested-by: Madper Xie
> c...@redhat.com writes:
> 
> > Oops, It seems my mu4e(a email client for emacs)'s auto-indent breaks the
> > patch... I apologize for this...
> >
> > seiji.agu...@hds.com writes:
> >
> >> Hi Madper,
> >>
> >> I tested this patch on 3.12-rc4.
> >> Could you please send me the log when you failed to apply?
> >>
> >> Seiji
> >>
> >>> -Original Message-
> >>> From: Madper Xie [mailto:c...@redhat.com]
> >>> Sent: Thursday, October 17, 2013 1:54 AM
> >>> To: Seiji Aguchi
> >>> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
> >>> matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama;
> dle-
> >>> deve...@lists.sourceforge.net
> >>> Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs 
> >>> entry until, the scan is completed
> >>>
> >>> Howdy Seiji,
> >>>   I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you
> >>>   please let me know which is the right tree for this patch?
> >>>
> >>>   Thanks,
> >>>   Madper.
> >>> seiji.agu...@hds.com writes:
> >>>
> >>> > Change from v2:
> >>> > - Move dynamic memory allocation to efi_pstore_read() before holding
> >>> >   efivars->lock to protect entry->var.Data.
> >>> > - Access to entry->scanning while holding efivars->lock.
> >>> > - Move a comment about a returned value from efi_pstore_read_func() to
> >>> >   efi_pstore_read() because "size < 0" case may happen in 
> >>> > efi_pstore_read().
> >>> >
> >>> > Currently, when mounting pstore file system, a read callback of 
> >>> > efi_pstore
> >>> > driver runs mutiple times as below.
> >>> >
> >>> > - In the first read callback, scan efivar_sysfs_list from head and pass
> >>> >   a kmsg buffer of a entry to an upper pstore layer.
> >>> > - In the second read callback, rescan efivar_sysfs_list from the entry 
> >>> > and pass
> >>> >   another kmsg buffer to it.
> >>> > - Repeat the scan and pass until the end of efivar_sysfs_list.
> >>> >
> >>> > In this process, an entry is read across the multiple read function 
> >>> > calls.
> >>> > To avoid race between the read and erasion, the whole process above is
> >>> > protected by a spinlock, holding in open() and releasing in close().
> >>> >
> >>> > At the same time, kmemdup() is called to pass the buffer to pstore 
> >>> > filesystem
> >>> > during it.
> >>> > And then, it causes a following lockdep warning.
> >>> >
> >>> > To make the dynamic memory allocation runnable without taking spinlock,
> >>> > holding off a deletion of sysfs entry if it happens while scanning it
> >>> > via efi_pstore, and deleting it after the scan is completed.
> >>> >
> >>> > To implement it, this patch introduces two flags, scanning and deleting,
> >>> > to efivar_entry.
> >>> >
> >>> > [1.143710] [ cut here ]
> >>> > [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
> >>> > lockdep_trace_alloc+0x104/0x110()
> >>> > [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> >>> > [1.144058] Modules linked in:
> >>> >
> >>> > [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
> >>> > [1.144058]  0009 88

RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-17 Thread Seiji Aguchi
Hi Madper,

I tested this patch on 3.12-rc4.
Could you please send me the log when you failed to apply?

Seiji

> -Original Message-
> From: Madper Xie [mailto:c...@redhat.com]
> Sent: Thursday, October 17, 2013 1:54 AM
> To: Seiji Aguchi
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
> matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle-
> deve...@lists.sourceforge.net
> Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry 
> until, the scan is completed
> 
> Howdy Seiji,
>   I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you
>   please let me know which is the right tree for this patch?
> 
>   Thanks,
>   Madper.
> seiji.agu...@hds.com writes:
> 
> > Change from v2:
> > - Move dynamic memory allocation to efi_pstore_read() before holding
> >   efivars->lock to protect entry->var.Data.
> > - Access to entry->scanning while holding efivars->lock.
> > - Move a comment about a returned value from efi_pstore_read_func() to
> >   efi_pstore_read() because "size < 0" case may happen in efi_pstore_read().
> >
> > Currently, when mounting pstore file system, a read callback of efi_pstore
> > driver runs mutiple times as below.
> >
> > - In the first read callback, scan efivar_sysfs_list from head and pass
> >   a kmsg buffer of a entry to an upper pstore layer.
> > - In the second read callback, rescan efivar_sysfs_list from the entry and 
> > pass
> >   another kmsg buffer to it.
> > - Repeat the scan and pass until the end of efivar_sysfs_list.
> >
> > In this process, an entry is read across the multiple read function calls.
> > To avoid race between the read and erasion, the whole process above is
> > protected by a spinlock, holding in open() and releasing in close().
> >
> > At the same time, kmemdup() is called to pass the buffer to pstore 
> > filesystem
> > during it.
> > And then, it causes a following lockdep warning.
> >
> > To make the dynamic memory allocation runnable without taking spinlock,
> > holding off a deletion of sysfs entry if it happens while scanning it
> > via efi_pstore, and deleting it after the scan is completed.
> >
> > To implement it, this patch introduces two flags, scanning and deleting,
> > to efivar_entry.
> >
> > [1.143710] [ cut here ]
> > [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
> > lockdep_trace_alloc+0x104/0x110()
> > [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> > [1.144058] Modules linked in:
> >
> > [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
> > [1.144058]  0009 8800797e9ae0 816614a5
> > 8800797e9b28
> > [1.144058]  8800797e9b18 8105510d 0080
> > 0046
> > [1.144058]  00d0 03af 81ccd0c0
> > 8800797e9b78
> > [1.144058] Call Trace:
> > [1.144058]  [] dump_stack+0x54/0x74
> > [1.144058]  [] warn_slowpath_common+0x7d/0xa0
> > [1.144058]  [] warn_slowpath_fmt+0x4c/0x50
> > [1.144058]  [] ? vsscanf+0x57f/0x7b0
> > [1.144058]  [] lockdep_trace_alloc+0x104/0x110
> > [1.144058]  [] __kmalloc_track_caller+0x50/0x280
> > [1.144058]  [] ?
> > efi_pstore_read_func.part.1+0x12b/0x170
> > [1.144058]  [] kmemdup+0x20/0x50
> > [1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
> > [1.144058]  [] ?
> > efi_pstore_read_func.part.1+0x170/0x170
> > [1.144058]  [] efi_pstore_read_func+0xb4/0xe0
> > [1.144058]  [] __efivar_entry_iter+0xfb/0x120
> > [1.144058]  [] efi_pstore_read+0x3f/0x50
> > [1.144058]  [] pstore_get_records+0x9a/0x150
> > [1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
> > [1.158207]  [] ? parse_options+0x80/0x80
> > [1.158207]  [] pstore_fill_super+0xa5/0xc0
> > [1.158207]  [] mount_single+0xa2/0xd0
> > [1.158207]  [] pstore_mount+0x18/0x20
> > [1.158207]  [] mount_fs+0x39/0x1b0
> > [1.158207]  [] ? __alloc_percpu+0x10/0x20
> > [1.158207]  [] vfs_kern_mount+0x63/0xf0
> > [1.158207]  [] do_mount+0x23e/0xa20
> > [1.158207]  [] ? strndup_user+0x4b/0xf0
> > [1.158207]  [] SyS_mount+0x83/0xc0
> > [1.158207]  [] system_call_fastpath+0x16/0x1b
> > [1.158207] ---[ end trace 61981bc62de9f6f4 ]---
> >
> > Signed-off-by: Seiji Aguchi 
> > ---
> >  drivers/firmware/efi/efi-pstore.c | 143 
> > 

RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-17 Thread Seiji Aguchi
Hi Madper,

I tested this patch on 3.12-rc4.
Could you please send me the log when you failed to apply?

Seiji

 -Original Message-
 From: Madper Xie [mailto:c...@redhat.com]
 Sent: Thursday, October 17, 2013 1:54 AM
 To: Seiji Aguchi
 Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
 matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama; dle-
 deve...@lists.sourceforge.net
 Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry 
 until, the scan is completed
 
 Howdy Seiji,
   I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you
   please let me know which is the right tree for this patch?
 
   Thanks,
   Madper.
 seiji.agu...@hds.com writes:
 
  Change from v2:
  - Move dynamic memory allocation to efi_pstore_read() before holding
efivars-lock to protect entry-var.Data.
  - Access to entry-scanning while holding efivars-lock.
  - Move a comment about a returned value from efi_pstore_read_func() to
efi_pstore_read() because size  0 case may happen in efi_pstore_read().
 
  Currently, when mounting pstore file system, a read callback of efi_pstore
  driver runs mutiple times as below.
 
  - In the first read callback, scan efivar_sysfs_list from head and pass
a kmsg buffer of a entry to an upper pstore layer.
  - In the second read callback, rescan efivar_sysfs_list from the entry and 
  pass
another kmsg buffer to it.
  - Repeat the scan and pass until the end of efivar_sysfs_list.
 
  In this process, an entry is read across the multiple read function calls.
  To avoid race between the read and erasion, the whole process above is
  protected by a spinlock, holding in open() and releasing in close().
 
  At the same time, kmemdup() is called to pass the buffer to pstore 
  filesystem
  during it.
  And then, it causes a following lockdep warning.
 
  To make the dynamic memory allocation runnable without taking spinlock,
  holding off a deletion of sysfs entry if it happens while scanning it
  via efi_pstore, and deleting it after the scan is completed.
 
  To implement it, this patch introduces two flags, scanning and deleting,
  to efivar_entry.
 
  [1.143710] [ cut here ]
  [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
  lockdep_trace_alloc+0x104/0x110()
  [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
  [1.144058] Modules linked in:
 
  [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
  [1.144058]  0009 8800797e9ae0 816614a5
  8800797e9b28
  [1.144058]  8800797e9b18 8105510d 0080
  0046
  [1.144058]  00d0 03af 81ccd0c0
  8800797e9b78
  [1.144058] Call Trace:
  [1.144058]  [816614a5] dump_stack+0x54/0x74
  [1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
  [1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
  [1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
  [1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
  [1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
  [1.144058]  [815147bb] ?
  efi_pstore_read_func.part.1+0x12b/0x170
  [1.144058]  [8115b260] kmemdup+0x20/0x50
  [1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
  [1.144058]  [81514800] ?
  efi_pstore_read_func.part.1+0x170/0x170
  [1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
  [1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
  [1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
  [1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
  [1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
  [1.158207]  [8128ce30] ? parse_options+0x80/0x80
  [1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
  [1.158207]  [811ae7d2] mount_single+0xa2/0xd0
  [1.158207]  [8128ccf8] pstore_mount+0x18/0x20
  [1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
  [1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
  [1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
  [1.158207]  [811cbb0e] do_mount+0x23e/0xa20
  [1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
  [1.158207]  [811cc373] SyS_mount+0x83/0xc0
  [1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
  [1.158207] ---[ end trace 61981bc62de9f6f4 ]---
 
  Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
  ---
   drivers/firmware/efi/efi-pstore.c | 143 
  +++---
   drivers/firmware/efi/efivars.c|  12 ++--
   drivers/firmware/efi/vars.c   |  12 +++-
   include/linux/efi.h   |   2 +
   4 files changed, 153 insertions(+), 16 deletions(-)
 
  diff --git a/drivers/firmware/efi/efi-pstore.c 
  b/drivers/firmware/efi/efi-pstore.c
  index 5002d50..ebd5dbc

RE: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-17 Thread Seiji Aguchi
Hi Madper,

Thank you for assisting me.

But, I need to discuss the implementation with Matt more.
After the discussion, I will post v4 and ask you to test it.

Please wait for a while.

Seiji

 -Original Message-
 From: Madper Xie [mailto:c...@redhat.com]
 Sent: Thursday, October 17, 2013 11:07 AM
 To: Madper Xie
 Cc: Seiji Aguchi; linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
 matt.flem...@intel.com; tony.l...@intel.com; Tomoki
 Sekiyama; dle-deve...@lists.sourceforge.net
 Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry 
 until, the scan is completed
 
 Hi folks,
I tested it on my DELL XPS desktop. And it won't show any warnings
when I mounting pstore and deleting pstore items after this patch
applied.
 
Tested-by: Madper Xie
 c...@redhat.com writes:
 
  Oops, It seems my mu4e(a email client for emacs)'s auto-indent breaks the
  patch... I apologize for this...
 
  seiji.agu...@hds.com writes:
 
  Hi Madper,
 
  I tested this patch on 3.12-rc4.
  Could you please send me the log when you failed to apply?
 
  Seiji
 
  -Original Message-
  From: Madper Xie [mailto:c...@redhat.com]
  Sent: Thursday, October 17, 2013 1:54 AM
  To: Seiji Aguchi
  Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
  matt.flem...@intel.com; tony.l...@intel.com; Tomoki Sekiyama;
 dle-
  deve...@lists.sourceforge.net
  Subject: Re: [PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs 
  entry until, the scan is completed
 
  Howdy Seiji,
I failed appily this patch to both 3.12-rc2 and 3.12-rc4. Could you
please let me know which is the right tree for this patch?
 
Thanks,
Madper.
  seiji.agu...@hds.com writes:
 
   Change from v2:
   - Move dynamic memory allocation to efi_pstore_read() before holding
 efivars-lock to protect entry-var.Data.
   - Access to entry-scanning while holding efivars-lock.
   - Move a comment about a returned value from efi_pstore_read_func() to
 efi_pstore_read() because size  0 case may happen in 
   efi_pstore_read().
  
   Currently, when mounting pstore file system, a read callback of 
   efi_pstore
   driver runs mutiple times as below.
  
   - In the first read callback, scan efivar_sysfs_list from head and pass
 a kmsg buffer of a entry to an upper pstore layer.
   - In the second read callback, rescan efivar_sysfs_list from the entry 
   and pass
 another kmsg buffer to it.
   - Repeat the scan and pass until the end of efivar_sysfs_list.
  
   In this process, an entry is read across the multiple read function 
   calls.
   To avoid race between the read and erasion, the whole process above is
   protected by a spinlock, holding in open() and releasing in close().
  
   At the same time, kmemdup() is called to pass the buffer to pstore 
   filesystem
   during it.
   And then, it causes a following lockdep warning.
  
   To make the dynamic memory allocation runnable without taking spinlock,
   holding off a deletion of sysfs entry if it happens while scanning it
   via efi_pstore, and deleting it after the scan is completed.
  
   To implement it, this patch introduces two flags, scanning and deleting,
   to efivar_entry.
  
   [1.143710] [ cut here ]
   [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
   lockdep_trace_alloc+0x104/0x110()
   [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
   [1.144058] Modules linked in:
  
   [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
   [1.144058]  0009 8800797e9ae0 816614a5
   8800797e9b28
   [1.144058]  8800797e9b18 8105510d 0080
   0046
   [1.144058]  00d0 03af 81ccd0c0
   8800797e9b78
   [1.144058] Call Trace:
   [1.144058]  [816614a5] dump_stack+0x54/0x74
   [1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
   [1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
   [1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
   [1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
   [1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
   [1.144058]  [815147bb] ?
   efi_pstore_read_func.part.1+0x12b/0x170
   [1.144058]  [8115b260] kmemdup+0x20/0x50
   [1.144058]  [815147bb] 
   efi_pstore_read_func.part.1+0x12b/0x170
   [1.144058]  [81514800] ?
   efi_pstore_read_func.part.1+0x170/0x170
   [1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
   [1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
   [1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
   [1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
   [1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
   [1.158207]  [8128ce30] ? parse_options+0x80/0x80
   [1.158207]  [8128ced5

RE: [PATCH 5/5] efi: Capsule update support and pstore backend

2013-10-16 Thread Seiji Aguchi
> There's also an "either/or" choice between using efi-capsule with pstore, and 
> the
> traditional kexec/kdump method for getting a memory dump from a crash.  We
> have to go through a reset to save the capsule - but we don't want a reset for
> kexec.   Perhaps we can pass the reset parameters through the kexec path to
> the new kernel to make it do the right kind of reset ... but the value of the 
> capsule
> dump is minimal if kdump managed to dump everything.

Tony,

I tried to log kmsg into the kexec path months ago.
It was rejected due to the impelementation problem.

But, as Eric said, it should be OK if it is implemented in the kdump kenel.

http://marc.info/?l=linux-kernel=136917686732183=2

The only problem with kdump here is the implementation in the initial
ram disk.  Fixing the initial ramdisk so it logs to kmsg before it
touches scarier hardware should be the solution.

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 2/5] efi: Introduce a Runtime Services lock

2013-10-16 Thread Seiji Aguchi
> +#define efi_call_reset_virt(f, args...)  
> \
> +({   \
> + unsigned long __flags;  \
> + bool __nmi = in_nmi();  \
> + \
> + if (__nmi)  \
> + spin_lock_irqsave(_runtime_lock, __flags);  \

If the lock is not held in the nmi context, runtime service may run concurrently
in non-nmi context as follows.
 - In nmi context, cpu0 calls a runtime service (no lock is held.)
 - In non-nmi context, cpu1 call can take the lock and call the runtime service.

To avoid this, using try_lock in nmi context is better..

If(in_nmi())
try_spin_lock_irqsave();
else
spin_lock_irqsave();

Please see the commit abd4d5587be911f63592537284dad78766d97d62,
which is introduced to pstore by Don Zickus.

Seiji



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] Introduce page fault tracepoint

2013-10-16 Thread Seiji Aguchi
Thank you for reviewing.

> > http://marc.info/?l=linux-mm=136807959830182=2
> > http://marc.info/?l=linux-mm=136807959130175=2
> >
> 
> For permanence, please use links of the form:
> 
>   http://lkml.kernel.org/r/message-id
> 
> (Yes, they currently point to marc.info, but can be redirected to point
> to any archive.)

I will fix it.

> This is needlessly confusing, which is apart of why reviewing this patch
> took a lot more time than it should.
> 
> Please break this patch into two: one which sets up the tracing IDT and
> one to create the #PF tracepoint.  The assumption is, I am assuming,
> there will be more.

OK. I will divide the patch into two or more to make the review smooth.

Seiji

> 
>   -hpa



RE: [PATCH v3] Introduce page fault tracepoint

2013-10-16 Thread Seiji Aguchi
Thank you for reviewing.

  http://marc.info/?l=linux-mmm=136807959830182w=2
  http://marc.info/?l=linux-mmm=136807959130175w=2
 
 
 For permanence, please use links of the form:
 
   http://lkml.kernel.org/r/message-id
 
 (Yes, they currently point to marc.info, but can be redirected to point
 to any archive.)

I will fix it.

 This is needlessly confusing, which is apart of why reviewing this patch
 took a lot more time than it should.
 
 Please break this patch into two: one which sets up the tracing IDT and
 one to create the #PF tracepoint.  The assumption is, I am assuming,
 there will be more.

OK. I will divide the patch into two or more to make the review smooth.

Seiji

 
   -hpa



RE: [PATCH 2/5] efi: Introduce a Runtime Services lock

2013-10-16 Thread Seiji Aguchi
 +#define efi_call_reset_virt(f, args...)  
 \
 +({   \
 + unsigned long __flags;  \
 + bool __nmi = in_nmi();  \
 + \
 + if (__nmi)  \
 + spin_lock_irqsave(efi_runtime_lock, __flags);  \

If the lock is not held in the nmi context, runtime service may run concurrently
in non-nmi context as follows.
 - In nmi context, cpu0 calls a runtime service (no lock is held.)
 - In non-nmi context, cpu1 call can take the lock and call the runtime service.

To avoid this, using try_lock in nmi context is better..

If(in_nmi())
try_spin_lock_irqsave();
else
spin_lock_irqsave();

Please see the commit abd4d5587be911f63592537284dad78766d97d62,
which is introduced to pstore by Don Zickus.

Seiji



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 5/5] efi: Capsule update support and pstore backend

2013-10-16 Thread Seiji Aguchi
 There's also an either/or choice between using efi-capsule with pstore, and 
 the
 traditional kexec/kdump method for getting a memory dump from a crash.  We
 have to go through a reset to save the capsule - but we don't want a reset for
 kexec.   Perhaps we can pass the reset parameters through the kexec path to
 the new kernel to make it do the right kind of reset ... but the value of the 
 capsule
 dump is minimal if kdump managed to dump everything.

Tony,

I tried to log kmsg into the kexec path months ago.
It was rejected due to the impelementation problem.

But, as Eric said, it should be OK if it is implemented in the kdump kenel.

http://marc.info/?l=linux-kernelm=136917686732183w=2

The only problem with kdump here is the implementation in the initial
ram disk.  Fixing the initial ramdisk so it logs to kmsg before it
touches scarier hardware should be the solution.

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] Introduce page fault tracepoint

2013-10-11 Thread Seiji Aguchi
Peter,

Any comment?

Seiji

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
> Sent: Monday, September 09, 2013 5:56 PM
> To: linux-kernel@vger.kernel.org; x...@kernel.org
> Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
> t...@linutronix.de; fdesl...@gmail.com;
> raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: [PATCH v3] Introduce page fault tracepoint
> 
> Change from v2
>  - Print entry->ip instead of entry->regs->ip to avoid kernel crash.
>  - Use %pf instead of 0x%lx to print address and ip.
> 
> This patch introduces page fault tracepoints to x86 architecture
> by switching IDT.
> 
> [Use case of page fault events]
> 
>   Two events, for user and kernel spaces, are introduced at the beginning of
>   page fault handler.
> 
>   - User space event
> There is a request of page fault event for user space as below.
> 
> http://marc.info/?l=linux-mm=136807959830182=2
> http://marc.info/?l=linux-mm=136807959130175=2
> 
>   - Kernel space event:
> Overhead in kernel space is measurable by enabling it.
> 
> [Creating IDT]
> 
>  A way to create IDT is as below.
> 
>  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
>This is used at boot time which tracing is disabled.
>  - Make set_intr_gate() macro so that it can register trace handler to
>trace IDT and non-trace handler to normal IDT.
> 
> Signed-off-by: Seiji Aguchi 
> ---
>  arch/x86/include/asm/desc.h | 33 +
>  arch/x86/include/asm/hw_irq.h   | 14 -
>  arch/x86/include/asm/trace/exceptions.h | 52 
> +
>  arch/x86/include/asm/traps.h| 22 ++
>  arch/x86/kernel/entry_32.S  | 10 +++
>  arch/x86/kernel/entry_64.S  | 13 -
>  arch/x86/kernel/head64.c|  2 +-
>  arch/x86/kernel/irqinit.c   |  2 +-
>  arch/x86/kernel/kvm.c   |  2 +-
>  arch/x86/kernel/traps.c | 28 +-
>  arch/x86/mm/Makefile|  2 ++
>  arch/x86/mm/fault.c | 22 ++
>  12 files changed, 178 insertions(+), 24 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace/exceptions.h
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index b90e5df..c04302b 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
> const gate_desc *gate)
>  {
>   write_idt_entry(trace_idt_table, entry, gate);
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> + gate_desc s;
> +
> + pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
> + /*
> +  * does not need to be atomic because it is only done once at
> +  * setup time
> +  */
> + write_trace_idt_entry(gate, );
> +}
>  #else
>  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
>  {
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> +}
>  #endif
> 
>  static inline void _set_gate(int gate, unsigned type, void *addr,
> @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
> void *addr,
>   * Pentium F0 0F bugfix can have resulted in the mapped
>   * IDT being write-protected.
>   */
> -static inline void set_intr_gate(unsigned int n, void *addr)
> +static inline void set_intr_gate_raw(unsigned int n, void *addr)
>  {
>   BUG_ON((unsigned)n > 0xFF);
>   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
>  }
> 
> +#define set_intr_gate(n, addr)   
> \
> + do {\
> + BUG_ON((unsigned)n > 0xFF); \
> + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
> + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
> + __KERNEL_CS);   \
> + } while (0)
> +
>  extern int first_system_vector;
>  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
>  extern unsigned long used_vectors[];
> @@ -395,10 +421,7 @@ static inline 

RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-11 Thread Seiji Aguchi
Matt,

I submitted a v3 patch based on my comment below..

Seiji

> -Original Message-
> From: linux-efi-ow...@vger.kernel.org 
> [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
> Sent: Wednesday, October 09, 2013 12:37 PM
> To: Matt Fleming
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
> tony.l...@intel.com; matt.flem...@intel.com; dle-
> deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs 
> entry until the scan is completed
> 
> Thank you for reviewing.
> In my understanding, your point is that all accesses to efivar_entry should 
> be done while holding __efivars->lock.
> 
> > > @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry 
> > > *entry, void *data)
> > >   return 0;
> > >
> > >   entry->var.DataSize = 1024;
> > > - __efivar_entry_get(entry, >var.Attributes,
> > > ->var.DataSize, entry->var.Data);
> > > + efivar_entry_get(entry, >var.Attributes,
> > > +  >var.DataSize, entry->var.Data);
> > > +
> > >   size = entry->var.DataSize;
> > >
> > >   *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL);
> >
> > This isn't safe to do without holding the __efivars->lock, because
> > there's the potential for someone else to update entry->var.Data and
> > entry->var.DataSize while you're in the middle of copying the data in
> > kmemdup(). This could leak to an information leak, though I think you're
> > safe from an out-of-bounds access because DataSize is never > 1024.
> >
> 
> I see...
> Bu, kmemdup() cannot be called while holding the spinlock.
> 
> So, for protecting efivar_entry, I will call kzalloc() before holding the 
> lock in efi_pstore_read().
> and use memcpy() in efi_pstore_read_func().
> 
> The pseudo code is as below.
> 
>   static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
>  struct pstore_info *psi)
>   {
>   *data.buf = kzalloc(1024, GFP_KERNEL);
>   if (!*data.buf)
>   return -ENOMEM;
> 
>   efivar_entry_iter_begin();
>   size = efi_pstore_sysfs_entry_iter(,
>  (struct efivar_entry 
> **)>data);
>   efivar_entry_iter_end();
>   if (size <= 0)
>   kfree(*data.buf);
>   return size;
>   }
> 
>   static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
>   {
>   [...]
>   entry->var.DataSize = 1024;
>   __efivar_entry_get(entry, >var.Attributes,
>>var.DataSize, entry->var.Data);
> 
>   size = entry->var.DataSize;
>   memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned 
> long,
>1024, 
> size));
>   return size;
>   }
> 
> 
> > This doesn't look correct to me. You can't access 'entry' outside of the
> > *_iter_begin() and *_iter_end() blocks. You can't do,
> >
> > efivar_entry_iter_end():
> >
> > if (!entry->scanning)
> > efivar_unregister(entry);
> >
> > because 'entry' may have already been freed by another CPU.
> 
>  I will fix it as follows.
> 
>   if (!entry->scanning) {
>   efivar_entry_iter_end();
>   efivar_unregister(entry);
>   }  else
>   efivar_entry_iter_end();
> 
> (efivar_unregister(entry) still runs concurrently.
> But, it cannot move inside spinlock because kzalloc() may run while freeing 
> kobject.)
> 
> Is it your expectation?
> 
> Seiji
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-11 Thread Seiji Aguchi
Change from v2:
- Move dynamic memory allocation to efi_pstore_read() before holding
  efivars->lock to protect entry->var.Data.
- Access to entry->scanning while holding efivars->lock.
- Move a comment about a returned value from efi_pstore_read_func() to
  efi_pstore_read() because "size < 0" case may happen in efi_pstore_read().

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the dynamic memory allocation runnable without taking spinlock,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [] dump_stack+0x54/0x74
[1.144058]  [] warn_slowpath_common+0x7d/0xa0
[1.144058]  [] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [] ? vsscanf+0x57f/0x7b0
[1.144058]  [] lockdep_trace_alloc+0x104/0x110
[1.144058]  [] __kmalloc_track_caller+0x50/0x280
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] kmemdup+0x20/0x50
[1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [] __efivar_entry_iter+0xfb/0x120
[1.144058]  [] efi_pstore_read+0x3f/0x50
[1.144058]  [] pstore_get_records+0x9a/0x150
[1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [] ? parse_options+0x80/0x80
[1.158207]  [] pstore_fill_super+0xa5/0xc0
[1.158207]  [] mount_single+0xa2/0xd0
[1.158207]  [] pstore_mount+0x18/0x20
[1.158207]  [] mount_fs+0x39/0x1b0
[1.158207]  [] ? __alloc_percpu+0x10/0x20
[1.158207]  [] vfs_kern_mount+0x63/0xf0
[1.158207]  [] do_mount+0x23e/0xa20
[1.158207]  [] ? strndup_user+0x4b/0xf0
[1.158207]  [] SyS_mount+0x83/0xc0
[1.158207]  [] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efi/efi-pstore.c | 143 +++---
 drivers/firmware/efi/efivars.c|  12 ++--
 drivers/firmware/efi/vars.c   |  12 +++-
 include/linux/efi.h   |   2 +
 4 files changed, 153 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..ebd5dbc 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi->data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi->data = NULL;
return 0;
 }
@@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry 
*entry, void *data)
__efivar_entry_get(entry, >var.Attributes,
   >var.DataSize, entry->var.Data);
size = entry->var.DataSize;
+   memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned long,
+1024, size));
 
-   *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL);
-   if (*cb_data->buf == NULL)
-   return -ENOMEM;
return size;
 }
 
+/**
+ * efi_pstore_scan_sysfs_enter
+ * @entry: scanning entry
+ * @next: next entry
+ * @head: list head
+ */
+static void efi_pstore_scan_sysfs_enter(struct efivar_entry *pos,
+   struct efivar_entry *next

[PATCH v3] efivars,efi-pstore: Hold off deletion of sysfs entry until, the scan is completed

2013-10-11 Thread Seiji Aguchi
Change from v2:
- Move dynamic memory allocation to efi_pstore_read() before holding
  efivars-lock to protect entry-var.Data.
- Access to entry-scanning while holding efivars-lock.
- Move a comment about a returned value from efi_pstore_read_func() to
  efi_pstore_read() because size  0 case may happen in efi_pstore_read().

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the dynamic memory allocation runnable without taking spinlock,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [816614a5] dump_stack+0x54/0x74
[1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
[1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
[1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
[1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
[1.144058]  [815147bb] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [8115b260] kmemdup+0x20/0x50
[1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [81514800] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
[1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
[1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
[1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [8128ce30] ? parse_options+0x80/0x80
[1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
[1.158207]  [811ae7d2] mount_single+0xa2/0xd0
[1.158207]  [8128ccf8] pstore_mount+0x18/0x20
[1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
[1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
[1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
[1.158207]  [811cbb0e] do_mount+0x23e/0xa20
[1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
[1.158207]  [811cc373] SyS_mount+0x83/0xc0
[1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 drivers/firmware/efi/efi-pstore.c | 143 +++---
 drivers/firmware/efi/efivars.c|  12 ++--
 drivers/firmware/efi/vars.c   |  12 +++-
 include/linux/efi.h   |   2 +
 4 files changed, 153 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..ebd5dbc 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi-data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi-data = NULL;
return 0;
 }
@@ -91,19 +89,125 @@ static int efi_pstore_read_func(struct efivar_entry 
*entry, void *data)
__efivar_entry_get(entry, entry-var.Attributes,
   entry-var.DataSize, entry-var.Data);
size = entry-var.DataSize;
+   memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned long,
+1024, size));
 
-   *cb_data-buf = kmemdup

RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-11 Thread Seiji Aguchi
Matt,

I submitted a v3 patch based on my comment below..

Seiji

 -Original Message-
 From: linux-efi-ow...@vger.kernel.org 
 [mailto:linux-efi-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
 Sent: Wednesday, October 09, 2013 12:37 PM
 To: Matt Fleming
 Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
 tony.l...@intel.com; matt.flem...@intel.com; dle-
 deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs 
 entry until the scan is completed
 
 Thank you for reviewing.
 In my understanding, your point is that all accesses to efivar_entry should 
 be done while holding __efivars-lock.
 
   @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry 
   *entry, void *data)
 return 0;
  
 entry-var.DataSize = 1024;
   - __efivar_entry_get(entry, entry-var.Attributes,
   -entry-var.DataSize, entry-var.Data);
   + efivar_entry_get(entry, entry-var.Attributes,
   +  entry-var.DataSize, entry-var.Data);
   +
 size = entry-var.DataSize;
  
 *cb_data-buf = kmemdup(entry-var.Data, size, GFP_KERNEL);
 
  This isn't safe to do without holding the __efivars-lock, because
  there's the potential for someone else to update entry-var.Data and
  entry-var.DataSize while you're in the middle of copying the data in
  kmemdup(). This could leak to an information leak, though I think you're
  safe from an out-of-bounds access because DataSize is never  1024.
 
 
 I see...
 Bu, kmemdup() cannot be called while holding the spinlock.
 
 So, for protecting efivar_entry, I will call kzalloc() before holding the 
 lock in efi_pstore_read().
 and use memcpy() in efi_pstore_read_func().
 
 The pseudo code is as below.
 
   static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
  struct pstore_info *psi)
   {
   *data.buf = kzalloc(1024, GFP_KERNEL);
   if (!*data.buf)
   return -ENOMEM;
 
   efivar_entry_iter_begin();
   size = efi_pstore_sysfs_entry_iter(data,
  (struct efivar_entry 
 **)psi-data);
   efivar_entry_iter_end();
   if (size = 0)
   kfree(*data.buf);
   return size;
   }
 
   static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
   {
   [...]
   entry-var.DataSize = 1024;
   __efivar_entry_get(entry, entry-var.Attributes,
entry-var.DataSize, entry-var.Data);
 
   size = entry-var.DataSize;
   memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned 
 long,
1024, 
 size));
   return size;
   }
 
 
  This doesn't look correct to me. You can't access 'entry' outside of the
  *_iter_begin() and *_iter_end() blocks. You can't do,
 
  efivar_entry_iter_end():
 
  if (!entry-scanning)
  efivar_unregister(entry);
 
  because 'entry' may have already been freed by another CPU.
 
  I will fix it as follows.
 
   if (!entry-scanning) {
   efivar_entry_iter_end();
   efivar_unregister(entry);
   }  else
   efivar_entry_iter_end();
 
 (efivar_unregister(entry) still runs concurrently.
 But, it cannot move inside spinlock because kzalloc() may run while freeing 
 kobject.)
 
 Is it your expectation?
 
 Seiji
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-efi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3] Introduce page fault tracepoint

2013-10-11 Thread Seiji Aguchi
Peter,

Any comment?

Seiji

 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
 Sent: Monday, September 09, 2013 5:56 PM
 To: linux-kernel@vger.kernel.org; x...@kernel.org
 Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
 t...@linutronix.de; fdesl...@gmail.com;
 raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: [PATCH v3] Introduce page fault tracepoint
 
 Change from v2
  - Print entry-ip instead of entry-regs-ip to avoid kernel crash.
  - Use %pf instead of 0x%lx to print address and ip.
 
 This patch introduces page fault tracepoints to x86 architecture
 by switching IDT.
 
 [Use case of page fault events]
 
   Two events, for user and kernel spaces, are introduced at the beginning of
   page fault handler.
 
   - User space event
 There is a request of page fault event for user space as below.
 
 http://marc.info/?l=linux-mmm=136807959830182w=2
 http://marc.info/?l=linux-mmm=136807959130175w=2
 
   - Kernel space event:
 Overhead in kernel space is measurable by enabling it.
 
 [Creating IDT]
 
  A way to create IDT is as below.
 
  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
This is used at boot time which tracing is disabled.
  - Make set_intr_gate() macro so that it can register trace handler to
trace IDT and non-trace handler to normal IDT.
 
 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
 ---
  arch/x86/include/asm/desc.h | 33 +
  arch/x86/include/asm/hw_irq.h   | 14 -
  arch/x86/include/asm/trace/exceptions.h | 52 
 +
  arch/x86/include/asm/traps.h| 22 ++
  arch/x86/kernel/entry_32.S  | 10 +++
  arch/x86/kernel/entry_64.S  | 13 -
  arch/x86/kernel/head64.c|  2 +-
  arch/x86/kernel/irqinit.c   |  2 +-
  arch/x86/kernel/kvm.c   |  2 +-
  arch/x86/kernel/traps.c | 28 +-
  arch/x86/mm/Makefile|  2 ++
  arch/x86/mm/fault.c | 22 ++
  12 files changed, 178 insertions(+), 24 deletions(-)
  create mode 100644 arch/x86/include/asm/trace/exceptions.h
 
 diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
 index b90e5df..c04302b 100644
 --- a/arch/x86/include/asm/desc.h
 +++ b/arch/x86/include/asm/desc.h
 @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
 const gate_desc *gate)
  {
   write_idt_entry(trace_idt_table, entry, gate);
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 + gate_desc s;
 +
 + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
 + /*
 +  * does not need to be atomic because it is only done once at
 +  * setup time
 +  */
 + write_trace_idt_entry(gate, s);
 +}
  #else
  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
  {
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 +}
  #endif
 
  static inline void _set_gate(int gate, unsigned type, void *addr,
 @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
 void *addr,
   * Pentium F0 0F bugfix can have resulted in the mapped
   * IDT being write-protected.
   */
 -static inline void set_intr_gate(unsigned int n, void *addr)
 +static inline void set_intr_gate_raw(unsigned int n, void *addr)
  {
   BUG_ON((unsigned)n  0xFF);
   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
  }
 
 +#define set_intr_gate(n, addr)   
 \
 + do {\
 + BUG_ON((unsigned)n  0xFF); \
 + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
 + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
 + __KERNEL_CS);   \
 + } while (0)
 +
  extern int first_system_vector;
  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
  extern unsigned long used_vectors[];
 @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int 
 gate, void *addr)
  #define __trace_alloc_intr_gate(n, addr)
  #endif
 
 -static inline void __alloc_intr_gate(unsigned int n, void *addr)
 -{
 - set_intr_gate(n, addr);
 -}
 +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
  #define alloc_intr_gate(n, addr) \
   do {\
 diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h

RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-09 Thread Seiji Aguchi
Thank you for reviewing.
In my understanding, your point is that all accesses to efivar_entry should be 
done while holding __efivars->lock.

> > @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry 
> > *entry, void *data)
> > return 0;
> >
> > entry->var.DataSize = 1024;
> > -   __efivar_entry_get(entry, >var.Attributes,
> > -  >var.DataSize, entry->var.Data);
> > +   efivar_entry_get(entry, >var.Attributes,
> > +>var.DataSize, entry->var.Data);
> > +
> > size = entry->var.DataSize;
> >
> > *cb_data->buf = kmemdup(entry->var.Data, size, GFP_KERNEL);
> 
> This isn't safe to do without holding the __efivars->lock, because
> there's the potential for someone else to update entry->var.Data and
> entry->var.DataSize while you're in the middle of copying the data in
> kmemdup(). This could leak to an information leak, though I think you're
> safe from an out-of-bounds access because DataSize is never > 1024.
> 

I see...
Bu, kmemdup() cannot be called while holding the spinlock.

So, for protecting efivar_entry, I will call kzalloc() before holding the lock 
in efi_pstore_read().
and use memcpy() in efi_pstore_read_func().

The pseudo code is as below.

static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
   struct pstore_info *psi)
{
*data.buf = kzalloc(1024, GFP_KERNEL);
if (!*data.buf)
return -ENOMEM;

efivar_entry_iter_begin();
size = efi_pstore_sysfs_entry_iter(,
   (struct efivar_entry 
**)>data);
efivar_entry_iter_end();
if (size <= 0)
kfree(*data.buf);
return size;
}

static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
{
[...]
entry->var.DataSize = 1024;
__efivar_entry_get(entry, >var.Attributes,
 >var.DataSize, entry->var.Data);

size = entry->var.DataSize;
memcpy(*cb_data->buf, entry->var.Data, (size_t)min_t(unsigned 
long,
 1024, 
size)); 
return size;
}


> This doesn't look correct to me. You can't access 'entry' outside of the
> *_iter_begin() and *_iter_end() blocks. You can't do,
> 
>   efivar_entry_iter_end():
> 
>   if (!entry->scanning)
>   efivar_unregister(entry);
> 
> because 'entry' may have already been freed by another CPU.

 I will fix it as follows.

if (!entry->scanning) {
efivar_entry_iter_end();
efivar_unregister(entry);
}  else
efivar_entry_iter_end();

(efivar_unregister(entry) still runs concurrently.
But, it cannot move inside spinlock because kzalloc() may run while freeing 
kobject.)

Is it your expectation?

Seiji

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-09 Thread Seiji Aguchi
Thank you for reviewing.
In my understanding, your point is that all accesses to efivar_entry should be 
done while holding __efivars-lock.

  @@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry 
  *entry, void *data)
  return 0;
 
  entry-var.DataSize = 1024;
  -   __efivar_entry_get(entry, entry-var.Attributes,
  -  entry-var.DataSize, entry-var.Data);
  +   efivar_entry_get(entry, entry-var.Attributes,
  +entry-var.DataSize, entry-var.Data);
  +
  size = entry-var.DataSize;
 
  *cb_data-buf = kmemdup(entry-var.Data, size, GFP_KERNEL);
 
 This isn't safe to do without holding the __efivars-lock, because
 there's the potential for someone else to update entry-var.Data and
 entry-var.DataSize while you're in the middle of copying the data in
 kmemdup(). This could leak to an information leak, though I think you're
 safe from an out-of-bounds access because DataSize is never  1024.
 

I see...
Bu, kmemdup() cannot be called while holding the spinlock.

So, for protecting efivar_entry, I will call kzalloc() before holding the lock 
in efi_pstore_read().
and use memcpy() in efi_pstore_read_func().

The pseudo code is as below.

static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
   struct pstore_info *psi)
{
*data.buf = kzalloc(1024, GFP_KERNEL);
if (!*data.buf)
return -ENOMEM;

efivar_entry_iter_begin();
size = efi_pstore_sysfs_entry_iter(data,
   (struct efivar_entry 
**)psi-data);
efivar_entry_iter_end();
if (size = 0)
kfree(*data.buf);
return size;
}

static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
{
[...]
entry-var.DataSize = 1024;
__efivar_entry_get(entry, entry-var.Attributes,
 entry-var.DataSize, entry-var.Data);

size = entry-var.DataSize;
memcpy(*cb_data-buf, entry-var.Data, (size_t)min_t(unsigned 
long,
 1024, 
size)); 
return size;
}


 This doesn't look correct to me. You can't access 'entry' outside of the
 *_iter_begin() and *_iter_end() blocks. You can't do,
 
   efivar_entry_iter_end():
 
   if (!entry-scanning)
   efivar_unregister(entry);
 
 because 'entry' may have already been freed by another CPU.

 I will fix it as follows.

if (!entry-scanning) {
efivar_entry_iter_end();
efivar_unregister(entry);
}  else
efivar_entry_iter_end();

(efivar_unregister(entry) still runs concurrently.
But, it cannot move inside spinlock because kzalloc() may run while freeing 
kobject.)

Is it your expectation?

Seiji

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-04 Thread Seiji Aguchi
Are there anyone who can review this bugfix?

Seiji

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
> Sent: Friday, September 27, 2013 4:24 PM
> To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
> tony.l...@intel.com; matt.flem...@intel.com
> Cc: dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry 
> until the scan is completed
> 
> Change form v1
>  - Rebase to 3.12-rc2
> 
> Currently, when mounting pstore file system, a read callback of efi_pstore
> driver runs mutiple times as below.
> 
> - In the first read callback, scan efivar_sysfs_list from head and pass
>   a kmsg buffer of a entry to an upper pstore layer.
> - In the second read callback, rescan efivar_sysfs_list from the entry and 
> pass
>   another kmsg buffer to it.
> - Repeat the scan and pass until the end of efivar_sysfs_list.
> 
> In this process, an entry is read across the multiple read function calls.
> To avoid race between the read and erasion, the whole process above is
> protected by a spinlock, holding in open() and releasing in close().
> 
> At the same time, kmemdup() is called to pass the buffer to pstore filesystem
> during it.
> And then, it causes a following lockdep warning.
> 
> To make the read callback runnable without taking spinlok,
> holding off a deletion of sysfs entry if it happens while scanning it
> via efi_pstore, and deleting it after the scan is completed.
> 
> To implement it, this patch introduces two flags, scanning and deleting,
> to efivar_entry.
> Also, __efivar_entry_get() is removed because it was used in efi_pstore only.
> 
> [1.143710] [ cut here ]
> [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
> lockdep_trace_alloc+0x104/0x110()
> [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [1.144058] Modules linked in:
> 
> [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
> [1.144058]  0009 8800797e9ae0 816614a5
> 8800797e9b28
> [1.144058]  8800797e9b18 8105510d 0080
> 0046
> [1.144058]  00d0 03af 81ccd0c0
> 8800797e9b78
> [1.144058] Call Trace:
> [1.144058]  [] dump_stack+0x54/0x74
> [1.144058]  [] warn_slowpath_common+0x7d/0xa0
> [1.144058]  [] warn_slowpath_fmt+0x4c/0x50
> [1.144058]  [] ? vsscanf+0x57f/0x7b0
> [1.144058]  [] lockdep_trace_alloc+0x104/0x110
> [1.144058]  [] __kmalloc_track_caller+0x50/0x280
> [1.144058]  [] ?
> efi_pstore_read_func.part.1+0x12b/0x170
> [1.144058]  [] kmemdup+0x20/0x50
> [1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
> [1.144058]  [] ?
> efi_pstore_read_func.part.1+0x170/0x170
> [1.144058]  [] efi_pstore_read_func+0xb4/0xe0
> [1.144058]  [] __efivar_entry_iter+0xfb/0x120
> [1.144058]  [] efi_pstore_read+0x3f/0x50
> [1.144058]  [] pstore_get_records+0x9a/0x150
> [1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
> [1.158207]  [] ? parse_options+0x80/0x80
> [1.158207]  [] pstore_fill_super+0xa5/0xc0
> [1.158207]  [] mount_single+0xa2/0xd0
> [1.158207]  [] pstore_mount+0x18/0x20
> [1.158207]  [] mount_fs+0x39/0x1b0
> [1.158207]  [] ? __alloc_percpu+0x10/0x20
> [1.158207]  [] vfs_kern_mount+0x63/0xf0
> [1.158207]  [] do_mount+0x23e/0xa20
> [1.158207]  [] ? strndup_user+0x4b/0xf0
> [1.158207]  [] SyS_mount+0x83/0xc0
> [1.158207]  [] system_call_fastpath+0x16/0x1b
> [1.158207] ---[ end trace 61981bc62de9f6f4 ]---
> 
> Signed-off-by: Seiji Aguchi 
> ---
>  drivers/firmware/efi/efi-pstore.c | 145 
> +++---
>  drivers/firmware/efi/efivars.c|   3 +-
>  drivers/firmware/efi/vars.c   |  39 +++---
>  include/linux/efi.h   |   4 +-
>  4 files changed, 151 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/firmware/efi/efi-pstore.c 
> b/drivers/firmware/efi/efi-pstore.c
> index 5002d50..53001a5 100644
> --- a/drivers/firmware/efi/efi-pstore.c
> +++ b/drivers/firmware/efi/efi-pstore.c
> @@ -18,14 +18,12 @@ module_param_named(pstore_disable, 
> efivars_pstore_disable, bool, 0644);
> 
>  static int efi_pstore_open(struct pstore_info *psi)
>  {
> - efivar_entry_iter_begin();
>   psi->data = NULL;
>   return 0;
>  }
> 
>  static int efi_pstore_close(struct pstore_info *psi)
>  {
> - efivar_entry_iter_end();
>   psi->data = NULL;
>   return 0;
>  }
> @@ -39,6 +37,23

RE: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-10-04 Thread Seiji Aguchi
Are there anyone who can review this bugfix?

Seiji

 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
 Sent: Friday, September 27, 2013 4:24 PM
 To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; 
 tony.l...@intel.com; matt.flem...@intel.com
 Cc: dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: [RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry 
 until the scan is completed
 
 Change form v1
  - Rebase to 3.12-rc2
 
 Currently, when mounting pstore file system, a read callback of efi_pstore
 driver runs mutiple times as below.
 
 - In the first read callback, scan efivar_sysfs_list from head and pass
   a kmsg buffer of a entry to an upper pstore layer.
 - In the second read callback, rescan efivar_sysfs_list from the entry and 
 pass
   another kmsg buffer to it.
 - Repeat the scan and pass until the end of efivar_sysfs_list.
 
 In this process, an entry is read across the multiple read function calls.
 To avoid race between the read and erasion, the whole process above is
 protected by a spinlock, holding in open() and releasing in close().
 
 At the same time, kmemdup() is called to pass the buffer to pstore filesystem
 during it.
 And then, it causes a following lockdep warning.
 
 To make the read callback runnable without taking spinlok,
 holding off a deletion of sysfs entry if it happens while scanning it
 via efi_pstore, and deleting it after the scan is completed.
 
 To implement it, this patch introduces two flags, scanning and deleting,
 to efivar_entry.
 Also, __efivar_entry_get() is removed because it was used in efi_pstore only.
 
 [1.143710] [ cut here ]
 [1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
 lockdep_trace_alloc+0x104/0x110()
 [1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
 [1.144058] Modules linked in:
 
 [1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
 [1.144058]  0009 8800797e9ae0 816614a5
 8800797e9b28
 [1.144058]  8800797e9b18 8105510d 0080
 0046
 [1.144058]  00d0 03af 81ccd0c0
 8800797e9b78
 [1.144058] Call Trace:
 [1.144058]  [816614a5] dump_stack+0x54/0x74
 [1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
 [1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
 [1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
 [1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
 [1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
 [1.144058]  [815147bb] ?
 efi_pstore_read_func.part.1+0x12b/0x170
 [1.144058]  [8115b260] kmemdup+0x20/0x50
 [1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
 [1.144058]  [81514800] ?
 efi_pstore_read_func.part.1+0x170/0x170
 [1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
 [1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
 [1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
 [1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
 [1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
 [1.158207]  [8128ce30] ? parse_options+0x80/0x80
 [1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
 [1.158207]  [811ae7d2] mount_single+0xa2/0xd0
 [1.158207]  [8128ccf8] pstore_mount+0x18/0x20
 [1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
 [1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
 [1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
 [1.158207]  [811cbb0e] do_mount+0x23e/0xa20
 [1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
 [1.158207]  [811cc373] SyS_mount+0x83/0xc0
 [1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
 [1.158207] ---[ end trace 61981bc62de9f6f4 ]---
 
 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
 ---
  drivers/firmware/efi/efi-pstore.c | 145 
 +++---
  drivers/firmware/efi/efivars.c|   3 +-
  drivers/firmware/efi/vars.c   |  39 +++---
  include/linux/efi.h   |   4 +-
  4 files changed, 151 insertions(+), 40 deletions(-)
 
 diff --git a/drivers/firmware/efi/efi-pstore.c 
 b/drivers/firmware/efi/efi-pstore.c
 index 5002d50..53001a5 100644
 --- a/drivers/firmware/efi/efi-pstore.c
 +++ b/drivers/firmware/efi/efi-pstore.c
 @@ -18,14 +18,12 @@ module_param_named(pstore_disable, 
 efivars_pstore_disable, bool, 0644);
 
  static int efi_pstore_open(struct pstore_info *psi)
  {
 - efivar_entry_iter_begin();
   psi-data = NULL;
   return 0;
  }
 
  static int efi_pstore_close(struct pstore_info *psi)
  {
 - efivar_entry_iter_end();
   psi-data = NULL;
   return 0;
  }
 @@ -39,6 +37,23 @@ struct pstore_read_data

[RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-09-27 Thread Seiji Aguchi
Change form v1
 - Rebase to 3.12-rc2

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the read callback runnable without taking spinlok,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.
Also, __efivar_entry_get() is removed because it was used in efi_pstore only.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [] dump_stack+0x54/0x74
[1.144058]  [] warn_slowpath_common+0x7d/0xa0
[1.144058]  [] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [] ? vsscanf+0x57f/0x7b0
[1.144058]  [] lockdep_trace_alloc+0x104/0x110
[1.144058]  [] __kmalloc_track_caller+0x50/0x280
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] kmemdup+0x20/0x50
[1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [] __efivar_entry_iter+0xfb/0x120
[1.144058]  [] efi_pstore_read+0x3f/0x50
[1.144058]  [] pstore_get_records+0x9a/0x150
[1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [] ? parse_options+0x80/0x80
[1.158207]  [] pstore_fill_super+0xa5/0xc0
[1.158207]  [] mount_single+0xa2/0xd0
[1.158207]  [] pstore_mount+0x18/0x20
[1.158207]  [] mount_fs+0x39/0x1b0
[1.158207]  [] ? __alloc_percpu+0x10/0x20
[1.158207]  [] vfs_kern_mount+0x63/0xf0
[1.158207]  [] do_mount+0x23e/0xa20
[1.158207]  [] ? strndup_user+0x4b/0xf0
[1.158207]  [] SyS_mount+0x83/0xc0
[1.158207]  [] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efi/efi-pstore.c | 145 +++---
 drivers/firmware/efi/efivars.c|   3 +-
 drivers/firmware/efi/vars.c   |  39 +++---
 include/linux/efi.h   |   4 +-
 4 files changed, 151 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..53001a5 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi->data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi->data = NULL;
return 0;
 }
@@ -39,6 +37,23 @@ struct pstore_read_data {
char **buf;
 };
 
+/**
+ * efi_pstore_read_func
+ * @entry: reading entry
+ * @data:  data of the entry
+ *
+ * This function runs in non-atomic context.
+ *
+ * Also, it returns a size of NVRAM entry logged via efi_pstore_write().
+ * pstore in accordance with the returned value as below.
+ *
+ * size > 0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size < 0: Failed to get data of entry logging via efi_pstore_write(),
+ *   and pstore will stop reading entry.
+ */
 static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
 {
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
@@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, 
void *data)
return 0;
 
entry->var.DataSize = 1024;
-   __efivar_

[RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-09-27 Thread Seiji Aguchi
Change form v1
 - Rebase to 3.12-rc2

Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the read callback runnable without taking spinlok,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.
Also, __efivar_entry_get() is removed because it was used in efi_pstore only.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [816614a5] dump_stack+0x54/0x74
[1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
[1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
[1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
[1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
[1.144058]  [815147bb] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [8115b260] kmemdup+0x20/0x50
[1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [81514800] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
[1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
[1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
[1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [8128ce30] ? parse_options+0x80/0x80
[1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
[1.158207]  [811ae7d2] mount_single+0xa2/0xd0
[1.158207]  [8128ccf8] pstore_mount+0x18/0x20
[1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
[1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
[1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
[1.158207]  [811cbb0e] do_mount+0x23e/0xa20
[1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
[1.158207]  [811cc373] SyS_mount+0x83/0xc0
[1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 drivers/firmware/efi/efi-pstore.c | 145 +++---
 drivers/firmware/efi/efivars.c|   3 +-
 drivers/firmware/efi/vars.c   |  39 +++---
 include/linux/efi.h   |   4 +-
 4 files changed, 151 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 5002d50..53001a5 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi-data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi-data = NULL;
return 0;
 }
@@ -39,6 +37,23 @@ struct pstore_read_data {
char **buf;
 };
 
+/**
+ * efi_pstore_read_func
+ * @entry: reading entry
+ * @data:  data of the entry
+ *
+ * This function runs in non-atomic context.
+ *
+ * Also, it returns a size of NVRAM entry logged via efi_pstore_write().
+ * pstore in accordance with the returned value as below.
+ *
+ * size  0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size  0

RE: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers

2013-09-12 Thread Seiji Aguchi
efivars works fine with this v2 patch.

Tested-by: Seiji Aguchi 

> -Original Message-
> From: Aruna Balakrishnaiah [mailto:ar...@linux.vnet.ibm.com]
> Sent: Thursday, September 12, 2013 2:51 AM
> To: linuxppc-...@ozlabs.org; tony.l...@intel.com; Seiji Aguchi; 
> linux-kernel@vger.kernel.org; keesc...@chromium.org
> Cc: jkeni...@linux.vnet.ibm.com; mah...@linux.vnet.ibm.com; 
> cbouatmai...@gmail.com; ana...@in.ibm.com; ccr...@android.com
> Subject: [PATCH v2] pstore: Adjust buffer size for compression for smaller 
> registered buffers
> 
> When backends (ex: efivars) have smaller registered buffers, the big_oops_buf
> is quite too big for them as number of repeated occurences in the text 
> captured
> will be less. Patch takes care of adjusting the buffer size based on the
> registered buffer size. cmpr values has been arrived after doing experiments 
> with
> plain text for buffers of size 1k - 4k (Smaller the buffer size repeated 
> occurence
> will be less) and with sample crash log for buffers ranging from 4k - 10k.
> 
> Reported-by: Seiji Aguchi 
> Signed-off-by: Aruna Balakrishnaiah 
> ---
> Changes from v1:
>   Retain the cmpr = 45 for buffers ranging of size 4k - 10k. 45 seems to 
> work.
> I added an additional headroom of 3%. Revert it back to 45.
> 
>  fs/pstore/platform.c |   23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index 4ffb7ab..57b4219 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -195,8 +195,29 @@ error:
>  static void allocate_buf_for_compression(void)
>  {
>   size_t size;
> + size_t cmpr;
> +
> + switch (psinfo->bufsize) {
> + /* buffer range for efivars */
> + case 1000 ... 2000:
> + cmpr = 56;
> + break;
> + case 2001 ... 3000:
> + cmpr = 54;
> + break;
> + case 3001 ... 3999:
> + cmpr = 52;
> + break;
> + /* buffer range for nvram, erst */
> + case 4000 ... 1:
> + cmpr = 45;
> + break;
> + default:
> + cmpr = 60;
> + break;
> + }
> 
> - big_oops_buf_sz = (psinfo->bufsize * 100) / 45;
> + big_oops_buf_sz = (psinfo->bufsize * 100) / cmpr;
>   big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL);
>   if (big_oops_buf) {
>   size = max(zlib_deflate_workspacesize(WINDOW_BITS, MEM_LEVEL),

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers

2013-09-12 Thread Seiji Aguchi
efivars works fine with this v2 patch.

Tested-by: Seiji Aguchi seiji.agu...@hds.com

 -Original Message-
 From: Aruna Balakrishnaiah [mailto:ar...@linux.vnet.ibm.com]
 Sent: Thursday, September 12, 2013 2:51 AM
 To: linuxppc-...@ozlabs.org; tony.l...@intel.com; Seiji Aguchi; 
 linux-kernel@vger.kernel.org; keesc...@chromium.org
 Cc: jkeni...@linux.vnet.ibm.com; mah...@linux.vnet.ibm.com; 
 cbouatmai...@gmail.com; ana...@in.ibm.com; ccr...@android.com
 Subject: [PATCH v2] pstore: Adjust buffer size for compression for smaller 
 registered buffers
 
 When backends (ex: efivars) have smaller registered buffers, the big_oops_buf
 is quite too big for them as number of repeated occurences in the text 
 captured
 will be less. Patch takes care of adjusting the buffer size based on the
 registered buffer size. cmpr values has been arrived after doing experiments 
 with
 plain text for buffers of size 1k - 4k (Smaller the buffer size repeated 
 occurence
 will be less) and with sample crash log for buffers ranging from 4k - 10k.
 
 Reported-by: Seiji Aguchi seiji.agu...@hds.com
 Signed-off-by: Aruna Balakrishnaiah ar...@linux.vnet.ibm.com
 ---
 Changes from v1:
   Retain the cmpr = 45 for buffers ranging of size 4k - 10k. 45 seems to 
 work.
 I added an additional headroom of 3%. Revert it back to 45.
 
  fs/pstore/platform.c |   23 ++-
  1 file changed, 22 insertions(+), 1 deletion(-)
 
 diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
 index 4ffb7ab..57b4219 100644
 --- a/fs/pstore/platform.c
 +++ b/fs/pstore/platform.c
 @@ -195,8 +195,29 @@ error:
  static void allocate_buf_for_compression(void)
  {
   size_t size;
 + size_t cmpr;
 +
 + switch (psinfo-bufsize) {
 + /* buffer range for efivars */
 + case 1000 ... 2000:
 + cmpr = 56;
 + break;
 + case 2001 ... 3000:
 + cmpr = 54;
 + break;
 + case 3001 ... 3999:
 + cmpr = 52;
 + break;
 + /* buffer range for nvram, erst */
 + case 4000 ... 1:
 + cmpr = 45;
 + break;
 + default:
 + cmpr = 60;
 + break;
 + }
 
 - big_oops_buf_sz = (psinfo-bufsize * 100) / 45;
 + big_oops_buf_sz = (psinfo-bufsize * 100) / cmpr;
   big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL);
   if (big_oops_buf) {
   size = max(zlib_deflate_workspacesize(WINDOW_BITS, MEM_LEVEL),

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 1/3] pstore: Adjust buffer size for compression for smaller registered buffers

2013-09-11 Thread Seiji Aguchi
>+  /* buffer range for efivars */
>+  case 1000 ... 2000:
>+  cmpr = 56;
>+  break;

> Seiji: let me know how the efivars tests go.

efivars works fine.
Uncompressed size about 1800 bytes. It matches the value of cmpr, 56. 

Please feel free to add my "Tested-by" to all three patches.

Tested-by: Seiji Aguchi 

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH 1/3] pstore: Adjust buffer size for compression for smaller registered buffers

2013-09-11 Thread Seiji Aguchi
+  /* buffer range for efivars */
+  case 1000 ... 2000:
+  cmpr = 56;
+  break;

 Seiji: let me know how the efivars tests go.

efivars works fine.
Uncompressed size about 1800 bytes. It matches the value of cmpr, 56. 

Please feel free to add my Tested-by to all three patches.

Tested-by: Seiji Aguchi seiji.agu...@hds.com

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

[PATCH v3] Introduce page fault tracepoint

2013-09-09 Thread Seiji Aguchi
Change from v2
 - Print entry->ip instead of entry->regs->ip to avoid kernel crash.
 - Use %pf instead of 0x%lx to print address and ip.

This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mm=136807959830182=2
http://marc.info/?l=linux-mm=136807959130175=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h | 33 +
 arch/x86/include/asm/hw_irq.h   | 14 -
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/include/asm/traps.h| 22 ++
 arch/x86/kernel/entry_32.S  | 10 +++
 arch/x86/kernel/entry_64.S  | 13 -
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/irqinit.c   |  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 +-
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 22 ++
 12 files changed, 178 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, );
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n > 0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..c856e69 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt spurious_int

[PATCH v3] Introduce page fault tracepoint

2013-09-09 Thread Seiji Aguchi
Change from v2
 - Print entry-ip instead of entry-regs-ip to avoid kernel crash.
 - Use %pf instead of 0x%lx to print address and ip.

This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mmm=136807959830182w=2
http://marc.info/?l=linux-mmm=136807959130175w=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h | 33 +
 arch/x86/include/asm/hw_irq.h   | 14 -
 arch/x86/include/asm/trace/exceptions.h | 52 +
 arch/x86/include/asm/traps.h| 22 ++
 arch/x86/kernel/entry_32.S  | 10 +++
 arch/x86/kernel/entry_64.S  | 13 -
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/irqinit.c   |  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 +-
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 22 ++
 12 files changed, 178 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, s);
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n  0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n  0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 92b3bae..c856e69 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt

RE: [PATCH v2] Introduce page fault tracepoint

2013-09-06 Thread Seiji Aguchi


> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Friday, September 06, 2013 12:50 PM
> To: Seiji Aguchi
> Cc: linux-kernel@vger.kernel.org; x...@kernel.org; h...@zytor.com; 
> mi...@elte.hu; b...@alien8.de; t...@linutronix.de;
> fdesl...@gmail.com; raphael.beamo...@gmail.com; 
> dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: Re: [PATCH v2] Introduce page fault tracepoint
> 
> On Fri, 23 Aug 2013 11:37:43 -0400
> Seiji Aguchi  wrote:
> 
> 
> > +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_PAGE_FAULT_H
> > +
> > +#include 
> > +
> > +extern void trace_irq_vector_regfunc(void);
> > +extern void trace_irq_vector_unregfunc(void);
> > +
> > +DECLARE_EVENT_CLASS(x86_exceptions,
> > +
> > +   TP_PROTO(unsigned long address, struct pt_regs *regs,
> > +unsigned long error_code),
> > +
> > +   TP_ARGS(address, regs, error_code),
> > +
> > +   TP_STRUCT__entry(
> > +   __field(unsigned long, address  )
> > +   __field(struct pt_regs *, regs  )
> > +   __field(unsigned long, error_code )
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __entry->address = address;
> > +   __entry->regs = regs;
> > +   __entry->error_code = error_code;
> > +   ),
> > +
> > +   TP_printk("address=0x%lx ip=0x%lx error_code=0x%lx",
> > + __entry->address, __entry->regs->ip, __entry->error_code) );
> 
> This is sure to crash the kernel.
> 
> You just saved the address of a pointer to some task's stack in the
> ring buffer. And then on output (which can happen a long time from when
> it was recorded), you are dereferencing that same address!
> 
> That __entry->regs->ip *will* crash the kernel!
> 
> What you want is to save ip in the fast_assign:
> 
>   __entry->ip = regs->ip
> 
> And then print that. Never dereference a pointer directly from the ring
> buffer unless it's a constant value (like a global string).

I see..
Thank you for reviewing.

I will fix it.

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] Introduce page fault tracepoint

2013-09-06 Thread Seiji Aguchi
Any comment?

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
> Sent: Friday, August 23, 2013 11:38 AM
> To: linux-kernel@vger.kernel.org; x...@kernel.org
> Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
> t...@linutronix.de; fdesl...@gmail.com;
> raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: [PATCH v2] Introduce page fault tracepoint
> 
> Change from v1
>  - Print regs->ip instead of regs to know where the page fault event happened
>(Suggested by Steven Rostedt)
> 
> 
> This patch introduces page fault tracepoints to x86 architecture
> by switching IDT.
> 
> [Use case of page fault events]
> 
>   Two events, for user and kernel spaces, are introduced at the beginning of
>   page fault handler.
> 
>   - User space event
> There is a request of page fault event for user space as below.
> 
> http://marc.info/?l=linux-mm=136807959830182=2
> http://marc.info/?l=linux-mm=136807959130175=2
> 
>   - Kernel space event:
> Overhead in kernel space is measurable by enabling it.
> 
> [Creating IDT]
> 
>  A way to create IDT is as below.
> 
>  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
>This is used at boot time which tracing is disabled.
>  - Make set_intr_gate() macro so that it can register trace handler to
>trace IDT and non-trace handler to normal IDT.
> 
> Signed-off-by: Seiji Aguchi 
> ---
>  arch/x86/include/asm/desc.h | 33 +
>  arch/x86/include/asm/hw_irq.h   | 14 -
>  arch/x86/include/asm/trace/exceptions.h | 51 
> +
>  arch/x86/include/asm/traps.h| 22 ++
>  arch/x86/kernel/entry_32.S  | 10 +++
>  arch/x86/kernel/entry_64.S  | 13 -
>  arch/x86/kernel/head64.c|  2 +-
>  arch/x86/kernel/irqinit.c   |  2 +-
>  arch/x86/kernel/kvm.c   |  2 +-
>  arch/x86/kernel/traps.c | 28 +-
>  arch/x86/mm/Makefile|  2 ++
>  arch/x86/mm/fault.c | 22 ++
>  12 files changed, 177 insertions(+), 24 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace/exceptions.h
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index b90e5df..c04302b 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
> const gate_desc *gate)
>  {
>   write_idt_entry(trace_idt_table, entry, gate);
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> + gate_desc s;
> +
> + pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
> + /*
> +  * does not need to be atomic because it is only done once at
> +  * setup time
> +  */
> + write_trace_idt_entry(gate, );
> +}
>  #else
>  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
>  {
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> +}
>  #endif
> 
>  static inline void _set_gate(int gate, unsigned type, void *addr,
> @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
> void *addr,
>   * Pentium F0 0F bugfix can have resulted in the mapped
>   * IDT being write-protected.
>   */
> -static inline void set_intr_gate(unsigned int n, void *addr)
> +static inline void set_intr_gate_raw(unsigned int n, void *addr)
>  {
>   BUG_ON((unsigned)n > 0xFF);
>   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
>  }
> 
> +#define set_intr_gate(n, addr)   
> \
> + do {\
> + BUG_ON((unsigned)n > 0xFF); \
> + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
> + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
> + __KERNEL_CS);   \
> + } while (0)
> +
>  extern int first_system_vector;
>  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
>  extern unsigned long used_vectors[];
> @@ -395,10 +421,7 @@ static inline voi

RE: [PATCH v2] Introduce page fault tracepoint

2013-09-06 Thread Seiji Aguchi
Any comment?

 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Seiji Aguchi
 Sent: Friday, August 23, 2013 11:38 AM
 To: linux-kernel@vger.kernel.org; x...@kernel.org
 Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
 t...@linutronix.de; fdesl...@gmail.com;
 raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: [PATCH v2] Introduce page fault tracepoint
 
 Change from v1
  - Print regs-ip instead of regs to know where the page fault event happened
(Suggested by Steven Rostedt)
 
 
 This patch introduces page fault tracepoints to x86 architecture
 by switching IDT.
 
 [Use case of page fault events]
 
   Two events, for user and kernel spaces, are introduced at the beginning of
   page fault handler.
 
   - User space event
 There is a request of page fault event for user space as below.
 
 http://marc.info/?l=linux-mmm=136807959830182w=2
 http://marc.info/?l=linux-mmm=136807959130175w=2
 
   - Kernel space event:
 Overhead in kernel space is measurable by enabling it.
 
 [Creating IDT]
 
  A way to create IDT is as below.
 
  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
This is used at boot time which tracing is disabled.
  - Make set_intr_gate() macro so that it can register trace handler to
trace IDT and non-trace handler to normal IDT.
 
 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
 ---
  arch/x86/include/asm/desc.h | 33 +
  arch/x86/include/asm/hw_irq.h   | 14 -
  arch/x86/include/asm/trace/exceptions.h | 51 
 +
  arch/x86/include/asm/traps.h| 22 ++
  arch/x86/kernel/entry_32.S  | 10 +++
  arch/x86/kernel/entry_64.S  | 13 -
  arch/x86/kernel/head64.c|  2 +-
  arch/x86/kernel/irqinit.c   |  2 +-
  arch/x86/kernel/kvm.c   |  2 +-
  arch/x86/kernel/traps.c | 28 +-
  arch/x86/mm/Makefile|  2 ++
  arch/x86/mm/fault.c | 22 ++
  12 files changed, 177 insertions(+), 24 deletions(-)
  create mode 100644 arch/x86/include/asm/trace/exceptions.h
 
 diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
 index b90e5df..c04302b 100644
 --- a/arch/x86/include/asm/desc.h
 +++ b/arch/x86/include/asm/desc.h
 @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
 const gate_desc *gate)
  {
   write_idt_entry(trace_idt_table, entry, gate);
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 + gate_desc s;
 +
 + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
 + /*
 +  * does not need to be atomic because it is only done once at
 +  * setup time
 +  */
 + write_trace_idt_entry(gate, s);
 +}
  #else
  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
  {
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 +}
  #endif
 
  static inline void _set_gate(int gate, unsigned type, void *addr,
 @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
 void *addr,
   * Pentium F0 0F bugfix can have resulted in the mapped
   * IDT being write-protected.
   */
 -static inline void set_intr_gate(unsigned int n, void *addr)
 +static inline void set_intr_gate_raw(unsigned int n, void *addr)
  {
   BUG_ON((unsigned)n  0xFF);
   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
  }
 
 +#define set_intr_gate(n, addr)   
 \
 + do {\
 + BUG_ON((unsigned)n  0xFF); \
 + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
 + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
 + __KERNEL_CS);   \
 + } while (0)
 +
  extern int first_system_vector;
  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
  extern unsigned long used_vectors[];
 @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int 
 gate, void *addr)
  #define __trace_alloc_intr_gate(n, addr)
  #endif
 
 -static inline void __alloc_intr_gate(unsigned int n, void *addr)
 -{
 - set_intr_gate(n, addr);
 -}
 +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
  #define alloc_intr_gate(n, addr) \
   do {\
 diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
 index e4ac559..fbd73b7

RE: [PATCH v2] Introduce page fault tracepoint

2013-09-06 Thread Seiji Aguchi


 -Original Message-
 From: Steven Rostedt [mailto:rost...@goodmis.org]
 Sent: Friday, September 06, 2013 12:50 PM
 To: Seiji Aguchi
 Cc: linux-kernel@vger.kernel.org; x...@kernel.org; h...@zytor.com; 
 mi...@elte.hu; b...@alien8.de; t...@linutronix.de;
 fdesl...@gmail.com; raphael.beamo...@gmail.com; 
 dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: Re: [PATCH v2] Introduce page fault tracepoint
 
 On Fri, 23 Aug 2013 11:37:43 -0400
 Seiji Aguchi seiji.agu...@hds.com wrote:
 
 
  +#if !defined(_TRACE_PAGE_FAULT_H) || defined(TRACE_HEADER_MULTI_READ)
  +#define _TRACE_PAGE_FAULT_H
  +
  +#include linux/tracepoint.h
  +
  +extern void trace_irq_vector_regfunc(void);
  +extern void trace_irq_vector_unregfunc(void);
  +
  +DECLARE_EVENT_CLASS(x86_exceptions,
  +
  +   TP_PROTO(unsigned long address, struct pt_regs *regs,
  +unsigned long error_code),
  +
  +   TP_ARGS(address, regs, error_code),
  +
  +   TP_STRUCT__entry(
  +   __field(unsigned long, address  )
  +   __field(struct pt_regs *, regs  )
  +   __field(unsigned long, error_code )
  +   ),
  +
  +   TP_fast_assign(
  +   __entry-address = address;
  +   __entry-regs = regs;
  +   __entry-error_code = error_code;
  +   ),
  +
  +   TP_printk(address=0x%lx ip=0x%lx error_code=0x%lx,
  + __entry-address, __entry-regs-ip, __entry-error_code) );
 
 This is sure to crash the kernel.
 
 You just saved the address of a pointer to some task's stack in the
 ring buffer. And then on output (which can happen a long time from when
 it was recorded), you are dereferencing that same address!
 
 That __entry-regs-ip *will* crash the kernel!
 
 What you want is to save ip in the fast_assign:
 
   __entry-ip = regs-ip
 
 And then print that. Never dereference a pointer directly from the ring
 buffer unless it's a constant value (like a global string).

I see..
Thank you for reviewing.

I will fix it.

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-09-04 Thread Seiji Aguchi
> But let's make sure that efivars, erst,
> etc. are all happy with the changes we make before I ask Linus to pull another
> pstore piece.

I will test efivars when Aruna posts the bugfix patches.

Seiji

> -Original Message-
> From: Luck, Tony [mailto:tony.l...@intel.com]
> Sent: Wednesday, September 04, 2013 12:11 PM
> To: Aruna Balakrishnaiah; Seiji Aguchi
> Cc: jkeni...@linux.vnet.ibm.com; keesc...@chromium.org; 
> mah...@linux.vnet.ibm.com; ccr...@android.com; linux-
> ker...@vger.kernel.org; linuxppc-...@ozlabs.org; cbouatmai...@gmail.com
> Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> 
> > The reason behind compression failure is the size of big_oops_buf which is 
> > too
> > big for efivars case. I will do some experiments with different kind of 
> > texts
> > for buffer size 1024 to check if 100/53 suits for all the cases.
> ...
> 
> > Yes this can be changed to zlib_inflateInit2().
> 
> Original patch series was just pulled by Linus ... so we'll need a patch on 
> top
> of current Linus git tree to fix these issues.  But let's make sure that 
> efivars, erst,
> etc. are all happy with the changes we make before I ask Linus to pull another
> pstore piece.
> 
> Thanks
> 
> -Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-09-04 Thread Seiji Aguchi
 But let's make sure that efivars, erst,
 etc. are all happy with the changes we make before I ask Linus to pull another
 pstore piece.

I will test efivars when Aruna posts the bugfix patches.

Seiji

 -Original Message-
 From: Luck, Tony [mailto:tony.l...@intel.com]
 Sent: Wednesday, September 04, 2013 12:11 PM
 To: Aruna Balakrishnaiah; Seiji Aguchi
 Cc: jkeni...@linux.vnet.ibm.com; keesc...@chromium.org; 
 mah...@linux.vnet.ibm.com; ccr...@android.com; linux-
 ker...@vger.kernel.org; linuxppc-...@ozlabs.org; cbouatmai...@gmail.com
 Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
 
  The reason behind compression failure is the size of big_oops_buf which is 
  too
  big for efivars case. I will do some experiments with different kind of 
  texts
  for buffer size 1024 to check if 100/53 suits for all the cases.
 ...
 
  Yes this can be changed to zlib_inflateInit2().
 
 Original patch series was just pulled by Linus ... so we'll need a patch on 
 top
 of current Linus git tree to fix these issues.  But let's make sure that 
 efivars, erst,
 etc. are all happy with the changes we make before I ask Linus to pull another
 pstore piece.
 
 Thanks
 
 -Tony
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-09-03 Thread Seiji Aguchi
Aruna,

Sorry for the late response.

> Seiji,
> 
> Could you let us know the efivars buffer size with which the pstore is
> registered when
> the failure occurred.

I looked into the issue today.

I added some debug message just before pstore_compress().
As you can see below, the buffer size is a fixed value(1024).
Therefore, the size doesn't seem to be related to the failure directly.

Also, in the failure case, zlib_deflate() returns Z_OK and stream.avail_out is 
zero.
So, I thought big_oops_buf_sz is too big in my environment.
And then, I tuned  big_oops_buf_sz to (psinfo->bufsize * 100) / 53.

At the same time, while looking into this issue, I had two concerns about 
current cording.

1) In pstore_decompress(), why not use zlib_inflateInit2() instead of 
zlib_inflateInit()?
 If you use zlib_deflateInit2()  for specifying window_bit at compressing 
time, 
 zlib_inflateInit2() seems to be appropriate for decompressing.
 (Please see a comment about inflateInit2() in include/linux/zlib.h and 
source code of crypto/deflate.c)

2) As looking at crypto/deflate.c, stream->workspace is kzalloced at the 
beginning of 
   compressing/decompressing time.
So, in pstore case,  stream.workspace must be initialized to 0 with 
memset() in pstore_compress()/decompress().

I did three things above, tuning big_oops_buf_sz, using zlib_inflateInit2() and 
initializing stream.workspace , in my environment.
As a result, compressions/decmpressions of all entries succeeded with efivars 
driver.


Panic#2 Part1
<4>[   75.665020]  [] SyS_write+0x4c/0xa0
<4>[   75.665020]  [] system_call_fastpath+0x16/0x1b
<4>[   75.665020] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8  
04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 
<1>[   75.665020] RIP  [] sysrq_handle_crash+0x16/0x20
<4>[   75.665020]  RSP 
<4>[   75.665020] CR2: 
<4>[   75.665020] ---[ end trace 97bb4a1f8d3fe7b2 ]---
<3>[   75.665020] pstore_dump hsize=13 len=2155 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len=2246 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore: compression failed for Part 2 returned -5
<3>[   75.665020] pstore: Capture uncompressed oops/panic report of Part 2
<3>[   75.665020] pstore_dump hsize=13 len=2239 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len=2231 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len=2185 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore: compression failed for Part 5 returned -5
<3>[   75.665020] pstore: Capture uncompressed oops/panic report of Part 5
<3>[   75.665020] pstore_dump hsize=13 len=2234 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len=2208 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len=2218 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=13 len= big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore: compression failed for Part 9 returned -5
<3>[   75.665020] pstore: Capture uncompressed oops/panic report of Part 9
<3>[   75.665020] pstore_dump hsize=14 len=2256 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=14 len=2219 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<3>[   75.665020] pstore_dump hsize=14 len=2226 big_oops_buf_sz=2275 
psinfo->bufsize=1024
<0>[   75.665020] Kernel panic - not syncing: Fatal exception
<3>[   75.665020] drm_kms_helper: panic occurred, switching back to text console


Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-09-03 Thread Seiji Aguchi
Aruna,

Sorry for the late response.

 Seiji,
 
 Could you let us know the efivars buffer size with which the pstore is
 registered when
 the failure occurred.

I looked into the issue today.

I added some debug message just before pstore_compress().
As you can see below, the buffer size is a fixed value(1024).
Therefore, the size doesn't seem to be related to the failure directly.

Also, in the failure case, zlib_deflate() returns Z_OK and stream.avail_out is 
zero.
So, I thought big_oops_buf_sz is too big in my environment.
And then, I tuned  big_oops_buf_sz to (psinfo-bufsize * 100) / 53.

At the same time, while looking into this issue, I had two concerns about 
current cording.

1) In pstore_decompress(), why not use zlib_inflateInit2() instead of 
zlib_inflateInit()?
 If you use zlib_deflateInit2()  for specifying window_bit at compressing 
time, 
 zlib_inflateInit2() seems to be appropriate for decompressing.
 (Please see a comment about inflateInit2() in include/linux/zlib.h and 
source code of crypto/deflate.c)

2) As looking at crypto/deflate.c, stream-workspace is kzalloced at the 
beginning of 
   compressing/decompressing time.
So, in pstore case,  stream.workspace must be initialized to 0 with 
memset() in pstore_compress()/decompress().

I did three things above, tuning big_oops_buf_sz, using zlib_inflateInit2() and 
initializing stream.workspace , in my environment.
As a result, compressions/decmpressions of all entries succeeded with efivars 
driver.

snip
Panic#2 Part1
4[   75.665020]  [811af59c] SyS_write+0x4c/0xa0
4[   75.665020]  [8168be82] system_call_fastpath+0x16/0x1b
4[   75.665020] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 c6 
04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 
1[   75.665020] RIP  [813d3946] sysrq_handle_crash+0x16/0x20
4[   75.665020]  RSP 88007852de80
4[   75.665020] CR2: 
4[   75.665020] ---[ end trace 97bb4a1f8d3fe7b2 ]---
3[   75.665020] pstore_dump hsize=13 len=2155 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len=2246 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore: compression failed for Part 2 returned -5
3[   75.665020] pstore: Capture uncompressed oops/panic report of Part 2
3[   75.665020] pstore_dump hsize=13 len=2239 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len=2231 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len=2185 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore: compression failed for Part 5 returned -5
3[   75.665020] pstore: Capture uncompressed oops/panic report of Part 5
3[   75.665020] pstore_dump hsize=13 len=2234 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len=2208 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len=2218 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=13 len= big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore: compression failed for Part 9 returned -5
3[   75.665020] pstore: Capture uncompressed oops/panic report of Part 9
3[   75.665020] pstore_dump hsize=14 len=2256 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=14 len=2219 big_oops_buf_sz=2275 
psinfo-bufsize=1024
3[   75.665020] pstore_dump hsize=14 len=2226 big_oops_buf_sz=2275 
psinfo-bufsize=1024
0[   75.665020] Kernel panic - not syncing: Fatal exception
3[   75.665020] drm_kms_helper: panic occurred, switching back to text console
snip

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] Introduce page fault tracepoint

2013-08-23 Thread Seiji Aguchi
Change from v1
 - Print regs->ip instead of regs to know where the page fault event happened
   (Suggested by Steven Rostedt)


This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mm=136807959830182=2
http://marc.info/?l=linux-mm=136807959130175=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h | 33 +
 arch/x86/include/asm/hw_irq.h   | 14 -
 arch/x86/include/asm/trace/exceptions.h | 51 +
 arch/x86/include/asm/traps.h| 22 ++
 arch/x86/kernel/entry_32.S  | 10 +++
 arch/x86/kernel/entry_64.S  | 13 -
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/irqinit.c   |  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 +-
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 22 ++
 12 files changed, 177 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, );
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n > 0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e4ac559..fbd73b7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt spurious_interrupt

[PATCH v2] Introduce page fault tracepoint

2013-08-23 Thread Seiji Aguchi
Change from v1
 - Print regs-ip instead of regs to know where the page fault event happened
   (Suggested by Steven Rostedt)


This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mmm=136807959830182w=2
http://marc.info/?l=linux-mmm=136807959130175w=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h | 33 +
 arch/x86/include/asm/hw_irq.h   | 14 -
 arch/x86/include/asm/trace/exceptions.h | 51 +
 arch/x86/include/asm/traps.h| 22 ++
 arch/x86/kernel/entry_32.S  | 10 +++
 arch/x86/kernel/entry_64.S  | 13 -
 arch/x86/kernel/head64.c|  2 +-
 arch/x86/kernel/irqinit.c   |  2 +-
 arch/x86/kernel/kvm.c   |  2 +-
 arch/x86/kernel/traps.c | 28 +-
 arch/x86/mm/Makefile|  2 ++
 arch/x86/mm/fault.c | 22 ++
 12 files changed, 177 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, s);
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n  0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n  0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e4ac559..fbd73b7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt spurious_interrupt

RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


> -Original Message-
> From: Luck, Tony [mailto:tony.l...@intel.com]
> Sent: Thursday, August 22, 2013 7:17 PM
> To: Seiji Aguchi; Aruna Balakrishnaiah; linuxppc-...@ozlabs.org; 
> linux-kernel@vger.kernel.org; keesc...@chromium.org
> Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
> cbouatmai...@gmail.com;
> mah...@linux.vnet.ibm.com; ccr...@android.com
> Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> 
> <1>[  383.209057] RIP  [] sysrq_handle_crash+0x16/0x20
> <4>[  383.209057]  RSP 
> <4>[  383.209057] CR2: 
> <4>[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
> <3>[  383.209057] pstore: compression failed for Part 2 returned -5
> <3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
> <3>[  383.209057] pstore: compression failed for Part 5 returned -5
> 
> Interesting.  With ERST backend I didn't see these messages.  Traces in
> pstore recovered files go as far as the line before the "---[ end trace 
> 04a1cddad37b4b33 ]---"
> 
> Why the difference depending on which back end is in use?

I think the difference doesn't depend on the back end.
Rather it depends on the environment.

I tested on a kvm guest with OVMF.

Seiji


> 
> But I agree that we shouldn't have these messages.  They use up space
> in the persistent store that could be better used saving some more lines
> from earlier in the console log.
> 
> -Tony


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


>   * callback from kmsg_dump. (s2,l2) has the most recently
>   * written bytes, older bytes are in (s1,l1). Save as much
> @@ -148,23 +243,56 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>   char *dst;
>   unsigned long size;
>   int hsize;
> + int zipped_len = -1;
>   size_t len;
> - bool compressed = false;
> + bool compressed;
> + size_t total_len;
> 
> - dst = psinfo->buf;
> - hsize = sprintf(dst, "%s#%d Part%d\n", why, oopscount, part);
> - size = psinfo->bufsize - hsize;
> - dst += hsize;
> + if (big_oops_buf) {
> + dst = big_oops_buf;
> + hsize = sprintf(dst, "%s#%d Part%d\n", why,
> + oopscount, part);
> + size = big_oops_buf_sz - hsize;
> 
> - if (!kmsg_dump_get_buffer(dumper, true, dst, size, ))
> - break;
> + if (!kmsg_dump_get_buffer(dumper, true, dst + hsize,
> + size, ))
> + break;
> +
> + zipped_len = pstore_compress(dst, psinfo->buf,
> + hsize + len, psinfo->bufsize);
> +
> + if (zipped_len > 0) {
> + compressed = true;
> + total_len = zipped_len;
> + } else {
> + pr_err("pstore: compression failed for Part %d"
> + " returned %d\n", part, zipped_len);
> + pr_err("pstore: Capture uncompressed"
> + " oops/panic report of Part %d\n", 
> part);

Why did you add these messages in pstore_dump()?
In my test case, they are not needed


# cat dmesg-efi-1
Panic#2 Part1
<4>[  383.209057] RBP: 88006f551e80 R08: 0002 R09: 

<4>[  383.209057] R10: 0382 R11:  R12: 
0063
<4>[  383.209057] R13: 0286 R14: 000f R15: 

<4>[  383.209057] FS:  7f53317cc740() GS:88007c40() 
knlGS:
<4>[  383.209057] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  383.209057] CR2:  CR3: 78a73000 CR4: 
06f0
<4>[  383.209057] Stack:
<4>[  383.209057]  88006f551eb8 813d40a2 0002 
7f53317db000
<4>[  383.209057]  88006f551f50 0002  
88006f551ed8
<4>[  383.209057]  813d45aa 7f53317db000 88003f8c2b00 
88006f551ef8
<4>[  383.209057] Call Trace:
<4>[  383.209057]  [] __handle_sysrq+0xa2/0x170
<4>[  383.209057]  [] write_sysrq_trigger+0x4a/0x50
<4>[  383.209057]  [] proc_reg_write+0x3d/0x80
<4>[  383.209057]  [] vfs_write+0xc0/0x1f0
<4>[  383.209057]  [] SyS_write+0x4c/0xa0
<4>[  383.209057]  [] system_call_fastpath+0x16/0x1b
<4>[  383.209057] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8  
04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 
<1>[  383.209057] RIP  [] sysrq_handle_crash+0x16/0x20
<4>[  383.209057]  RSP 
<4>[  383.209057] CR2: 
<4>[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
<3>[  383.209057] pstore: compression failed for Part 2 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
<3>[  383.209057] pstore: compression failed for Part 5 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 5
<3>[  383.209057] pstore: compression failed for Part 12 returned -5
<3>[  383.209057] pstore: Capture uncompressed oops/panic report of Part 12
<0>[  383.209057] Kernel panic - not syncing: Fatal exception
<3>[  383.209057] drm_kms_helper: panic occurred, switching back to text console



In efi-pstore case, after rebooting a system,
we are able to know which entry failed to compress with 'C' or 'D' as below.

#ls /sys/firmware/efi/vars/ |grep dump
dump-type0-10-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-10-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-2-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-2-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0

RE: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore

2013-08-22 Thread Seiji Aguchi


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org 
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Aruna Balakrishnaiah
> Sent: Friday, August 16, 2013 9:18 AM
> To: linuxppc-...@ozlabs.org; tony.l...@intel.com; 
> linux-kernel@vger.kernel.org; keesc...@chromium.org
> Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
> cbouatmai...@gmail.com;
> mah...@linux.vnet.ibm.com; ccr...@android.com
> Subject: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore
> 
> Based on the flag 'compressed' set or not, pstore will decompress the
> data returning a plain text file. If decompression fails for a particular
> record it will have the compressed data in the file which can be
> decompressed with 'openssl' command line tool.

If the decompression fails and openssl doesn't work, the worst case is that 
users can't read the entry.
In that case, pstore is meaningless at all.

Also, for users who want to get a single panic message, a compression is not 
needed.

So, I think we still have to support non-compression mode.
(IMO, pstore can take kdump as a model. Kdump supports both compression and 
non-compression mode.)

But, if you think my comment is outside this patchset, it's OK.
We can make it with a separate patch.

Seiji


RE: [RFC][PATCH] Introduce page fault tracepoint

2013-08-22 Thread Seiji Aguchi
> Printing the regs pointer is rather useless. This is specific for x86,
> why not print the ip of where it happened and the faulting address
> itself?

Thank you for reviewing.
I will change the regs pointer to ip.

> Note, you only need to change the TP_printk() to do that. For
> efficiency reasons, only pass in regs.
 
OK. Will change the TP_printk().

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC][PATCH] Introduce page fault tracepoint

2013-08-22 Thread Seiji Aguchi
 Printing the regs pointer is rather useless. This is specific for x86,
 why not print the ip of where it happened and the faulting address
 itself?

Thank you for reviewing.
I will change the regs pointer to ip.

 Note, you only need to change the TP_printk() to do that. For
 efficiency reasons, only pass in regs.
 
OK. Will change the TP_printk().

Seiji
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore

2013-08-22 Thread Seiji Aguchi


 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Aruna Balakrishnaiah
 Sent: Friday, August 16, 2013 9:18 AM
 To: linuxppc-...@ozlabs.org; tony.l...@intel.com; 
 linux-kernel@vger.kernel.org; keesc...@chromium.org
 Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
 cbouatmai...@gmail.com;
 mah...@linux.vnet.ibm.com; ccr...@android.com
 Subject: [RFC PATCH v2 06/11] pstore: Add decompression support to pstore
 
 Based on the flag 'compressed' set or not, pstore will decompress the
 data returning a plain text file. If decompression fails for a particular
 record it will have the compressed data in the file which can be
 decompressed with 'openssl' command line tool.

If the decompression fails and openssl doesn't work, the worst case is that 
users can't read the entry.
In that case, pstore is meaningless at all.

Also, for users who want to get a single panic message, a compression is not 
needed.

So, I think we still have to support non-compression mode.
(IMO, pstore can take kdump as a model. Kdump supports both compression and 
non-compression mode.)

But, if you think my comment is outside this patchset, it's OK.
We can make it with a separate patch.

Seiji


RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


   * callback from kmsg_dump. (s2,l2) has the most recently
   * written bytes, older bytes are in (s1,l1). Save as much
 @@ -148,23 +243,56 @@ static void pstore_dump(struct kmsg_dumper *dumper,
   char *dst;
   unsigned long size;
   int hsize;
 + int zipped_len = -1;
   size_t len;
 - bool compressed = false;
 + bool compressed;
 + size_t total_len;
 
 - dst = psinfo-buf;
 - hsize = sprintf(dst, %s#%d Part%d\n, why, oopscount, part);
 - size = psinfo-bufsize - hsize;
 - dst += hsize;
 + if (big_oops_buf) {
 + dst = big_oops_buf;
 + hsize = sprintf(dst, %s#%d Part%d\n, why,
 + oopscount, part);
 + size = big_oops_buf_sz - hsize;
 
 - if (!kmsg_dump_get_buffer(dumper, true, dst, size, len))
 - break;
 + if (!kmsg_dump_get_buffer(dumper, true, dst + hsize,
 + size, len))
 + break;
 +
 + zipped_len = pstore_compress(dst, psinfo-buf,
 + hsize + len, psinfo-bufsize);
 +
 + if (zipped_len  0) {
 + compressed = true;
 + total_len = zipped_len;
 + } else {
 + pr_err(pstore: compression failed for Part %d
 +  returned %d\n, part, zipped_len);
 + pr_err(pstore: Capture uncompressed
 +  oops/panic report of Part %d\n, 
 part);

Why did you add these messages in pstore_dump()?
In my test case, they are not needed

snip
# cat dmesg-efi-1
Panic#2 Part1
4[  383.209057] RBP: 88006f551e80 R08: 0002 R09: 

4[  383.209057] R10: 0382 R11:  R12: 
0063
4[  383.209057] R13: 0286 R14: 000f R15: 

4[  383.209057] FS:  7f53317cc740() GS:88007c40() 
knlGS:
4[  383.209057] CS:  0010 DS:  ES:  CR0: 80050033
4[  383.209057] CR2:  CR3: 78a73000 CR4: 
06f0
4[  383.209057] Stack:
4[  383.209057]  88006f551eb8 813d40a2 0002 
7f53317db000
4[  383.209057]  88006f551f50 0002  
88006f551ed8
4[  383.209057]  813d45aa 7f53317db000 88003f8c2b00 
88006f551ef8
4[  383.209057] Call Trace:
4[  383.209057]  [813d40a2] __handle_sysrq+0xa2/0x170
4[  383.209057]  [813d45aa] write_sysrq_trigger+0x4a/0x50
4[  383.209057]  [8121981d] proc_reg_write+0x3d/0x80
4[  383.209057]  [811aeb20] vfs_write+0xc0/0x1f0
4[  383.209057]  [811af59c] SyS_write+0x4c/0xa0
4[  383.209057]  [8168be82] system_call_fastpath+0x16/0x1b
4[  383.209057] Code: ef e8 ff f7 ff ff eb d8 66 2e 0f 1f 84 00 00 00 00 00 
0f 1f 00 0f 1f 44 00 00 55 c7 05 cc f3 c9 00 01 00 00 00 48 89 e5 0f ae f8 c6 
04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 5e 
1[  383.209057] RIP  [813d3946] sysrq_handle_crash+0x16/0x20
4[  383.209057]  RSP 88006f551e80
4[  383.209057] CR2: 
4[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
3[  383.209057] pstore: compression failed for Part 2 returned -5
3[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
3[  383.209057] pstore: compression failed for Part 5 returned -5
3[  383.209057] pstore: Capture uncompressed oops/panic report of Part 5
3[  383.209057] pstore: compression failed for Part 12 returned -5
3[  383.209057] pstore: Capture uncompressed oops/panic report of Part 12
0[  383.209057] Kernel panic - not syncing: Fatal exception
3[  383.209057] drm_kms_helper: panic occurred, switching back to text console
snip


In efi-pstore case, after rebooting a system,
we are able to know which entry failed to compress with 'C' or 'D' as below.

#ls /sys/firmware/efi/vars/ |grep dump
dump-type0-10-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-10-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-11-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-1-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-12-2-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-1-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-13-2-1377204734-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0
dump-type0-2-1-1377204734-D-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0

RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore

2013-08-22 Thread Seiji Aguchi


 -Original Message-
 From: Luck, Tony [mailto:tony.l...@intel.com]
 Sent: Thursday, August 22, 2013 7:17 PM
 To: Seiji Aguchi; Aruna Balakrishnaiah; linuxppc-...@ozlabs.org; 
 linux-kernel@vger.kernel.org; keesc...@chromium.org
 Cc: jkeni...@linux.vnet.ibm.com; ana...@in.ibm.com; b...@kernel.crashing.org; 
 cbouatmai...@gmail.com;
 mah...@linux.vnet.ibm.com; ccr...@android.com
 Subject: RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
 
 1[  383.209057] RIP  [813d3946] sysrq_handle_crash+0x16/0x20
 4[  383.209057]  RSP 88006f551e80
 4[  383.209057] CR2: 
 4[  383.209057] ---[ end trace 04a1cddad37b4b33 ]---
 3[  383.209057] pstore: compression failed for Part 2 returned -5
 3[  383.209057] pstore: Capture uncompressed oops/panic report of Part 2
 3[  383.209057] pstore: compression failed for Part 5 returned -5
 
 Interesting.  With ERST backend I didn't see these messages.  Traces in
 pstore recovered files go as far as the line before the ---[ end trace 
 04a1cddad37b4b33 ]---
 
 Why the difference depending on which back end is in use?

I think the difference doesn't depend on the back end.
Rather it depends on the environment.

I tested on a kvm guest with OVMF.

Seiji


 
 But I agree that we shouldn't have these messages.  They use up space
 in the persistent store that could be better used saving some more lines
 from earlier in the console log.
 
 -Tony


[RFC][PATCH] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-08-20 Thread Seiji Aguchi
Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the read callback runnable without taking spinlok,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.
Also, __efivar_entry_get() is removed because it was used in efi_pstore only.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [] dump_stack+0x54/0x74
[1.144058]  [] warn_slowpath_common+0x7d/0xa0
[1.144058]  [] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [] ? vsscanf+0x57f/0x7b0
[1.144058]  [] lockdep_trace_alloc+0x104/0x110
[1.144058]  [] __kmalloc_track_caller+0x50/0x280
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] kmemdup+0x20/0x50
[1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [] __efivar_entry_iter+0xfb/0x120
[1.144058]  [] efi_pstore_read+0x3f/0x50
[1.144058]  [] pstore_get_records+0x9a/0x150
[1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [] ? parse_options+0x80/0x80
[1.158207]  [] pstore_fill_super+0xa5/0xc0
[1.158207]  [] mount_single+0xa2/0xd0
[1.158207]  [] pstore_mount+0x18/0x20
[1.158207]  [] mount_fs+0x39/0x1b0
[1.158207]  [] ? __alloc_percpu+0x10/0x20
[1.158207]  [] vfs_kern_mount+0x63/0xf0
[1.158207]  [] do_mount+0x23e/0xa20
[1.158207]  [] ? strndup_user+0x4b/0xf0
[1.158207]  [] SyS_mount+0x83/0xc0
[1.158207]  [] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efi/efi-pstore.c | 145 +++---
 drivers/firmware/efi/efivars.c|   3 +-
 drivers/firmware/efi/vars.c   |  39 +++---
 include/linux/efi.h   |   4 +-
 4 files changed, 151 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 73de5a9..db96b0d 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi->data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi->data = NULL;
return 0;
 }
@@ -38,6 +36,23 @@ struct pstore_read_data {
char **buf;
 };
 
+/**
+ * efi_pstore_read_func
+ * @entry: reading entry
+ * @data:  data of the entry
+ *
+ * This function runs in non-atomic context.
+ *
+ * Also, it returns a size of NVRAM entry logged via efi_pstore_write().
+ * pstore in accordance with the returned value as below.
+ *
+ * size > 0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size < 0: Failed to get data of entry logging via efi_pstore_write(),
+ *   and pstore will stop reading entry.
+ */
 static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
 {
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
@@ -75,8 +90,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, 
void *data)
return 0;
 
entry->var.DataSize = 1024;
-   __efivar_entry_get(entry, 

[RFC][PATCH] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-08-20 Thread Seiji Aguchi
Currently, when mounting pstore file system, a read callback of efi_pstore
driver runs mutiple times as below.

- In the first read callback, scan efivar_sysfs_list from head and pass
  a kmsg buffer of a entry to an upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from the entry and pass
  another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across the multiple read function calls.
To avoid race between the read and erasion, the whole process above is
protected by a spinlock, holding in open() and releasing in close().

At the same time, kmemdup() is called to pass the buffer to pstore filesystem
during it.
And then, it causes a following lockdep warning.

To make the read callback runnable without taking spinlok,
holding off a deletion of sysfs entry if it happens while scanning it
via efi_pstore, and deleting it after the scan is completed.

To implement it, this patch introduces two flags, scanning and deleting,
to efivar_entry.
Also, __efivar_entry_get() is removed because it was used in efi_pstore only.

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740
lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5
8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080
0046
[1.144058]  00d0 03af 81ccd0c0
8800797e9b78
[1.144058] Call Trace:
[1.144058]  [816614a5] dump_stack+0x54/0x74
[1.144058]  [8105510d] warn_slowpath_common+0x7d/0xa0
[1.144058]  [8105517c] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [8131290f] ? vsscanf+0x57f/0x7b0
[1.144058]  [810bbd74] lockdep_trace_alloc+0x104/0x110
[1.144058]  [81192da0] __kmalloc_track_caller+0x50/0x280
[1.144058]  [815147bb] ?
efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [8115b260] kmemdup+0x20/0x50
[1.144058]  [815147bb] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [81514800] ?
efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [815148b4] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [81512b7b] __efivar_entry_iter+0xfb/0x120
[1.144058]  [8151428f] efi_pstore_read+0x3f/0x50
[1.144058]  [8128d7ba] pstore_get_records+0x9a/0x150
[1.158207]  [812af25c] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [8128ce30] ? parse_options+0x80/0x80
[1.158207]  [8128ced5] pstore_fill_super+0xa5/0xc0
[1.158207]  [811ae7d2] mount_single+0xa2/0xd0
[1.158207]  [8128ccf8] pstore_mount+0x18/0x20
[1.158207]  [811ae8b9] mount_fs+0x39/0x1b0
[1.158207]  [81160550] ? __alloc_percpu+0x10/0x20
[1.158207]  [811c9493] vfs_kern_mount+0x63/0xf0
[1.158207]  [811cbb0e] do_mount+0x23e/0xa20
[1.158207]  [8115b51b] ? strndup_user+0x4b/0xf0
[1.158207]  [811cc373] SyS_mount+0x83/0xc0
[1.158207]  [81673cc2] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 drivers/firmware/efi/efi-pstore.c | 145 +++---
 drivers/firmware/efi/efivars.c|   3 +-
 drivers/firmware/efi/vars.c   |  39 +++---
 include/linux/efi.h   |   4 +-
 4 files changed, 151 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c 
b/drivers/firmware/efi/efi-pstore.c
index 73de5a9..db96b0d 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, 
bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi-data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi-data = NULL;
return 0;
 }
@@ -38,6 +36,23 @@ struct pstore_read_data {
char **buf;
 };
 
+/**
+ * efi_pstore_read_func
+ * @entry: reading entry
+ * @data:  data of the entry
+ *
+ * This function runs in non-atomic context.
+ *
+ * Also, it returns a size of NVRAM entry logged via efi_pstore_write().
+ * pstore in accordance with the returned value as below.
+ *
+ * size  0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size  0: Failed to get data of entry logging

RE: [RFC][PATCH] Introduce page fault tracepoint

2013-08-09 Thread Seiji Aguchi
Any comment?

> -Original Message-
> From: Seiji Aguchi [mailto:seiji.agu...@hds.com]
> Sent: Tuesday, July 30, 2013 6:53 PM
> To: linux-kernel@vger.kernel.org; x...@kernel.org
> Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
> t...@linutronix.de; fdesl...@gmail.com;
> raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
> Subject: [RFC][PATCH] Introduce page fault tracepoint
> 
> This patch introduces page fault tracepoints to x86 architecture
> by switching IDT.
> 
> [Use case of page fault events]
> 
>   Two events, for user and kernel spaces, are introduced at the beginning of
>   page fault handler.
> 
>   - User space event
> There is a request of page fault event for user space as below.
> 
> http://marc.info/?l=linux-mm=136807959830182=2
> http://marc.info/?l=linux-mm=136807959130175=2
> 
>   - Kernel space event:
> Overhead in kernel space is measurable by enabling it.
> 
> [Creating IDT]
> 
>  A way to create IDT is as below.
> 
>  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
>This is used at boot time which tracing is disabled.
>  - Make set_intr_gate() macro so that it can register trace handler to
>trace IDT and non-trace handler to normal IDT.
> 
> Signed-off-by: Seiji Aguchi 
> ---
>  arch/x86/include/asm/desc.h |   33 +---
>  arch/x86/include/asm/hw_irq.h   |   14 -
>  arch/x86/include/asm/trace/exceptions.h |   51 
> +++
>  arch/x86/include/asm/traps.h|   22 +
>  arch/x86/kernel/entry_32.S  |   10 ++
>  arch/x86/kernel/entry_64.S  |   13 +++-
>  arch/x86/kernel/head64.c|2 +-
>  arch/x86/kernel/irqinit.c   |2 +-
>  arch/x86/kernel/kvm.c   |2 +-
>  arch/x86/kernel/traps.c |   28 
>  arch/x86/mm/Makefile|2 +
>  arch/x86/mm/fault.c |   22 +
>  12 files changed, 177 insertions(+), 24 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace/exceptions.h
> 
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index b90e5df..c04302b 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
> const gate_desc *gate)
>  {
>   write_idt_entry(trace_idt_table, entry, gate);
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> + gate_desc s;
> +
> + pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
> + /*
> +  * does not need to be atomic because it is only done once at
> +  * setup time
> +  */
> + write_trace_idt_entry(gate, );
> +}
>  #else
>  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
>  {
>  }
> +
> +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
> +unsigned dpl, unsigned ist, unsigned seg)
> +{
> +}
>  #endif
> 
>  static inline void _set_gate(int gate, unsigned type, void *addr,
> @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
> void *addr,
>   * Pentium F0 0F bugfix can have resulted in the mapped
>   * IDT being write-protected.
>   */
> -static inline void set_intr_gate(unsigned int n, void *addr)
> +static inline void set_intr_gate_raw(unsigned int n, void *addr)
>  {
>   BUG_ON((unsigned)n > 0xFF);
>   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
>  }
> 
> +#define set_intr_gate(n, addr)   
> \
> + do {\
> + BUG_ON((unsigned)n > 0xFF); \
> + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
> + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
> + __KERNEL_CS);   \
> + } while (0)
> +
>  extern int first_system_vector;
>  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
>  extern unsigned long used_vectors[];
> @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int 
> gate, void *addr)
>  #define __trace_alloc_intr_gate(n, addr)
>  #endif
> 
> -static inline void __alloc_intr_gate(unsigned int n, void *addr)
> -{
> - set_intr_gate(n, addr);
> -}
> +#define 

RE: [RFC][PATCH] Introduce page fault tracepoint

2013-08-09 Thread Seiji Aguchi
Any comment?

 -Original Message-
 From: Seiji Aguchi [mailto:seiji.agu...@hds.com]
 Sent: Tuesday, July 30, 2013 6:53 PM
 To: linux-kernel@vger.kernel.org; x...@kernel.org
 Cc: h...@zytor.com; rost...@goodmis.org; mi...@elte.hu; b...@alien8.de; 
 t...@linutronix.de; fdesl...@gmail.com;
 raphael.beamo...@gmail.com; dle-deve...@lists.sourceforge.net; Tomoki Sekiyama
 Subject: [RFC][PATCH] Introduce page fault tracepoint
 
 This patch introduces page fault tracepoints to x86 architecture
 by switching IDT.
 
 [Use case of page fault events]
 
   Two events, for user and kernel spaces, are introduced at the beginning of
   page fault handler.
 
   - User space event
 There is a request of page fault event for user space as below.
 
 http://marc.info/?l=linux-mmm=136807959830182w=2
 http://marc.info/?l=linux-mmm=136807959130175w=2
 
   - Kernel space event:
 Overhead in kernel space is measurable by enabling it.
 
 [Creating IDT]
 
  A way to create IDT is as below.
 
  - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
This is used at boot time which tracing is disabled.
  - Make set_intr_gate() macro so that it can register trace handler to
trace IDT and non-trace handler to normal IDT.
 
 Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
 ---
  arch/x86/include/asm/desc.h |   33 +---
  arch/x86/include/asm/hw_irq.h   |   14 -
  arch/x86/include/asm/trace/exceptions.h |   51 
 +++
  arch/x86/include/asm/traps.h|   22 +
  arch/x86/kernel/entry_32.S  |   10 ++
  arch/x86/kernel/entry_64.S  |   13 +++-
  arch/x86/kernel/head64.c|2 +-
  arch/x86/kernel/irqinit.c   |2 +-
  arch/x86/kernel/kvm.c   |2 +-
  arch/x86/kernel/traps.c |   28 
  arch/x86/mm/Makefile|2 +
  arch/x86/mm/fault.c |   22 +
  12 files changed, 177 insertions(+), 24 deletions(-)
  create mode 100644 arch/x86/include/asm/trace/exceptions.h
 
 diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
 index b90e5df..c04302b 100644
 --- a/arch/x86/include/asm/desc.h
 +++ b/arch/x86/include/asm/desc.h
 @@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, 
 const gate_desc *gate)
  {
   write_idt_entry(trace_idt_table, entry, gate);
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 + gate_desc s;
 +
 + pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
 + /*
 +  * does not need to be atomic because it is only done once at
 +  * setup time
 +  */
 + write_trace_idt_entry(gate, s);
 +}
  #else
  static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
  {
  }
 +
 +static inline void _trace_set_gate(int gate, unsigned type, void *addr,
 +unsigned dpl, unsigned ist, unsigned seg)
 +{
 +}
  #endif
 
  static inline void _set_gate(int gate, unsigned type, void *addr,
 @@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
 void *addr,
   * Pentium F0 0F bugfix can have resulted in the mapped
   * IDT being write-protected.
   */
 -static inline void set_intr_gate(unsigned int n, void *addr)
 +static inline void set_intr_gate_raw(unsigned int n, void *addr)
  {
   BUG_ON((unsigned)n  0xFF);
   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
  }
 
 +#define set_intr_gate(n, addr)   
 \
 + do {\
 + BUG_ON((unsigned)n  0xFF); \
 + _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
 + _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
 + __KERNEL_CS);   \
 + } while (0)
 +
  extern int first_system_vector;
  /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
  extern unsigned long used_vectors[];
 @@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int 
 gate, void *addr)
  #define __trace_alloc_intr_gate(n, addr)
  #endif
 
 -static inline void __alloc_intr_gate(unsigned int n, void *addr)
 -{
 - set_intr_gate(n, addr);
 -}
 +#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
  #define alloc_intr_gate(n, addr) \
   do {\
 diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
 index e4ac559..fbd73b7 100644
 --- a/arch/x86/include/asm/hw_irq.h
 +++ b/arch/x86/include/asm/hw_irq.h
 @@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
  extern void trace_threshold_interrupt

[RFC][PATCH] Introduce page fault tracepoint

2013-07-30 Thread Seiji Aguchi
This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mm=136807959830182=2
http://marc.info/?l=linux-mm=136807959130175=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi 
---
 arch/x86/include/asm/desc.h |   33 +---
 arch/x86/include/asm/hw_irq.h   |   14 -
 arch/x86/include/asm/trace/exceptions.h |   51 +++
 arch/x86/include/asm/traps.h|   22 +
 arch/x86/kernel/entry_32.S  |   10 ++
 arch/x86/kernel/entry_64.S  |   13 +++-
 arch/x86/kernel/head64.c|2 +-
 arch/x86/kernel/irqinit.c   |2 +-
 arch/x86/kernel/kvm.c   |2 +-
 arch/x86/kernel/traps.c |   28 
 arch/x86/mm/Makefile|2 +
 arch/x86/mm/fault.c |   22 +
 12 files changed, 177 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, );
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n > 0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n > 0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e4ac559..fbd73b7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt spurious_interrupt
+#define trace_thermal_interrupt thermal_interrupt
+#define trace_reschedule_interrupt reschedule_interrupt
+#

[RFC][PATCH] Introduce page fault tracepoint

2013-07-30 Thread Seiji Aguchi
This patch introduces page fault tracepoints to x86 architecture
by switching IDT.

[Use case of page fault events]

  Two events, for user and kernel spaces, are introduced at the beginning of
  page fault handler.

  - User space event
There is a request of page fault event for user space as below.

http://marc.info/?l=linux-mmm=136807959830182w=2
http://marc.info/?l=linux-mmm=136807959130175w=2

  - Kernel space event:
Overhead in kernel space is measurable by enabling it.

[Creating IDT]

 A way to create IDT is as below.

 - Introduce set_intr_gate_raw() to register just non-trace handler to IDT.
   This is used at boot time which tracing is disabled.
 - Make set_intr_gate() macro so that it can register trace handler to
   trace IDT and non-trace handler to normal IDT.

Signed-off-by: Seiji Aguchi seiji.agu...@hds.com
---
 arch/x86/include/asm/desc.h |   33 +---
 arch/x86/include/asm/hw_irq.h   |   14 -
 arch/x86/include/asm/trace/exceptions.h |   51 +++
 arch/x86/include/asm/traps.h|   22 +
 arch/x86/kernel/entry_32.S  |   10 ++
 arch/x86/kernel/entry_64.S  |   13 +++-
 arch/x86/kernel/head64.c|2 +-
 arch/x86/kernel/irqinit.c   |2 +-
 arch/x86/kernel/kvm.c   |2 +-
 arch/x86/kernel/traps.c |   28 
 arch/x86/mm/Makefile|2 +
 arch/x86/mm/fault.c |   22 +
 12 files changed, 177 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/include/asm/trace/exceptions.h

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index b90e5df..c04302b 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -327,10 +327,28 @@ static inline void write_trace_idt_entry(int entry, const 
gate_desc *gate)
 {
write_idt_entry(trace_idt_table, entry, gate);
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+   gate_desc s;
+
+   pack_gate(s, type, (unsigned long)addr, dpl, ist, seg);
+   /*
+* does not need to be atomic because it is only done once at
+* setup time
+*/
+   write_trace_idt_entry(gate, s);
+}
 #else
 static inline void write_trace_idt_entry(int entry, const gate_desc *gate)
 {
 }
+
+static inline void _trace_set_gate(int gate, unsigned type, void *addr,
+  unsigned dpl, unsigned ist, unsigned seg)
+{
+}
 #endif
 
 static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -353,12 +371,20 @@ static inline void _set_gate(int gate, unsigned type, 
void *addr,
  * Pentium F0 0F bugfix can have resulted in the mapped
  * IDT being write-protected.
  */
-static inline void set_intr_gate(unsigned int n, void *addr)
+static inline void set_intr_gate_raw(unsigned int n, void *addr)
 {
BUG_ON((unsigned)n  0xFF);
_set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);
 }
 
+#define set_intr_gate(n, addr) \
+   do {\
+   BUG_ON((unsigned)n  0xFF); \
+   _set_gate(n, GATE_INTERRUPT, addr, 0, 0, __KERNEL_CS);  \
+   _trace_set_gate(n, GATE_INTERRUPT, trace_##addr, 0, 0,  \
+   __KERNEL_CS);   \
+   } while (0)
+
 extern int first_system_vector;
 /* used_vectors is BITMAP for irq is not managed by percpu vector_irq */
 extern unsigned long used_vectors[];
@@ -395,10 +421,7 @@ static inline void trace_set_intr_gate(unsigned int gate, 
void *addr)
 #define __trace_alloc_intr_gate(n, addr)
 #endif
 
-static inline void __alloc_intr_gate(unsigned int n, void *addr)
-{
-   set_intr_gate(n, addr);
-}
+#define __alloc_intr_gate(n, addr) set_intr_gate(n, addr)
 
 #define alloc_intr_gate(n, addr)   \
do {\
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index e4ac559..fbd73b7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -89,10 +89,22 @@ extern void trace_reschedule_interrupt(void);
 extern void trace_threshold_interrupt(void);
 extern void trace_call_function_interrupt(void);
 extern void trace_call_function_single_interrupt(void);
+#else /* CONFIG_TRACING */
+#define trace_apic_timer_interrupt apic_timer_interrupt
+#define trace_x86_platform_ipi x86_platform_ipi
+#define trace_error_interrupt error_interrupt
+#define trace_irq_work_interrupt irq_work_interrupt
+#define trace_spurious_interrupt spurious_interrupt
+#define trace_thermal_interrupt thermal_interrupt
+#define trace_reschedule_interrupt reschedule_interrupt

RE: Yet more softlockups.

2013-07-05 Thread Seiji Aguchi


> -Original Message-
> From: H. Peter Anvin [mailto:h...@zytor.com]
> Sent: Friday, July 05, 2013 12:41 PM
> To: Thomas Gleixner
> Cc: Dave Jones; Linus Torvalds; Linux Kernel; Ingo Molnar; Peter Zijlstra; 
> Seiji Aguchi
> Subject: Re: Yet more softlockups.
> 
> On 07/05/2013 09:02 AM, Thomas Gleixner wrote:
> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> >>   > On Fri, 5 Jul 2013, Dave Jones wrote:
> >>   >
> >>   > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> >>   > > perf samples too long (2519 > 2500), lowering 
> >> kernel.perf_event_max_sample_rate to 5
> >>   > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 
> >> 238147.002 msecs
> >>   >
> >>   > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> >>   > claims it did run 23.8 seconds.
> >>   >
> >>   > Are there more instances of NMI handler messages ?
> >>
> >> [ 2552.006181] perf samples too long (2511 > 2500), lowering 
> >> kernel.perf_event_max_sample_rate to 5
> >> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to 
> >> run: 500392.002 msecs
> >
> > Yuck. Spending 50 seconds in NMI context surely explains a softlockup :)
> >
> 
> Hmmm... this makes me wonder if the interrupt tracepoint stuff is at
> fault here, as it changes the IDT handling for NMI context.

This softlockup happens while disabling the interrupt tracepoints,
Because if it is enabled, "smp_trace_apic_timer_interrupt" is displayed
instead of "smp_apic_timer_interrupt" in the call trace below.

But I can't say anything how this issue is related to the tracepoint stuff,
I need to reproduce it on my machine first.

Call Trace:
  
 [] __do_softirq+0xff/0x440
 [] irq_exit+0xcd/0xe0
 [] smp_apic_timer_interrupt+0x6b/0x9b
 [] apic_timer_interrupt+0x6f/0x80

Seiji
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/urgent] x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()

2013-07-05 Thread tip-bot for Seiji Aguchi
Commit-ID:  4787c368a9bca39e173d702389ee2eaf0520abc1
Gitweb: http://git.kernel.org/tip/4787c368a9bca39e173d702389ee2eaf0520abc1
Author: Seiji Aguchi 
AuthorDate: Fri, 28 Jun 2013 14:02:11 -0400
Committer:  Ingo Molnar 
CommitDate: Tue, 2 Jul 2013 09:52:31 +0200

x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()

Reschedule vector tracepoints may be called in cpu idle state.
This causes lockdep check warning below.

The tracepoint requires rcu but for accuracy it also
requires irq_enter() (tracepoints record the irq context), thus,
the tracepoint interrupt handler should be calling irq_enter()
and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).

So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
with common pre/post processing functions, smp_entering_irq()
and exiting_irq() (exiting_irq() calls just irq_exit()
 in arch/x86/include/asm/apic.h),
because these can be shared among reschedule, call_function,
and call_function_single vectors.

[   50.720557] Testing event reschedule_exit:
[   50.721349]
[   50.721502] ===
[   50.721835] [ INFO: suspicious RCU usage. ]
[   50.722169] 3.10.0-rc6-4-gcf910e8 #190 Not tainted
[   50.722582] ---
[   50.722915] 
/c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 
suspicious rcu_dereference_check() usage!
[   50.723770]
[   50.723770] other info that might help us debug this:
[   50.723770]
[   50.724385]
[   50.724385] RCU used illegally from idle CPU!
[   50.724385] rcu_scheduler_active = 1, debug_locks = 0
[   50.725232] RCU used illegally from extended quiescent state!
[   50.725690] no locks held by swapper/0/0.
[   50.726010]
[   50.726010] stack backtrace:
[...]

Signed-off-by: Seiji Aguchi 
Reviewed-by: Steven Rostedt 
Link: http://lkml.kernel.org/r/51cdcfa3.9080...@hds.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/smp.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index f4fe0b8..cdaa347 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -265,23 +265,30 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
 */
 }
 
-void smp_trace_reschedule_interrupt(struct pt_regs *regs)
+static inline void smp_entering_irq(void)
 {
ack_APIC_irq();
+   irq_enter();
+}
+
+void smp_trace_reschedule_interrupt(struct pt_regs *regs)
+{
+   /*
+* Need to call irq_enter() before calling the trace point.
+* __smp_reschedule_interrupt() calls irq_enter/exit() too (in
+* scheduler_ipi(). This is OK, since those functions are allowed
+* to nest.
+*/
+   smp_entering_irq();
trace_reschedule_entry(RESCHEDULE_VECTOR);
__smp_reschedule_interrupt();
trace_reschedule_exit(RESCHEDULE_VECTOR);
+   exiting_irq();
/*
 * KVM uses this interrupt to force a cpu out of guest mode
 */
 }
 
-static inline void call_function_entering_irq(void)
-{
-   ack_APIC_irq();
-   irq_enter();
-}
-
 static inline void __smp_call_function_interrupt(void)
 {
generic_smp_call_function_interrupt();
@@ -290,14 +297,14 @@ static inline void __smp_call_function_interrupt(void)
 
 void smp_call_function_interrupt(struct pt_regs *regs)
 {
-   call_function_entering_irq();
+   smp_entering_irq();
__smp_call_function_interrupt();
exiting_irq();
 }
 
 void smp_trace_call_function_interrupt(struct pt_regs *regs)
 {
-   call_function_entering_irq();
+   smp_entering_irq();
trace_call_function_entry(CALL_FUNCTION_VECTOR);
__smp_call_function_interrupt();
trace_call_function_exit(CALL_FUNCTION_VECTOR);
@@ -312,14 +319,14 @@ static inline void 
__smp_call_function_single_interrupt(void)
 
 void smp_call_function_single_interrupt(struct pt_regs *regs)
 {
-   call_function_entering_irq();
+   smp_entering_irq();
__smp_call_function_single_interrupt();
exiting_irq();
 }
 
 void smp_trace_call_function_single_interrupt(struct pt_regs *regs)
 {
-   call_function_entering_irq();
+   smp_entering_irq();
trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
__smp_call_function_single_interrupt();
trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   >