RE: [PATCH V2] output the cpu number when printking.

2012-12-26 Thread He, Bo
Thanks, Greg.
I did use this patch to fix many races on SMP. But to respect maintainer, I 
stop pushing the patch to upstream.


-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org] 
Sent: Thursday, December 27, 2012 1:50 AM
To: Yanmin Zhang
Cc: He, Bo; Randy Dunlap; a...@linux-foundation.org; mi...@elte.hu; 
linux-kernel@vger.kernel.org; a.p.zijls...@chello.nl
Subject: Re: [PATCH V2] output the cpu number when printking.

On Tue, Dec 25, 2012 at 09:09:05AM +0800, Yanmin Zhang wrote:
> On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
> > On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> > > From: "he, bo" 
> > > 
> > > We often hit kernel panic issues on SMP machines because processes 
> > > race on multiple cpu. By adding a new parameter printk.cpu, kernel 
> > > prints cpu number at printk information line. It’s useful to debug 
> > > what cpus are racing.
> > 
> > How useful is this really for normal developers?
> It's very useful to debug race conditions under SMP environment.
> We applied the patch to our Android build image on our smartphones.

That's fine for your application, and seemed to be useful to others with their 
first interactions with SMP systems.  However, once you start to get to "real" 
numbers of CPUs, this information turns pretty pointless, which is why the 
patch was originally rejected.

sorry,

greg k-h


Re: [PATCH V2] output the cpu number when printking.

2012-12-26 Thread Greg KH
On Tue, Dec 25, 2012 at 09:09:05AM +0800, Yanmin Zhang wrote:
> On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
> > On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> > > From: "he, bo" 
> > > 
> > > We often hit kernel panic issues on SMP machines because processes race
> > > on multiple cpu. By adding a new parameter printk.cpu, kernel prints
> > > cpu number at printk information line. It’s useful to debug what cpus
> > > are racing.
> > 
> > How useful is this really for normal developers?
> It's very useful to debug race conditions under SMP environment.
> We applied the patch to our Android build image on our smartphones.

That's fine for your application, and seemed to be useful to others
with their first interactions with SMP systems.  However, once you start
to get to "real" numbers of CPUs, this information turns pretty
pointless, which is why the patch was originally rejected.

sorry,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-26 Thread Greg KH
On Tue, Dec 25, 2012 at 09:09:05AM +0800, Yanmin Zhang wrote:
 On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
  On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
   From: he, bo bo...@intel.com
   
   We often hit kernel panic issues on SMP machines because processes race
   on multiple cpu. By adding a new parameter printk.cpu, kernel prints
   cpu number at printk information line. It’s useful to debug what cpus
   are racing.
  
  How useful is this really for normal developers?
 It's very useful to debug race conditions under SMP environment.
 We applied the patch to our Android build image on our smartphones.

That's fine for your application, and seemed to be useful to others
with their first interactions with SMP systems.  However, once you start
to get to real numbers of CPUs, this information turns pretty
pointless, which is why the patch was originally rejected.

sorry,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V2] output the cpu number when printking.

2012-12-26 Thread He, Bo
Thanks, Greg.
I did use this patch to fix many races on SMP. But to respect maintainer, I 
stop pushing the patch to upstream.


-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org] 
Sent: Thursday, December 27, 2012 1:50 AM
To: Yanmin Zhang
Cc: He, Bo; Randy Dunlap; a...@linux-foundation.org; mi...@elte.hu; 
linux-kernel@vger.kernel.org; a.p.zijls...@chello.nl
Subject: Re: [PATCH V2] output the cpu number when printking.

On Tue, Dec 25, 2012 at 09:09:05AM +0800, Yanmin Zhang wrote:
 On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
  On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
   From: he, bo bo...@intel.com
   
   We often hit kernel panic issues on SMP machines because processes 
   race on multiple cpu. By adding a new parameter printk.cpu, kernel 
   prints cpu number at printk information line. It’s useful to debug 
   what cpus are racing.
  
  How useful is this really for normal developers?
 It's very useful to debug race conditions under SMP environment.
 We applied the patch to our Android build image on our smartphones.

That's fine for your application, and seemed to be useful to others with their 
first interactions with SMP systems.  However, once you start to get to real 
numbers of CPUs, this information turns pretty pointless, which is why the 
patch was originally rejected.

sorry,

greg k-h


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Yanmin Zhang
On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
> On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> > From: "he, bo" 
> > 
> > We often hit kernel panic issues on SMP machines because processes race
> > on multiple cpu. By adding a new parameter printk.cpu, kernel prints
> > cpu number at printk information line. It’s useful to debug what cpus
> > are racing.
> 
> How useful is this really for normal developers?
It's very useful to debug race conditions under SMP environment.
We applied the patch to our Android build image on our smartphones.

>   We complained when
> this option was proposed by the ARM developers who were, for the first
> time, handling more than one processor and the issues involved with
> that.  You are enabling this as a default option, for all developers,
> and almost no one will ever need it.
At least, I am a developer to use it every day. :). I need debug various
weird issues and tried many debug tools.

Anyway, we would change the default to _disable_.

> 
> So, if you really want this, don't enable this by default.  Also, go
> back and read the old thread about this option and why it was rejeted
> previously.
Thanks for your kind reminder. I googled it. Is the link
like https://lkml.org/lkml/2012/8/3/121?
Or http://lkml.indiana.edu/hypermail/linux/kernel/1208.0/02097.html?

Above links raise a good question that we might be able to use
ftrace (trace_printk) instead of to extend printk.
1) There are indeed lots of debug methods/tools in kernel because of many
excellent developers' contributions. However, the most frequently-used tool
is still printk.
2) ftrace (I use it often) is good when system mostly doesn't crash.
If system crashes, we couldn't get ftrace info sometimes.
When we debug some hard issues on a multi-core device, the system often
crashes when Graphics driver accesses registers incorrectly. There is no chance
for kernel to dump ftrace or panic info. With printk, we find that's because
2 cpu are racing. One cpu turns off Graphics while another cpu is accessing it.
Of course, we also applied the Android persistent ram patch, so after rebooting,
we still could see the printk buffer of previous booting.
3) Current drivers usually have many dev_dbg statements. We could enable
dynamic debug control to get such info quickly. With the printk cpu number
patch, we quickly enhance them to dump info about potential races among cpu.
With ftrace, we need change all these dev_dbg to trace_printk and that's not
the original objective of driver developers.

Mostly, the patch is to reuse all current printk statements in kernel with
a little overhead.

Thanks for your kind comments.

Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Greg KH
On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
> From: "he, bo" 
> 
> We often hit kernel panic issues on SMP machines because processes race
> on multiple cpu. By adding a new parameter printk.cpu, kernel prints
> cpu number at printk information line. It’s useful to debug what cpus
> are racing.

How useful is this really for normal developers?  We complained when
this option was proposed by the ARM developers who were, for the first
time, handling more than one processor and the issues involved with
that.  You are enabling this as a default option, for all developers,
and almost no one will ever need it.

So, if you really want this, don't enable this by default.  Also, go
back and read the old thread about this option and why it was rejeted
previously.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Greg KH
On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
 From: he, bo bo...@intel.com
 
 We often hit kernel panic issues on SMP machines because processes race
 on multiple cpu. By adding a new parameter printk.cpu, kernel prints
 cpu number at printk information line. It’s useful to debug what cpus
 are racing.

How useful is this really for normal developers?  We complained when
this option was proposed by the ARM developers who were, for the first
time, handling more than one processor and the issues involved with
that.  You are enabling this as a default option, for all developers,
and almost no one will ever need it.

So, if you really want this, don't enable this by default.  Also, go
back and read the old thread about this option and why it was rejeted
previously.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2] output the cpu number when printking.

2012-12-24 Thread Yanmin Zhang
On Mon, 2012-12-24 at 09:55 -0800, Greg KH wrote:
 On Mon, Dec 24, 2012 at 01:01:55PM +0800, he, bo wrote:
  From: he, bo bo...@intel.com
  
  We often hit kernel panic issues on SMP machines because processes race
  on multiple cpu. By adding a new parameter printk.cpu, kernel prints
  cpu number at printk information line. It’s useful to debug what cpus
  are racing.
 
 How useful is this really for normal developers?
It's very useful to debug race conditions under SMP environment.
We applied the patch to our Android build image on our smartphones.

   We complained when
 this option was proposed by the ARM developers who were, for the first
 time, handling more than one processor and the issues involved with
 that.  You are enabling this as a default option, for all developers,
 and almost no one will ever need it.
At least, I am a developer to use it every day. :). I need debug various
weird issues and tried many debug tools.

Anyway, we would change the default to _disable_.

 
 So, if you really want this, don't enable this by default.  Also, go
 back and read the old thread about this option and why it was rejeted
 previously.
Thanks for your kind reminder. I googled it. Is the link
like https://lkml.org/lkml/2012/8/3/121?
Or http://lkml.indiana.edu/hypermail/linux/kernel/1208.0/02097.html?

Above links raise a good question that we might be able to use
ftrace (trace_printk) instead of to extend printk.
1) There are indeed lots of debug methods/tools in kernel because of many
excellent developers' contributions. However, the most frequently-used tool
is still printk.
2) ftrace (I use it often) is good when system mostly doesn't crash.
If system crashes, we couldn't get ftrace info sometimes.
When we debug some hard issues on a multi-core device, the system often
crashes when Graphics driver accesses registers incorrectly. There is no chance
for kernel to dump ftrace or panic info. With printk, we find that's because
2 cpu are racing. One cpu turns off Graphics while another cpu is accessing it.
Of course, we also applied the Android persistent ram patch, so after rebooting,
we still could see the printk buffer of previous booting.
3) Current drivers usually have many dev_dbg statements. We could enable
dynamic debug control to get such info quickly. With the printk cpu number
patch, we quickly enhance them to dump info about potential races among cpu.
With ftrace, we need change all these dev_dbg to trace_printk and that's not
the original objective of driver developers.

Mostly, the patch is to reuse all current printk statements in kernel with
a little overhead.

Thanks for your kind comments.

Yanmin


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] output the cpu number when printking.

2012-12-23 Thread he, bo
From: "he, bo" 

We often hit kernel panic issues on SMP machines because processes race
on multiple cpu. By adding a new parameter printk.cpu, kernel prints
cpu number at printk information line. It’s useful to debug what cpus
are racing.

Signed-off-by: he, bo 
---
 Documentation/kernel-parameters.txt |4 
 kernel/printk.c |   18 --
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index ddd84d6..3387a7f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2378,6 +2378,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
printk.time=Show timing data prefixed to each printk message line
Format:   (1/Y/y=enable, 0/N/n=disable)
 
+   printk.cpu= Show cpu data prefixed to each printk message line
+   Format:   (1/Y/y=enable, 0/N/n=disable)
+   printk.cpu takes effect only when printk.time=y.
+
processor.max_cstate=   [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk.c b/kernel/printk.c
index 19c0d7b..873a226 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -863,9 +863,14 @@ static bool printk_time;
 #endif
 module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
 
+static bool printk_cpu = 1;
+module_param_named(cpu, printk_cpu, bool, S_IRUGO | S_IWUSR);
+
 static size_t print_time(u64 ts, char *buf)
 {
unsigned long rem_nsec;
+   size_t len = 0;
+   int this_cpu;
 
if (!printk_time)
return 0;
@@ -874,8 +879,17 @@ static size_t print_time(u64 ts, char *buf)
return 15;
 
rem_nsec = do_div(ts, 10);
-   return sprintf(buf, "[%5lu.%06lu] ",
-  (unsigned long)ts, rem_nsec / 1000);
+
+   if (printk_cpu) {
+   this_cpu = raw_smp_processor_id();
+   len = sprintf(buf, "[%5lu.%06lu,%u] ",
+  (unsigned long)ts, rem_nsec / 1000, this_cpu);
+   } else {
+   len = sprintf(buf, "[%5lu.%06lu] ",
+  (unsigned long)ts, rem_nsec / 1000);
+   }
+
+   return len;
 }
 
 static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
-- 
1.7.6



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2] output the cpu number when printking.

2012-12-23 Thread he, bo
From: he, bo bo...@intel.com

We often hit kernel panic issues on SMP machines because processes race
on multiple cpu. By adding a new parameter printk.cpu, kernel prints
cpu number at printk information line. It’s useful to debug what cpus
are racing.

Signed-off-by: he, bo bo...@intel.com
---
 Documentation/kernel-parameters.txt |4 
 kernel/printk.c |   18 --
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index ddd84d6..3387a7f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2378,6 +2378,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
printk.time=Show timing data prefixed to each printk message line
Format: bool  (1/Y/y=enable, 0/N/n=disable)
 
+   printk.cpu= Show cpu data prefixed to each printk message line
+   Format: bool  (1/Y/y=enable, 0/N/n=disable)
+   printk.cpu takes effect only when printk.time=y.
+
processor.max_cstate=   [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk.c b/kernel/printk.c
index 19c0d7b..873a226 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -863,9 +863,14 @@ static bool printk_time;
 #endif
 module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
 
+static bool printk_cpu = 1;
+module_param_named(cpu, printk_cpu, bool, S_IRUGO | S_IWUSR);
+
 static size_t print_time(u64 ts, char *buf)
 {
unsigned long rem_nsec;
+   size_t len = 0;
+   int this_cpu;
 
if (!printk_time)
return 0;
@@ -874,8 +879,17 @@ static size_t print_time(u64 ts, char *buf)
return 15;
 
rem_nsec = do_div(ts, 10);
-   return sprintf(buf, [%5lu.%06lu] ,
-  (unsigned long)ts, rem_nsec / 1000);
+
+   if (printk_cpu) {
+   this_cpu = raw_smp_processor_id();
+   len = sprintf(buf, [%5lu.%06lu,%u] ,
+  (unsigned long)ts, rem_nsec / 1000, this_cpu);
+   } else {
+   len = sprintf(buf, [%5lu.%06lu] ,
+  (unsigned long)ts, rem_nsec / 1000);
+   }
+
+   return len;
 }
 
 static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
-- 
1.7.6



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/