date:20080209

Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

2008-02-09 Thread Willy Tarreau

On Sun, Feb 10, 2008 at 01:00:56AM -0600, Olof Johansson wrote:
> On Sun, Feb 10, 2008 at 07:15:58AM +0100, Willy Tarreau wrote:
> > On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote:
> > > 40M:
> > > 2.6.22        time 94315 ms
> > > 2.6.23        time 107930 ms
> > > 2.6.24        time 113291 ms
> > > 2.6.24-git19  time 110360 ms
> > > 
> > > So with more work per thread, the differences become less but they're
> > > still there. At the 40M loop, with 500 threads it's quite a bit of
> > > runtime per thread.
> > 
> > No, it's really nothing. I had to push the loop to 1 billion to make the
> > load noticeable. You don't have 500 threads, you have 2 threads and that
> > load is repeated 500 times. And if we look at the numbers, let's take the
> > worst one:
> > > 40M:
> > > 2.6.24        time 113291 ms
> > 113291/500 = 227 microseconds/loop. This is still very low compared to the
> > smallest timeslice you would have (1 ms at HZ=1000).
> > 
> > So your threads are still completing *before* the scheduler has to preempt
> > them.
> 
> Hmm? I get that to be 227ms per loop, which is way more than a full
> timeslice. Running the program took in the range of 2 minutes, so it's
> 11 milliseconds, not microseconds.

Damn you're right! I don't know why I assumed that the reported time was
in microseconds. Nevermind.
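
For reference, the corrected arithmetic can be written out. This is only a sketch of the unit conversion discussed above; the helper names are illustrative and do not come from the test program, and the 1 ms timeslice is the HZ=1000 figure quoted earlier in the thread.

```c
#include <assert.h>

/* Illustrative helpers; the names are not from the original test program. */

/* Average time per iteration in microseconds, given a total in milliseconds. */
static long per_iteration_us(long total_ms, long iterations)
{
    assert(iterations > 0);
    return total_ms * 1000 / iterations;
}

/* Number of whole timeslices one iteration spans (timeslice = 1000000/HZ us). */
static long timeslices_per_iteration(long iteration_us, long hz)
{
    assert(hz > 0);
    return iteration_us / (1000000 / hz);
}
```

With the 2.6.24 figure, per_iteration_us(113291, 500) is 226582, i.e. about 227 ms per iteration, and timeslices_per_iteration(226582, 1000) is 226, so each iteration runs for a couple of hundred timeslices rather than a fraction of one.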

> > > It seems generally unfortunate that it takes longer for a new thread to
> > > move over to the second cpu even when the first is busy with the original
> > > thread. I can certainly see cases where this causes suboptimal overall
> > > system behaviour.
> > 
> > In fact, I don't think it takes longer, I think it does not do it at their
> > creation, but will do it immediately after the first slice is consumed. This
> > would explain the important differences here. I don't know how we could
> > ensure that the new thread is created on the second CPU from the start,
> > though.
> 
> The math doesn't add up for me. Even if it rebalanced at the end of
> the first slice (i.e. after 1ms), that would be a 1ms penalty per
> iteration. With 500 threads that'd be a total penalty of 500ms.

yes you're right.
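
The mismatch can be made concrete with the 40M numbers quoted above (94315 ms on 2.6.22 versus 113291 ms on 2.6.24). The sketch below is just that arithmetic; the function names are illustrative, not part of any posted code.

```c
#include <assert.h>

/* Total delay if every iteration lost exactly one timeslice before migrating. */
static long expected_penalty_ms(long iterations, long slice_ms)
{
    return iterations * slice_ms;
}

/* Extra time per iteration, in microseconds, implied by two measured totals. */
static long observed_penalty_per_iteration_us(long slow_ms, long fast_ms,
                                              long iterations)
{
    assert(iterations > 0);
    return (slow_ms - fast_ms) * 1000 / iterations;
}
```

expected_penalty_ms(500, 1) is 500 ms in total, while observed_penalty_per_iteration_us(113291, 94315, 500) is 37952, i.e. roughly 38 ms lost per iteration, far more than a single 1 ms slice would account for.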

> > I tried inserting a sched_yield() at the top of the busy loop (1M loops).
> > By default, it did not change a thing. Then I simply set sched_compat_yield
> > to 1, and the two threads then ran simultaneously with a stable low time
> > (2700 ms instead of 10-12 seconds).
> > 
> > Doing so with 10k loops (initial test) shows times in the range 240-300 ms
> > only instead of 2200-6500 ms.
> 
> Right, likely because the long-running cases got stuck at the busy loop
> at the end, which would end up aborting quicker if the other thread got
> scheduled for just a bit. It was a mistake to post that variant of the
> testcase, it's not as relevant and doesn't mimic the original workload I
> was trying to mimic as well as if the first loop was made larger.

agreed, but what's important is not to change the workload, but to see
what changes induce a different behaviour.
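
For readers without the test case at hand, the experiment can be sketched roughly as follows. This is a hypothetical reconstruction, since the original program is not included in this thread: all names are made up, and the sched_yield() is assumed to be called once before the busy loop in both threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical reconstruction: each iteration starts one thread; the new
 * thread and its creator both spin for `loops` iterations, then the creator
 * joins the thread. */
struct spin_args {
    long loops;
    int use_yield;
};

static void spin(long loops, int use_yield)
{
    volatile long counter = 0;   /* volatile keeps the loop from being optimized away */

    assert(loops >= 0);
    if (use_yield)
        sched_yield();           /* assumed placement: once, before the busy loop */
    for (long i = 0; i < loops; i++)
        counter++;
}

static void *worker(void *arg)
{
    const struct spin_args *a = arg;

    spin(a->loops, a->use_yield);
    return NULL;
}

/* Returns the number of iterations completed, or -1 on a pthread error. */
static int run_iterations(int iterations, long loops, int use_yield)
{
    struct spin_args args = { loops, use_yield };
    int done = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, worker, &args) != 0)
            return -1;
        spin(loops, use_yield);  /* the creating thread does the same work */
        if (pthread_join(tid, NULL) != 0)
            return -1;
        done++;
    }
    return done;
}
```

Built with -pthread, run_iterations(500, 1000000, 1) should correspond to the 1M-loop case with sched_yield(); timing it with sched_compat_yield set to 0 and then to 1 is the comparison described above.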

> > Ingo, would it be possible (and wise) to ensure that a new thread being
> > created gets immediately rebalanced in order to emulate what is done here
> > with sched_compat_yield=1 and sched_yield() in both threads just after the
> > thread creation ? I don't expect any performance difference doing this,
> > but maybe some shell scripts relying on short-lived pipes would get faster
> > on SMP.
> 
> There's always the tradeoff of losing cache warmth whenever a thread is
> moved, so I'm not sure if it's a good idea to always migrate it at
> creation time. It's not a simple problem, really.

yes I know. That should not prevent us from experimenting though. If
thread-CPU affinity is too strong and causes the second CPU to be
rarely used, there's something wrong waiting for a fix.

> > > I agree that the testcase is highly artificial. Unfortunately, it's
> > > not uncommon to see these kinds of weird testcases from customers trying
> > > to evaluate new hardware. :( They tend to be pared-down versions of
> > > whatever their real workload is (the real workload is doing things more
> > > appropriately, but the smaller version is used for testing). I was lucky
> > > enough to get source snippets to base a standalone reproduction case on
> > > for this, normally we wouldn't even get copies of their binaries.
> > 
> > I'm well aware of that. What's important is to be able to explain what is
> > causing the difference and why the test case does not represent anything
> > related to performance. Maybe the code author wanted to get 500 parallel
> > threads and got his code wrong ?
> 
> I believe it started out as a simple attempt to parallelize a workload
> that sliced the problem too low, instead of slicing it in larger chunks
> and have each thread do more work at a time. It did well on 2.6.22 with
> almost a 2x speedup, but did worse than the single-threaded testcase on a
> 2.6.24 kernel.
> 
> So yes,

Re: [3/6] kgdb: core

2008-02-09 Thread Christoph Hellwig

On Sun, Feb 10, 2008 at 08:43:52AM +0100, Ingo Molnar wrote:
> 
> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > This still doesn't address a lot of the review comments from Jason's 
> > last posting.
> 
> sorry, which mails are those?

It's all in the thread starting with '[PATCH 0/8] kgdb 2.6.25 version',
msgid [EMAIL PROTECTED]
or at http://lkml.org/lkml/2008/2/9/104
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

panic about sysfs with adm1026

2008-02-09 Thread Yinghai Lu

Calling initcall 0x80c4b575: sm_adm1026_init+0x0/0xe()
i2c-adapter i2c-1: : Unrecognized stepping 0x45. Defaulting to ADM1026.
general protection fault:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-09379-g0cf975e-dirty #34
RIP: 0010:[]  [] sysfs_add_file+0x16/0x81
RSP: :81040503dd50  EFLAGS: 00010286
RAX:  RBX: fffe002e002d002c RCX: 48d9
RDX: 0002 RSI: fffe002e002d002c RDI: 810202c4fb90
RBP:  R08: 810202c4fb90 R09: 
R10: 0002 R11: 0002 R12: fff4
R13: 810202c4fb90 R14: 000c R15: 810202c4fb90
FS:  () GS:80bde000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 7fff94de3470 CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 81040503c000, task 81020504)
Stack:  810202c4fb90   0001
 810202c4e000 80b87850  810202c4e118
 808e0cc0 802d933c 81040503dd55 810202c17878
Call Trace:
 [] sysfs_create_group+0xa2/0x106
 [] adm1026_detect+0x4b3/0x522
 [] adm1026_detect+0x0/0x522
 [] i2c_probe_address+0xb9/0xfc
 [] i2c_probe+0x162/0x175
 [] adm1026_detect+0x0/0x522
 [] i2c_register_driver+0x9a/0xea
 [] kernel_init+0x15d/0x2c9
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x2c9
 [] child_rip+0x0/0x12


Code: c0 84 c0 74 0c 41 58 48 89 df 5b 5d e9 2a 07 00 00 5e 5b 5d c3
41 55 49 89 fd 41 54 41 bc f4 ff ff ff 55 53 48 89 f3 48 83 ec 28 <8b>
76 10 48 8b 3b 66 81 e6 ff 0f 66 81 ce 00 80 0f b7 f6 e8 fd
RIP  [] sysfs_add_file+0x16/0x81
 RSP 
---[ end trace b23a825db37d3043 ]---

Re: [3/6] kgdb: core

2008-02-09 Thread Ingo Molnar


* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> This still doesn't address a lot of the review comments from Jason's 
> last posting.

sorry, which mails are those?

Ingo

Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3

2008-02-09 Thread Linus Torvalds



On Sat, 9 Feb 2008, Greg KH wrote:
> 
> Once the patch goes into Linus's tree, feel free to send it to the
> [EMAIL PROTECTED] address so that we can include it in the 2.6.24.x
> tree.

I've been ignoring the patches because they say "PATCH 2.6.24-mm1", and so 
I simply don't know whether it's supposed to go into *my* kernel or just 
-mm.

There's also been several versions and discussions, so I'd really like to 
have somebody send me a final patch with all the acks etc.. One that is 
clearly for me, not for -mm.

Linus

Re: [5/6] x86: kgdb support

2008-02-09 Thread Ingo Molnar


* Sam Ravnborg <[EMAIL PROTECTED]> wrote:

> >  config X86_64
> > def_bool 64BIT
> > +   select KGDB_ARCH_HAS_SHADOW_INFO
> >  
> >  ### Arch settings
> >  config X86
> > @@ -139,6 +140,9 @@ config AUDIT_ARCH
> >  config ARCH_SUPPORTS_AOUT
> > def_bool y
> >  
> > +config ARCH_SUPPORTS_KGDB
> > +   def_bool y
> > +
> 
> Please use the documented HAVE_ approach and not this ugly "one 
> variable per arch" idiom. This was also commented last time the 
> patchset was posted.

hm, i wasnt Cc:-ed to that so i didnt read it yet. I have just followed 
the logical ARCH_SUPPORTS_* idiom which reads more naturally than 
HAVE_ARCH_*. But ... no strong feelings, changing it is easy enough, i 
renamed them and pushed out the new iteration to:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git

diffstat and shortlog did not change (just a rename of these config 
variables) - but i'm including them below for completeness of the 
submission.

any other observations?

Ingo

-->
Ingo Molnar (3):
  pids: add pid_max prototype
  uaccess: add probe_kernel_write()
  x86: kgdb support

Jan Kiszka (1):
  consoles: polling support, kgdboc

Jason Wessel (2):
  kgdb: core
  kgdb: document parameters

 Documentation/kernel-parameters.txt |    5 +
 arch/x86/Kconfig                    |    4 +
 arch/x86/kernel/Makefile            |    1 +
 arch/x86/kernel/kgdb.c              |  550 ++
 drivers/char/tty_io.c               |   47 +
 drivers/serial/8250.c               |   62 ++
 drivers/serial/Kconfig              |    3 +
 drivers/serial/Makefile             |    1 +
 drivers/serial/kgdboc.c             |  164 +++
 drivers/serial/serial_core.c        |   67 ++-
 include/asm-generic/kgdb.h          |   93 ++
 include/asm-x86/kgdb.h              |   87 ++
 include/linux/kgdb.h                |  264 +
 include/linux/pid.h                 |    2 +
 include/linux/serial_core.h         |    4 +
 include/linux/tty_driver.h          |   12 +
 include/linux/uaccess.h             |   22 +
 kernel/Makefile                     |    1 +
 kernel/kgdb.c                       | 2020 +++
 kernel/sysctl.c                     |    2 +-
 lib/Kconfig.debug                   |    2 +
 lib/Kconfig.kgdb                    |   37 +
 22 files changed, 3448 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/kgdb.c
 create mode 100644 drivers/serial/kgdboc.c
 create mode 100644 include/asm-generic/kgdb.h
 create mode 100644 include/asm-x86/kgdb.h
 create mode 100644 include/linux/kgdb.h
 create mode 100644 kernel/kgdb.c
 create mode 100644 lib/Kconfig.kgdb

Re: [0/6] kgdb light

2008-02-09 Thread David Miller

From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Sun, 10 Feb 2008 08:13:04 +0100

> this is the "kgdb light" tree that has been also posted at:
> 
>    http://lkml.org/lkml/2008/2/9/236
> 
> it is available at:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git
> 
> See the shortlog below.

Thanks for keeping this work alive.

>  - removed the GTOD/clocksource hacks. If a user uses kgdb for extended
>    periods of time then GTOD clocksources can get out of sync and we
>    might fall back to other clocksources. That is the _right_ thing to
>    do for the kernel, hacking it around to avoid kernel messages was
>    wrong.

I suspect something will however need to be done with watchdogs
and things of that nature which will get very confused if the
kernel sits in a breakpoint for a period of time whilst the user
looks at things from the kgdb prompt.

Just a heads up...

Re: [1/6] pids: add pid_max prototype

2008-02-09 Thread Christoph Hellwig

On Sun, Feb 10, 2008 at 08:13:21AM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> add pid_max prototype - used by sysctl and will be used by kgdb as well.

Looks good, and this should go in ASAP independent of kgdb.

And while you're at it, I think all of the below want to find a suitable
place in a header somewhere:

> @@ -71,7 +72,6 @@ extern int max_threads;
>  extern int core_uses_pid;
>  extern int suid_dumpable;
>  extern char core_pattern[];
> -extern int pid_max;
>  extern int min_free_kbytes;
>  extern int pid_max_min, pid_max_max;
>  extern int sysctl_drop_caches;

Re: [3/6] kgdb: core

2008-02-09 Thread Christoph Hellwig

This still doesn't address a lot of the review comments from Jason's
last posting.


Re: [5/6] x86: kgdb support

2008-02-09 Thread Sam Ravnborg

On Sun, Feb 10, 2008 at 08:13:45AM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
> based on patch from:
> 
>   Subject: kgdb: core-lite
>   From: Jason Wessel <[EMAIL PROTECTED]>
> 
> [ and countless other authors - see the patch for details. ]
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
> ---
>  arch/x86/Kconfig         |    4
>  arch/x86/kernel/Makefile |    1
>  arch/x86/kernel/kgdb.c   |  550 +++
>  include/asm-x86/kgdb.h   |   87 +++
>  4 files changed, 642 insertions(+)
> 
> Index: linux-kgdb.q/arch/x86/Kconfig
> ===
> --- linux-kgdb.q.orig/arch/x86/Kconfig
> +++ linux-kgdb.q/arch/x86/Kconfig
> @@ -14,6 +14,7 @@ config X86_32
>  
>  config X86_64
>   def_bool 64BIT
> + select KGDB_ARCH_HAS_SHADOW_INFO
>  
>  ### Arch settings
>  config X86
> @@ -139,6 +140,9 @@ config AUDIT_ARCH
>  config ARCH_SUPPORTS_AOUT
>   def_bool y
>  
> +config ARCH_SUPPORTS_KGDB
> + def_bool y
> +

Please use the documented HAVE_ approach and not this
ugly "one variable per arch" idiom.
This was also commented on last time the patchset was posted.

Sam

Re: [3/6] kgdb: core

2008-02-09 Thread Sam Ravnborg

On Sun, Feb 10, 2008 at 08:13:31AM +0100, Ingo Molnar wrote:
> From: Jason Wessel <[EMAIL PROTECTED]>
> 
> kgdb core code. Handles the protocol and the arch details.
> 
> [ [EMAIL PROTECTED]: heavily modified, simplified and cleaned up. ]

Hi Ingo.

I see that only a very few of my comments posted yesterday got addressed.
On purpose or did you miss them?

Sam

[18/19] ftrace: add ftrace_enabled sysctl to disable mcount function

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds back the sysctl ftrace_enabled. This time it is
defaulted to on, if DYNAMIC_FTRACE is configured. When ftrace_enabled
is disabled, the ftrace function is set to the stub return.

If DYNAMIC_FTRACE is also configured, on ftrace_enabled = 0,
the registered ftrace functions will all be set to jmps, but no more
new calls to ftrace recording (used to find the ftrace calling sites)
will be called.
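
The switching behaviour described above can be modelled in user space as follows. This is an illustrative sketch only, not the kernel code: the names loosely mirror the patch, and the list-of-tracers case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; this is not kernel code. */
typedef void (*trace_func_t)(unsigned long ip, unsigned long parent_ip);

static int calls_seen;                         /* counts calls reaching a real tracer */

static void trace_stub(unsigned long ip, unsigned long parent_ip)
{
    (void)ip; (void)parent_ip;                 /* the stub just returns */
}

static void counting_tracer(unsigned long ip, unsigned long parent_ip)
{
    (void)ip; (void)parent_ip;
    calls_seen++;
}

static trace_func_t registered = trace_stub;   /* what a tracer asked for */
static trace_func_t active = trace_stub;       /* what traced code actually calls */
static int enabled = 1;

/* Recompute the active function from the registration and the enable flag. */
static void update_active(void)
{
    active = enabled ? registered : trace_stub;
}

static void register_tracer(trace_func_t fn)
{
    assert(fn != NULL);
    registered = fn;
    update_active();
}

static void set_enabled(int on)
{
    enabled = on;
    update_active();
}
```

In this model, registering a tracer while the flag is 0 leaves the stub in place, and flipping the flag back to 1 restores the registered function without a new registration.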

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h |    6 ++
 kernel/sysctl.c        |   11
 kernel/trace/ftrace.c  |  125 +
 3 files changed, 124 insertions(+), 18 deletions(-)

Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -5,6 +5,12 @@
 
 #include 
 
+extern int ftrace_enabled;
+extern int
+ftrace_enable_sysctl(struct ctl_table *table, int write,
+struct file *filp, void __user *buffer, size_t *lenp,
+loff_t *ppos);
+
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
 struct ftrace_ops {
Index: linux/kernel/sysctl.c
===
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -488,6 +489,16 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = _dointvec,
},
+#ifdef CONFIG_FTRACE
+   {
+   .ctl_name   = CTL_UNNUMBERED,
+   .procname   = "ftrace_enabled",
+   .data   = &ftrace_enabled,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = &ftrace_enable_sysctl,
+   },
+#endif
 #ifdef CONFIG_KMOD
{
.ctl_name   = KERN_MODPROBE,
Index: linux/kernel/trace/ftrace.c
===
--- linux.orig/kernel/trace/ftrace.c
+++ linux/kernel/trace/ftrace.c
@@ -20,12 +20,24 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include "trace.h"
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+# define FTRACE_ENABLED_INIT 1
+#else
+# define FTRACE_ENABLED_INIT 0
+#endif
+
+int ftrace_enabled = FTRACE_ENABLED_INIT;
+static int last_ftrace_enabled = FTRACE_ENABLED_INIT;
+
 static DEFINE_SPINLOCK(ftrace_lock);
+static DEFINE_MUTEX(ftrace_sysctl_lock);
+
 static struct ftrace_ops ftrace_list_end __read_mostly =
 {
.func = ftrace_stub,
@@ -78,14 +90,16 @@ static int notrace __register_ftrace_fun
smp_wmb();
ftrace_list = ops;
 
-   /*
-* For one func, simply call it directly.
-* For more than one func, call the chain.
-*/
-   if (ops->next == &ftrace_list_end)
-   ftrace_trace_function = ops->func;
-   else
-   ftrace_trace_function = ftrace_list_func;
+   if (ftrace_enabled) {
+   /*
+* For one func, simply call it directly.
+* For more than one func, call the chain.
+*/
+   if (ops->next == &ftrace_list_end)
+   ftrace_trace_function = ops->func;
+   else
+   ftrace_trace_function = ftrace_list_func;
+   }
 
spin_unlock(_lock);
 
@@ -120,10 +134,12 @@ static int notrace __unregister_ftrace_f
 
*p = (*p)->next;
 
-   /* If we only have one func left, then call that directly */
-   if (ftrace_list == &ftrace_list_end ||
-   ftrace_list->next == &ftrace_list_end)
-   ftrace_trace_function = ftrace_list->func;
+   if (ftrace_enabled) {
+   /* If we only have one func left, then call that directly */
+   if (ftrace_list == &ftrace_list_end ||
+   ftrace_list->next == &ftrace_list_end)
+   ftrace_trace_function = ftrace_list->func;
+   }
 
  out:
spin_unlock(_lock);
@@ -263,7 +279,8 @@ static void notrace ftrace_startup(void)
goto out;
__unregister_ftrace_function(_shutdown_ops);
 
-   ftrace_run_startup_code();
+   if (ftrace_enabled)
+   ftrace_run_startup_code();
  out:
mutex_unlock(_lock);
 }
@@ -275,13 +292,32 @@ static void notrace ftrace_shutdown(void
if (ftraced_suspend)
goto out;
 
-   ftrace_run_shutdown_code();
+   if (ftrace_enabled)
+   ftrace_run_shutdown_code();
 
__register_ftrace_function(_shutdown_ops);
  out:
mutex_unlock(_lock);
 }
 
+static void notrace ftrace_startup_sysctl(void)
+{
+   mutex_lock(_lock);
+   /* ftraced_suspend is true if we want ftrace running */
+   if (ftraced_suspend)
+

[19/19] ftrace

2008-02-09 Thread Ingo Molnar


[ uhm, i cannot count apparently :-) There's no 19th patch. ]

[17/19] ftrace: dynamic enabling/disabling of function calls

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.

The way this works, is on bootup, a ftrace function is registered
to record the instruction pointer of all places that call the
function.

Later, a kthread is awoken once a second that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time.

e.g.

  call ftrace  /* 5 bytes */

is replaced with

  jmp 3f  /* jmp is 2 bytes and we jump 3 forward */
3:

When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace.  When it is disabled,
we replace the code back to the jmp.

Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls.  A large batch is allocated at
boot up to get most of the calls there.

Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
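
The byte-level replacement described above can be sketched in user space. This is an illustration under stated assumptions, not the patch's patching routine (which is truncated in this archive): the function names are made up, the buffer stands in for kernel text, and a little-endian host is assumed, as on x86. The two bytes eb 03 read as a little-endian short give the 0x03eb value the patch defines as JMPFWD.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CALL_SIZE 5          /* e8 opcode + 32-bit relative offset */
#define JMP_SHORT 0xeb       /* x86 short jump opcode */
#define JMP_SKIP  3          /* bytes skipped: the remainder of the 5-byte call */

/* Encode "call target" as it would sit at address ip (x86 rel32 form). */
static void encode_call(uint8_t code[CALL_SIZE], uint32_t ip, uint32_t target)
{
    /* The offset is relative to the end of the call instruction. */
    int32_t offset = (int32_t)(target - (ip + CALL_SIZE));

    assert(code != NULL);
    code[0] = 0xe8;
    memcpy(&code[1], &offset, sizeof(offset));
}

/* Overwrite the first two bytes with "jmp +3"; the remaining three bytes
 * become dead code. Returns 0 on success, -1 if the site holds no call. */
static int call_to_jmp(uint8_t code[CALL_SIZE])
{
    if (code[0] != 0xe8)
        return -1;
    code[0] = JMP_SHORT;
    code[1] = JMP_SKIP;
    return 0;
}

/* Restore the call; returns -1 unless the site currently holds the jmp. */
static int jmp_to_call(uint8_t code[CALL_SIZE], uint32_t ip, uint32_t target)
{
    if (code[0] != JMP_SHORT || code[1] != JMP_SKIP)
        return -1;
    encode_call(code, ip, target);
    return 0;
}
```

Checking the current bytes before writing matches the "If this was already converted, skip it" test in the patch; everything else here is a simplification.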

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/Makefile |    1
 arch/x86/kernel/ftrace.c |  237 +++
 include/linux/ftrace.h   |   18 ++
 kernel/trace/Kconfig |   17 ++
 kernel/trace/ftrace.c|  356 ++-
 5 files changed, 597 insertions(+), 32 deletions(-)

Index: linux/arch/x86/kernel/Makefile
===
--- linux.orig/arch/x86/kernel/Makefile
+++ linux/arch/x86/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_X86_NUMAQ)   += numaq_32.o
 obj-$(CONFIG_X86_SUMMIT_NUMA)  += summit_32.o
 obj-$(CONFIG_X86_VSMP) += vsmp_64.o
 obj-$(CONFIG_KPROBES)  += kprobes.o
+obj-$(CONFIG_DYNAMIC_FTRACE)   += ftrace.o
 obj-$(CONFIG_MODULES)  += module_$(BITS).o
 obj-$(CONFIG_ACPI_SRAT)+= srat_32.o
 obj-$(CONFIG_EFI)  += efi.o efi_$(BITS).o efi_stub_$(BITS).o
Index: linux/arch/x86/kernel/ftrace.c
===
--- /dev/null
+++ linux/arch/x86/kernel/ftrace.c
@@ -0,0 +1,237 @@
+/*
+ * Code for replacing ftrace calls with jumps.
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt <[EMAIL PROTECTED]>
+ *
+ * Thanks goes to Ingo Molnar, for suggesting the idea.
+ * Mathieu Desnoyers, for suggesting postponing the modifications.
+ * Arjan van de Ven, for keeping me straight, and explaining to me
+ * the dangers of modifying code on the run.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define CALL_BACK  5
+
+#define JMPFWD 0x03eb
+
+static unsigned short ftrace_jmp = JMPFWD;
+
+struct ftrace_record {
+   struct dyn_ftrace   rec;
+   int failed;
+} __attribute__((packed));
+
+struct ftrace_page {
+   struct ftrace_page  *next;
+   int index;
+   struct ftrace_recordrecords[];
+} __attribute__((packed));
+
+#define ENTRIES_PER_PAGE \
+  ((PAGE_SIZE - sizeof(struct ftrace_page)) / sizeof(struct ftrace_record))
+
+/* estimate from running different kernels */
+#define NR_TO_INIT 1
+
+#define MCOUNT_ADDR ((long)(&mcount))
+
+union ftrace_code_union {
+   char code[5];
+   struct {
+   char e8;
+   int offset;
+   } __attribute__((packed));
+};
+
+static struct ftrace_page  *ftrace_pages_start;
+static struct ftrace_page  *ftrace_pages;
+
+notrace struct dyn_ftrace *ftrace_alloc_shutdown_node(unsigned long ip)
+{
+   struct ftrace_record *rec;
+   unsigned short save;
+
+   ip -= CALL_BACK;
+   save = *(short *)ip;
+
+   /* If this was already converted, skip it */
+   if (save == JMPFWD)
+   return NULL;
+
+   if (ftrace_pages->index == ENTRIES_PER_PAGE) {
+   if (!ftrace_pages->next)
+   return NULL;
+   ftrace_pages = ftrace_pages->next;
+   }
+
+   rec = &ftrace_pages->records[ftrace_pages->index++];
+
+   return &rec->rec;
+}
+
+static int notrace
+ftrace_modify_code(unsigned long ip, unsigned char *old_code,
+  unsigned char *new_code)
+{
+   unsigned short old = *(unsigned short *)old_code;
+   unsigned short new = *(unsigned short *)new_code;
+   unsigned short replaced;
+   int faulted = 0;
+
+   /*
+*

[16/19] ftrace: trace preempt off critical timings

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.

This adds "preemptoff" and "preemptirqsoff" to 
/debugfs/tracing/available_tracers

Now instead of just tracing irqs off, preemption off can be selected
to be recorded.

When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.

By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simply echoing the appropriate tracer name into
/debugfs/tracing/current_tracer.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |    3
 include/linux/ftrace.h       |    8 +
 include/linux/irqflags.h     |    3
 include/linux/preempt.h      |    2
 kernel/sched.c               |   24 +
 kernel/trace/Kconfig         |   25 +
 kernel/trace/Makefile        |    1
 kernel/trace/trace_irqsoff.c |  184 +++
 8 files changed, 197 insertions(+), 53 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -207,7 +207,10 @@ void cpu_idle(void)
play_dead();
 
__get_cpu_var(irq_stat).idle_timestamp = jiffies;
+   /* Don't trace irqs off for idle */
+   stop_critical_timings();
idle();
+   start_critical_timings();
}
tick_nohz_restart_sched_tick();
preempt_enable_no_resched();
Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -58,4 +58,12 @@ extern void mcount(void);
 # define time_hardirqs_off(a0, a1) do { } while (0)
 #endif
 
+#ifdef CONFIG_PREEMPT_TRACER
+  extern void notrace trace_preempt_on(unsigned long a0, unsigned long a1);
+  extern void notrace trace_preempt_off(unsigned long a0, unsigned long a1);
+#else
+# define trace_preempt_on(a0, a1)  do { } while (0)
+# define trace_preempt_off(a0, a1) do { } while (0)
+#endif
+
 #endif /* _LINUX_FTRACE_H */
Index: linux/include/linux/irqflags.h
===
--- linux.orig/include/linux/irqflags.h
+++ linux/include/linux/irqflags.h
@@ -41,7 +41,8 @@
 # define INIT_TRACE_IRQFLAGS
 #endif
 
-#ifdef CONFIG_IRQSOFF_TRACER
+#if defined(CONFIG_IRQSOFF_TRACER) || \
+   defined(CONFIG_PREEMPT_TRACER)
  extern void stop_critical_timings(void);
  extern void start_critical_timings(void);
 #else
Index: linux/include/linux/preempt.h
===
--- linux.orig/include/linux/preempt.h
+++ linux/include/linux/preempt.h
@@ -10,7 +10,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_DEBUG_PREEMPT
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -66,6 +66,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -3772,26 +3773,44 @@ void scheduler_tick(void)
 #endif
 }
 
-#if defined(CONFIG_PREEMPT) && defined(CONFIG_DEBUG_PREEMPT)
+#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
+   defined(CONFIG_PREEMPT_TRACER))
+
+static inline unsigned long get_parent_ip(unsigned long addr)
+{
+   if (in_lock_functions(addr)) {
+   addr = CALLER_ADDR2;
+   if (in_lock_functions(addr))
+   addr = CALLER_ADDR3;
+   }
+   return addr;
+}
 
 void add_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
/*
 * Underflow?
 */
if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
return;
+#endif
preempt_count() += val;
+#ifdef CONFIG_DEBUG_PREEMPT
/*
 * Spinlock count overflowing soon?
 */
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
PREEMPT_MASK - 10);
+#endif
+   if (preempt_count() == val)
+   trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
 EXPORT_SYMBOL(add_preempt_count);
 
 void sub_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
/*
 *

[15/19] ftrace: trace irq disabled critical timings

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds latency tracing for critical timings
(how long interrupts are disabled for).

 "irqsoff" is added to /debugfs/tracing/available_tracers

Note:
  tracing_max_latency
    also holds the max latency for irqsoff (in usecs).
    (default to large number so one must start latency tracing)

  tracing_thresh
    threshold (in usecs) to always print out if irqs off
    is detected to be longer than stated here.
    If irq_thresh is non-zero, then max_irq_latency
    is ignored.
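
One way to read the interaction of the two knobs is sketched below. This is an illustrative model, not the tracer's implementation: the struct and function names are made up, and the strict "greater than" comparisons are an assumption.

```c
#include <assert.h>

/* Illustrative model of the two knobs; not the kernel implementation. */
struct latency_state {
    unsigned long max_latency_us;   /* plays the role of tracing_max_latency */
    unsigned long thresh_us;        /* plays the role of tracing_thresh; 0 = unused */
};

/* Returns 1 if a critical section of latency_us should be reported. */
static int should_report(struct latency_state *st, unsigned long latency_us)
{
    assert(st != 0);

    if (st->thresh_us != 0)                 /* threshold set: the max is ignored */
        return latency_us > st->thresh_us;

    if (latency_us > st->max_latency_us) {  /* otherwise only new maxima count */
        st->max_latency_us = latency_us;
        return 1;
    }
    return 0;
}
```

With the maximum left at its large default nothing would be reported, which fits the remark that one must start latency tracing explicitly, presumably by lowering the value.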

Here's an example of a trace with ftrace_enabled = 0

===
preemption latency trace v1.1.5 on 2.6.24-rc7

 latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-
| task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
-
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
===

And this is a trace with ftrace_enabled == 1

===
preemption latency trace v1.1.5 on 2.6.24-rc7

 latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-
| task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
-
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
 swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
 swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
 swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
 swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
 swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
 swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
 swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
===

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_64.c |3 
 arch/x86/lib/Makefile|1 
 arch/x86/lib/thunk_32.S  |   47 +
 arch/x86/lib/thunk_64.S  |   19 +-
 include/asm-x86/irqflags.h   |   24 --
 include/linux/ftrace.h   |8 
 include/linux/irqflags.h |   12 +
 kernel/fork.c|2 
 kernel/lockdep.c |   23 ++
 kernel/printk.c  |2 
 kernel/trace/Kconfig |   18 +
 kernel/trace/Makefile|1 
 kernel/trace/trace_irqsoff.c |  402 +++
 13 files changed, 531 insertions(+), 31 deletions(-)

Index: linux/arch/x86/kernel/process_64.c
===
--- linux.orig/arch/x86/kernel/process_64.c
+++ linux/arch/x86/kernel/process_64.c
@@ -189,7 +189,10 @@ void cpu_idle(void)
 */
local_irq_disable();
enter_idle();
+   /* Don't trace irqs off for idle */
+   stop_critical_timings();
idle();
+   start_critical_timings();
/* In many cases the interrupt that ended idle
   has already called exit_idle.

[14/19] ftrace: tracer for scheduler wakeup latency

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds the tracer that tracks the wakeup latency of the
highest priority waking task.

  "wakeup" is added to /debugfs/tracing/available_tracers

Also added to /debugfs/tracing

  tracing_max_latency
 holds the current max latency for the wakeup

  wakeup_thresh
 if set to other than zero, a log will be recorded
 for every wakeup that takes longer than the number
 entered in here (usecs for all counters)
 (deletes previous trace)
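
The two files above interact: with wakeup_thresh at zero only a new
maximum is kept, while a non-zero threshold logs every wakeup slower
than it. A minimal userspace sketch of that decision, assuming the
semantics exactly as described in this message (should_record_wakeup()
is a hypothetical name, not a function from the patch):

```c
#include <stdbool.h>

/*
 * Illustrative model only, not the kernel code: decide whether a
 * measured wakeup latency should replace the saved trace.
 *
 *   thresh_us == 0: keep only a new maximum latency.
 *   thresh_us != 0: record every wakeup slower than the threshold.
 *
 * All values are in microseconds, as the description states.
 */
static bool should_record_wakeup(unsigned long latency_us,
                                 unsigned long thresh_us,
                                 unsigned long max_latency_us)
{
    if (thresh_us != 0)
        return latency_us > thresh_us;

    return latency_us > max_latency_us;
}
```

With the threshold unset, a 26 us wakeup is recorded only if the
previous maximum was smaller; with a 30 us threshold it is ignored
regardless of the maximum.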

Examples:

  (with ftrace_enabled = 0)


preemption latency trace v1.1.5 on 2.6.24-rc8

 latency: 26 us, #2/2, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-
| task: migration/0-3 (uid:0 nice:-5 policy:1 rt_prio:99)
-

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
   quilt-8551  0d..3    0us+: wake_up_process+0x15/0x17 (sched_exec+0xc9/0x100)
   quilt-8551  0d..4   26us : sched_switch_callback+0x73/0x81 (schedule+0x483/0x6d5)

vim:ft=help


  (with ftrace_enabled = 1)


preemption latency trace v1.1.5 on 2.6.24-rc8

 latency: 36 us, #45/45, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
-
| task: migration/1-5 (uid:0 nice:-5 policy:1 rt_prio:99)
-

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
    bash-10653 1d..3    0us : wake_up_process+0x15/0x17 (sched_exec+0xc9/0x100)
    bash-10653 1d..3    1us : try_to_wake_up+0x271/0x2e7 (sub_preempt_count+0xc/0x7a)
    bash-10653 1d..2    2us : try_to_wake_up+0x296/0x2e7 (update_rq_clock+0x9/0x20)
    bash-10653 1d..2    2us : update_rq_clock+0x1e/0x20 (__update_rq_clock+0xc/0x90)
    bash-10653 1d..2    3us : __update_rq_clock+0x1b/0x90 (sched_clock+0x9/0x29)
    bash-10653 1d..2    4us : try_to_wake_up+0x2a6/0x2e7 (activate_task+0xc/0x3f)
    bash-10653 1d..2    4us : activate_task+0x2d/0x3f (enqueue_task+0xe/0x66)
    bash-10653 1d..2    5us : enqueue_task+0x5b/0x66 (enqueue_task_rt+0x9/0x3c)
    bash-10653 1d..2    6us : try_to_wake_up+0x2ba/0x2e7 (check_preempt_wakeup+0x12/0x99)
[...]
    bash-10653 1d..5   33us : tracing_record_cmdline+0xcf/0xd4 (_spin_unlock+0x9/0x33)
    bash-10653 1d..5   34us : _spin_unlock+0x19/0x33 (sub_preempt_count+0xc/0x7a)
    bash-10653 1d..4   35us : wakeup_sched_switch+0x65/0x2ff (_spin_lock_irqsave+0xc/0xa9)
    bash-10653 1d..4   35us : _spin_lock_irqsave+0x19/0xa9 (add_preempt_count+0xe/0x77)
    bash-10653 1d..4   36us : sched_switch_callback+0x73/0x81 (schedule+0x483/0x6d5)

vim:ft=help


The [...] was added here to not waste your email box space.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h|   23 ++
 kernel/trace/Kconfig  |   13 +
 kernel/trace/Makefile |1 
 kernel/trace/trace_sched_wakeup.c |  310 ++
 4 files changed, 343 insertions(+), 4 deletions(-)

Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -5,10 +5,6 @@
 
 #include 
 
-#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
-#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
-#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
-
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
 
 struct ftrace_ops {
@@ -35,4 +31,23 @@ extern void mcount(void);
 # define unregister_ftrace_function(ops) do { } while (0)
 # define clear_ftrace_function(ops) do { } while (0)
 #endif /* CONFIG_FTRACE */
+
+
+#ifdef CONFIG_FRAME_POINTER
+/* TODO: need to fix this for ARM */
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+# define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+# define CALLER_ADDR3 ((unsigned long)__builtin_return_address(3))
+# define CALLER_ADDR4 ((unsigned long)__builtin_return_address(4))
+# define CALLER_ADDR5 ((unsigned long)__builtin_return_address(5))
+#else
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 0UL
+# define CALLER_ADDR2 0UL
+#
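
For readers unfamiliar with the builtin behind these macros, here is a
small userspace sketch. CALLER_ADDR0 is copied from the patch;
who_called_me() is a hypothetical helper. Only level 0 is used because
deeper levels are, as far as I know, not reliable without frame
pointers, which would explain the 0UL fallbacks in the #else branch.

```c
/* Same definition as CALLER_ADDR0 in the patch above. */
#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))

/*
 * Hypothetical helper: returns the address the function will return
 * to, i.e. a location inside its caller. noinline keeps the function
 * from being merged into the caller, which would change the answer.
 */
__attribute__((noinline))
unsigned long who_called_me(void)
{
    return CALLER_ADDR0;
}
```

A caller can print the result with "%lx" and resolve it with a tool
such as addr2line to see which call site it belongs to.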

[12/19] ftrace: function tracer

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.

 Updates:

  available_tracers
 "function" is added to this file.

  current_tracer
To enable the function tracer:

  echo function > /debugfs/tracing/current_tracer

 To disable the tracer:

   echo disable > /debugfs/tracing/current_tracer
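
The echo commands above amount to writing the tracer name into a file.
A rough C equivalent, assuming nothing beyond ordinary stdio
(write_tracer_name() is a hypothetical helper, and the path is passed
in so the sketch can be tried on any regular file):

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a tracer name followed by a newline to
 * the given path, the same way "echo name > path" would.
 * Returns 0 on success and -1 on any I/O error.
 */
int write_tracer_name(const char *path, const char *name)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;

    if (fputs(name, f) == EOF || fputc('\n', f) == EOF) {
        fclose(f);
        return -1;
    }

    /* fclose() flushes; a failed flush is reported as an error too. */
    return fclose(f) == 0 ? 0 : -1;
}
```

On a kernel with this series applied the call would presumably be
write_tracer_name("/debugfs/tracing/current_tracer", "function").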

The output of the function_trace file is as follows

  "echo noverbose > /debugfs/tracing/iter_ctrl"

preemption latency trace v1.1.5 on 2.6.24-rc7-tst

 latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d (ktime_get_ts+0x4a/0x4e)
 swapper-0     0d.h. 1595131us+: _spin_lock+0x8/0x18 (hrtimer_interrupt+0x6e/0x1b0)

Or with verbose turned on:

  "echo verbose > /debugfs/tracing/iter_ctrl"

preemption latency trace v1.1.5 on 2.6.24-rc7-tst

 latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-
| task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
-

 swapper 0 0 9   [f3675f41] 1595.128ms (+0.003ms): 
set_normalized_timespec+0x8/0x2d  (ktime_get_ts+0x4a/0x4e )
 swapper 0 0 9  0001 [f3675f45] 1595.131ms (+0.003ms): 
_spin_lock+0x8/0x18  (hrtimer_interrupt+0x6e/0x1b0 )
 swapper 0 0 9  0002 [f3675f48] 1595.135ms (+0.003ms): 
_spin_lock+0x8/0x18  (hrtimer_interrupt+0x6e/0x1b0 )

The "trace" file is not affected by verbose mode, but it is affected by symonly.

 echo "nosymonly" > /debugfs/tracing/iter_ctrl

tracer:
[   81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[   81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[   81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[   81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[   81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[   81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[   81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24

 echo "symonly" > /debugfs/tracing/iter_ctrl

tracer:
[   81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a
[   81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a
[   81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24
[   81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78
[   81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70
[   81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77
[   81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/trace/Kconfig   |   13 +++
 kernel/trace/Makefile  |1 
 kernel/trace/trace_functions.c |   73 +
 3 files changed, 87 insertions(+)

Index: linux/kernel/trace/Kconfig
===
--- linux.orig/kernel/trace/Kconfig
+++ linux/kernel/trace/Kconfig
@@ -8,3 +8,16 @@ config TRACING
bool
select DEBUG_FS
 
+config FTRACE
+   bool "Kernel Function Tracer"
+   depends on DEBUG_KERNEL && HAVE_FTRACE
+   select FRAME_POINTER
+   select TRACING
+   help
+ Enable the kernel to trace every kernel function. This is done
+ by using a compiler feature to insert a small, 5-byte No-Operation
+ instruction to the beginning of every kernel function, which NOP
+ sequence is then dynamically patched into a tracer call when
+ tracing is enabled by the administrator. If it's runtime disabled
+ (the bootup default), then the overhead of the instructions is very
+ small and not measurable even in micro-benchmarks.
Index:

[13/19] ftrace: add tracing of context switches

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds context switch tracing, of the format of:

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |  pid:prio:state
      \   /    |||||   \   |  /
 swapper-0     1d..3    137us+:  0:140:R --> 2912:120
    sshd-2912  1d..3    216us+:  2912:120:S --> 0:140
 swapper-0     1d..3    261us+:  0:140:R --> 2912:120
    bash-2920  0d..3    267us+:  2920:120:S --> 0:140
    sshd-2912  1d..3    330us!:  2912:120:S --> 0:140
 swapper-0     1d..3   2389us+:  0:140:R --> 2847:120
yum-upda-2847  1d..3   2411us!:  2847:120:S --> 0:140
 swapper-0     0d..3  11089us+:  0:140:R --> 3139:120
gdm-bina-3139  0d..3      3us!:  3139:120:S --> 0:140
 swapper-0     1d..3 102328us+:  0:140:R --> 2847:120
yum-upda-2847  1d..3 102348us!:  2847:120:S --> 0:140
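
The last column of the trace above reads as "pid:prio:state" of the
task being switched out, then "pid:prio" of the task being switched
in. A small sketch that rebuilds that column, assuming this reading of
the header (format_ctx_switch() is a hypothetical helper, not the
tracer's own output routine):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format one context-switch record the way the
 * trace column shows it, e.g. "0:140:R --> 2912:120".
 * Returns the value of snprintf(), i.e. the length that was needed.
 */
int format_ctx_switch(char *buf, size_t len,
                      int prev_pid, int prev_prio, char prev_state,
                      int next_pid, int next_prio)
{
    return snprintf(buf, len, "%d:%d:%c --> %d:%d",
                    prev_pid, prev_prio, prev_state,
                    next_pid, next_prio);
}
```

Fed with the first record (idle task 0 at prio 140, running, handing
over to pid 2912 at prio 120) it reproduces the first line's column.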

 "sched_switch" is added to /debugfs/tracing/available_tracers

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Mathieu Desnoyers <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/trace/Kconfig  |   11 +++
 kernel/trace/Makefile |1 
 kernel/trace/trace_sched_switch.c |  125 ++
 3 files changed, 137 insertions(+)

Index: linux/kernel/trace/Kconfig
===
--- linux.orig/kernel/trace/Kconfig
+++ linux/kernel/trace/Kconfig
@@ -13,6 +13,7 @@ config FTRACE
depends on DEBUG_KERNEL && HAVE_FTRACE
select FRAME_POINTER
select TRACING
+   select CONTEXT_SWITCH_TRACER
help
  Enable the kernel to trace every kernel function. This is done
  by using a compiler feature to insert a small, 5-byte No-Operation
@@ -21,3 +22,13 @@ config FTRACE
  tracing is enabled by the administrator. If it's runtime disabled
  (the bootup default), then the overhead of the instructions is very
  small and not measurable even in micro-benchmarks.
+
+config CONTEXT_SWITCH_TRACER
+   bool "Trace process context switches"
+   depends on DEBUG_KERNEL
+   select TRACING
+   select MARKERS
+   help
+ This tracer gets called from the context switch and records
+ all switching of tasks.
+
Index: linux/kernel/trace/Makefile
===
--- linux.orig/kernel/trace/Makefile
+++ linux/kernel/trace/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_FTRACE) += libftrace.o
 
 obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FTRACE) += trace_functions.o
 
 libftrace-y := ftrace.o
Index: linux/kernel/trace/trace_sched_switch.c
===
--- /dev/null
+++ linux/kernel/trace/trace_sched_switch.c
@@ -0,0 +1,125 @@
+/*
+ * trace context switch
+ *
+ * Copyright (C) 2007 Steven Rostedt <[EMAIL PROTECTED]>
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "trace.h"
+
+static struct trace_array  *ctx_trace;
+static int __read_mostly   tracer_enabled;
+int __read_mostly  tracing_sched_switch_enabled;
+
+static void notrace
+ctx_switch_func(struct task_struct *prev, struct task_struct *next)
+{
+   struct trace_array *tr = ctx_trace;
+   struct trace_array_cpu *data;
+   unsigned long flags;
+   long disabled;
+   int cpu;
+
+   if (!tracer_enabled)
+   return;
+
+   raw_local_irq_save(flags);
+   cpu = raw_smp_processor_id();
+   data = tr->data[cpu];
+   disabled = atomic_inc_return(&data->disabled);
+
+   if (likely(disabled == 1))
+   tracing_sched_switch_trace(tr, data, prev, next, flags);
+
+   atomic_dec(&data->disabled);
+   raw_local_irq_restore(flags);
+}
+
+void ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next)
+{
+   tracing_record_cmdline(prev);
+
+   /*
+* If tracer_switch_func only points to the local
+* switch func, it still needs the ptr passed to it.
+*/
+   ctx_switch_func(prev, next);
+
+   /*
+* Chain to the wakeup tracer (this is a NOP if disabled):
+*/
+   wakeup_sched_switch(prev, next);
+}
+
+static notrace void sched_switch_reset(struct trace_array *tr)
+{
+   int cpu;
+
+   tr->time_start = now(tr->cpu);
+
+   for_each_online_cpu(cpu)
+   tracing_reset(tr->data[cpu]);
+}
+
+static notrace void start_sched_trace(struct trace_array *tr)
+{
+   sched_switch_reset(tr);
+   tracer_enabled = 1;
+}
+
+static notrace void stop_sched_trace(struct trace_array *tr)
+{
+   tracer_enabled = 0;
+}
+
+static notrace void sched_switch_trace_init(struct

[11/19] ftrace: latency tracer infrastructure

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds the latency tracer infrastructure. This patch
does not add anything that will select and turn it on, but will
be used by later patches.

If it were to be compiled, it would add the following files
to the debugfs:

 The root tracing directory:

  /debugfs/tracing/

This patch also adds the following files:

  available_tracers
 list of available tracers. Currently no tracers are
 available. Looking into this file only shows
 "none" which is used to unregister all tracers.

  current_tracer
 The trace that is currently active. Empty on start up.
 To switch to a tracer simply echo one of the tracers that
 are listed in available_tracers:

   example: (used with later patches)

  echo function > /debugfs/tracing/current_tracer

 To disable the tracer:

   echo disable > /debugfs/tracing/current_tracer

  tracing_enabled
 echoing "1" into this file starts the ftrace function tracing
  (if sysctl kernel.ftrace_enabled=1)
 echoing "0" turns it off.

  latency_trace
  This file is readonly and holds the result of the trace.

  trace
  This file outputs an easier to read version of the trace.

  iter_ctrl
  Controls the way the output of traces look.
  So far there are two controls:
echoing in "symonly" will only show the kallsyms variables
without the addresses (if kallsyms was configured)
echoing in "verbose" will change the output to show
a lot more data, but not very easy to understand by
humans.
echoing in "nosymonly" turns off symonly.
echoing in "noverbose" turns off verbose.
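
The four iter_ctrl keywords boil down to setting or clearing one of
two flags, with a "no" prefix meaning clear. A minimal sketch of such a
parser, assuming only the keywords listed above (the flag names and
apply_iter_ctrl() are hypothetical and not taken from trace.c):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits for the two documented controls. */
enum {
    ITER_SYM_ONLY = 1 << 0,
    ITER_VERBOSE  = 1 << 1,
};

/*
 * Apply one keyword ("symonly", "nosymonly", "verbose", "noverbose")
 * to *flags. Returns 0 on success, -1 for an unknown keyword.
 */
int apply_iter_ctrl(unsigned int *flags, const char *opt)
{
    static const struct {
        const char *name;
        unsigned int bit;
    } opts[] = {
        { "symonly", ITER_SYM_ONLY },
        { "verbose", ITER_VERBOSE },
    };
    int clear = 0;
    size_t i;

    /* A leading "no" turns the keyword into its negation. */
    if (strncmp(opt, "no", 2) == 0) {
        clear = 1;
        opt += 2;
    }

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
        if (strcmp(opt, opts[i].name) == 0) {
            if (clear)
                *flags &= ~opts[i].bit;
            else
                *flags |= opts[i].bit;
            return 0;
        }
    }

    return -1;
}
```

Keeping the keywords in a table means a third control would need one
more table row and no new parsing code.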

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/Makefile   |1 
 kernel/trace/Kconfig  |5 
 kernel/trace/Makefile |2 
 kernel/trace/trace.c  | 1547 ++
 kernel/trace/trace.h  |  185 +
 5 files changed, 1740 insertions(+)

Index: linux/kernel/Makefile
===
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_TASKSTATS) += taskstats.o t
 obj-$(CONFIG_MARKERS) += marker.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_FTRACE) += trace/
+obj-$(CONFIG_TRACING) += trace/
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <[EMAIL PROTECTED]>, the -fno-omit-frame-pointer is
Index: linux/kernel/trace/Kconfig
===
--- linux.orig/kernel/trace/Kconfig
+++ linux/kernel/trace/Kconfig
@@ -3,3 +3,8 @@
 #
 config HAVE_FTRACE
bool
+
+config TRACING
+   bool
+   select DEBUG_FS
+
Index: linux/kernel/trace/Makefile
===
--- linux.orig/kernel/trace/Makefile
+++ linux/kernel/trace/Makefile
@@ -1,3 +1,5 @@
 obj-$(CONFIG_FTRACE) += libftrace.o
 
+obj-$(CONFIG_TRACING) += trace.o
+
 libftrace-y := ftrace.o
Index: linux/kernel/trace/trace.c
===
--- /dev/null
+++ linux/kernel/trace/trace.c
@@ -0,0 +1,1547 @@
+/*
+ * ring buffer based function tracer
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt <[EMAIL PROTECTED]>
+ * Copyright (C) 2008 Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * Originally taken from the RT patch by:
+ *Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ *
+ * Based on code from the latency_tracer, that is:
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "trace.h"
+
+unsigned long __read_mostly    tracing_max_latency = (cycle_t)ULONG_MAX;
+unsigned long __read_mostly    tracing_thresh;
+
+static long notrace
+ns2usecs(cycle_t nsec)
+{
+   nsec += 500;
+   do_div(nsec, 1000);
+   return nsec;
+}
+
+static atomic_t                tracer_counter;
+static struct trace_array  global_trace;
+
+static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
+
+static struct trace_array  max_tr;
+
+static DEFINE_PER_CPU(struct trace_array_cpu, max_data);
+
+static int tracer_enabled;
+static unsigned long   trace_nr_entries = 4096UL;
+
+static struct tracer   *trace_types __read_mostly;
+static struct tracer   *current_trace __read_mostly;
+static int max_tracer_type_len;
+
+static DEFINE_MUTEX(trace_types_lock);
+
+static int __init set_nr_entries(char *str)
+{
+   if (!str)
+   return 0;
+   trace_nr_entries = simple_strtoul(str, &str, 0);
+   return 1;
+}
+__setup("trace_entries=", set_nr_entries);
+
+enum trace_type {
+
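
The ns2usecs() helper in the hunk above rounds to the nearest
microsecond by adding 500 ns before dividing by 1000. The same
arithmetic in plain userspace C, where a 64-bit division needs no
do_div() (ns_to_usecs_rounded() is a hypothetical name):

```c
#include <stdint.h>

/*
 * Convert nanoseconds to microseconds, rounding half up:
 * 1499 ns -> 1 us, 1500 ns -> 2 us.
 */
uint64_t ns_to_usecs_rounded(uint64_t nsec)
{
    return (nsec + 500) / 1000;
}
```

Without the +500 the result would always round down, so a 1999 ns
interval would be reported as 1 us rather than 2 us.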

[10/19] ftrace: add basic support for gcc profiler instrumentation

2008-02-09 Thread Ingo Molnar

From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value, the ftrace routine will be called every time
we enter a kernel function that is not marked with the "notrace"
attribute.

The ftrace routine will then call a registered function if a function
happens to be registered.

[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
  so don't blame Arnaldo for all of this ;-) ]

Update:
  It is now possible to register more than one ftrace function.
  If only one ftrace function is registered, that will be the
  function that ftrace calls directly. If more than one function
  is registered, then ftrace will call a function that will loop
  through the functions to call.
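
The dispatch rule in the update note can be modelled in a few lines of
userspace C. The type and field names (ftrace_func_t, struct
ftrace_ops, ftrace_stub, ftrace_trace_function) follow the patch; the
sketch_* functions, the list head and the two counting callbacks are
hypothetical, and the locking the real code would need is left out.

```c
#include <stddef.h>

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);

struct ftrace_ops {
    ftrace_func_t func;
    struct ftrace_ops *next;
};

/* Default target: tracing is off while the pointer equals this stub. */
static void ftrace_stub(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;
}

static struct ftrace_ops *ops_list;   /* hypothetical list head */
static ftrace_func_t ftrace_trace_function = ftrace_stub;

/* Used only when more than one callback is registered. */
static void sketch_list_func(unsigned long ip, unsigned long parent_ip)
{
    struct ftrace_ops *op;

    for (op = ops_list; op; op = op->next)
        op->func(ip, parent_ip);
}

/* One callback: call it directly. Several: go through the loop. */
void sketch_register(struct ftrace_ops *ops)
{
    ops->next = ops_list;
    ops_list = ops;
    ftrace_trace_function = ops->next ? sketch_list_func : ops->func;
}

/* What the mcount entry code does: skip the call if nothing is set. */
void sketch_mcount(unsigned long ip, unsigned long parent_ip)
{
    if (ftrace_trace_function != ftrace_stub)
        ftrace_trace_function(ip, parent_ip);
}

/* Two sample callbacks that just count how often they are called. */
static unsigned long hits_a, hits_b;

static void count_a(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;
    hits_a++;
}

static void count_b(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;
    hits_b++;
}
```

Calling the single registered function directly avoids the loop in the
common case, which matters because this path runs on every traced
function entry.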

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Makefile   |3 
 arch/x86/Kconfig   |1 
 arch/x86/kernel/entry_32.S |   27 
 arch/x86/kernel/entry_64.S |   37 
 include/linux/ftrace.h |   38 
 kernel/Makefile|1 
 kernel/trace/Kconfig   |5 +
 kernel/trace/Makefile  |3 
 kernel/trace/ftrace.c  |  138 +
 lib/Kconfig.debug  |2 
 10 files changed, 255 insertions(+)

Index: linux/Makefile
===
--- linux.orig/Makefile
+++ linux/Makefile
@@ -509,6 +509,9 @@ endif
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
+ifdef CONFIG_FTRACE
+KBUILD_CFLAGS  += -pg
+endif
 ifdef CONFIG_FRAME_POINTER
 KBUILD_CFLAGS  += -fno-omit-frame-pointer -fno-optimize-sibling-calls
 else
Index: linux/arch/x86/Kconfig
===
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -20,6 +20,7 @@ config X86
def_bool y
select HAVE_OPROFILE
select HAVE_KPROBES
+   select HAVE_FTRACE
 
 config GENERIC_LOCKBREAK
def_bool n
Index: linux/arch/x86/kernel/entry_32.S
===
--- linux.orig/arch/x86/kernel/entry_32.S
+++ linux/arch/x86/kernel/entry_32.S
@@ -75,6 +75,33 @@ DF_MASK  = 0x0400 
 NT_MASK= 0x4000
 VM_MASK= 0x0002
 
+#ifdef CONFIG_FTRACE
+ENTRY(mcount)
+   cmpl $ftrace_stub, ftrace_trace_function
+   jnz trace
+
+.globl ftrace_stub
+ftrace_stub:
+   ret
+
+   /* taken from glibc */
+trace:
+   pushl %eax
+   pushl %ecx
+   pushl %edx
+   movl 0xc(%esp), %eax
+   movl 0x4(%ebp), %edx
+
+   call   *ftrace_trace_function
+
+   popl %edx
+   popl %ecx
+   popl %eax
+
+   jmp ftrace_stub
+END(mcount)
+#endif
+
 #ifdef CONFIG_PREEMPT
 #define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
Index: linux/arch/x86/kernel/entry_64.S
===
--- linux.orig/arch/x86/kernel/entry_64.S
+++ linux/arch/x86/kernel/entry_64.S
@@ -54,6 +54,43 @@
 
.code64
 
+#ifdef CONFIG_FTRACE
+ENTRY(mcount)
+   cmpq $ftrace_stub, ftrace_trace_function
+   jnz trace
+.globl ftrace_stub
+ftrace_stub:
+   retq
+
+trace:
+   /* taken from glibc */
+   subq $0x38, %rsp
+   movq %rax, (%rsp)
+   movq %rcx, 8(%rsp)
+   movq %rdx, 16(%rsp)
+   movq %rsi, 24(%rsp)
+   movq %rdi, 32(%rsp)
+   movq %r8, 40(%rsp)
+   movq %r9, 48(%rsp)
+
+   movq 0x38(%rsp), %rdi
+   movq 8(%rbp), %rsi
+
+   call   *ftrace_trace_function
+
+   movq 48(%rsp), %r9
+   movq 40(%rsp), %r8
+   movq 32(%rsp), %rdi
+   movq 24(%rsp), %rsi
+   movq 16(%rsp), %rdx
+   movq 8(%rsp), %rcx
+   movq (%rsp), %rax
+   addq $0x38, %rsp
+
+   jmp ftrace_stub
+END(mcount)
+#endif
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif 
Index: linux/include/linux/ftrace.h
===
--- /dev/null
+++ linux/include/linux/ftrace.h
@@ -0,0 +1,38 @@
+#ifndef _LINUX_FTRACE_H
+#define _LINUX_FTRACE_H
+
+#ifdef CONFIG_FTRACE
+
+#include 
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+
+typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
+
+struct ftrace_ops {
+   ftrace_func_t func;
+   struct ftrace_ops *next;
+};
+
+/*
+ * The ftrace_ops must be a static and should also
+ * be read_mostly.  These functions do modify read_mostly variables
+ * so use them sparely. Never free an ftrace_op or modify the
+ * next pointer after it has been registered. Even after unregistering
+ * it, the next pointer may still be

[09/19] ftrace: add notrace annotations for NMI routines

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

This annotates NMI functions with notrace. Some tracers may be able
to live with this, but some cannot. The safest is to turn it off,
it's not particularly interesting anyway.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/nmi_32.c   |3 ++-
 arch/x86/kernel/nmi_64.c   |6 --
 arch/x86/kernel/traps_32.c |   12 ++--
 arch/x86/kernel/traps_64.c |   11 ++-
 4 files changed, 18 insertions(+), 14 deletions(-)

Index: linux/arch/x86/kernel/nmi_32.c
===
--- linux.orig/arch/x86/kernel/nmi_32.c
+++ linux/arch/x86/kernel/nmi_32.c
@@ -320,7 +320,8 @@ EXPORT_SYMBOL(touch_nmi_watchdog);
 
 extern void die_nmi(struct pt_regs *, const char *msg);
 
-__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int
+nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
 
/*
Index: linux/arch/x86/kernel/nmi_64.c
===
--- linux.orig/arch/x86/kernel/nmi_64.c
+++ linux/arch/x86/kernel/nmi_64.c
@@ -314,7 +314,8 @@ void touch_nmi_watchdog(void)
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
 
-int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+notrace __kprobes int
+nmi_watchdog_tick(struct pt_regs *regs, unsigned reason)
 {
int sum;
int touched = 0;
@@ -385,7 +386,8 @@ int __kprobes nmi_watchdog_tick(struct p
 
 static unsigned ignore_nmis;
 
-asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
+asmlinkage notrace __kprobes void
+do_nmi(struct pt_regs *regs, long error_code)
 {
nmi_enter();
add_pda(__nmi_count,1);
Index: linux/arch/x86/kernel/traps_32.c
===
--- linux.orig/arch/x86/kernel/traps_32.c
+++ linux/arch/x86/kernel/traps_32.c
@@ -665,7 +665,7 @@ gp_in_kernel:
}
 }
 
-static __kprobes void
+static notrace __kprobes void
 mem_parity_error(unsigned char reason, struct pt_regs * regs)
 {
printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x on "
@@ -688,7 +688,7 @@ mem_parity_error(unsigned char reason, s
clear_mem_error(reason);
 }
 
-static __kprobes void
+static notrace __kprobes void
 io_check_error(unsigned char reason, struct pt_regs * regs)
 {
unsigned long i;
@@ -705,7 +705,7 @@ io_check_error(unsigned char reason, str
outb(reason, 0x61);
 }
 
-static __kprobes void
+static notrace __kprobes void
 unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
 {
 #ifdef CONFIG_MCA
@@ -727,7 +727,7 @@ unknown_nmi_error(unsigned char reason, 
 
 static DEFINE_SPINLOCK(nmi_print_lock);
 
-void __kprobes die_nmi(struct pt_regs *regs, const char *msg)
+void notrace __kprobes die_nmi(struct pt_regs *regs, const char *msg)
 {
if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 2, SIGINT) ==
NOTIFY_STOP)
@@ -758,7 +758,7 @@ void __kprobes die_nmi(struct pt_regs *r
do_exit(SIGSEGV);
 }
 
-static __kprobes void default_do_nmi(struct pt_regs * regs)
+static notrace __kprobes void default_do_nmi(struct pt_regs *regs)
 {
unsigned char reason = 0;
 
@@ -798,7 +798,7 @@ static __kprobes void default_do_nmi(str
 
 static int ignore_nmis;
 
-__kprobes void do_nmi(struct pt_regs * regs, long error_code)
+notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code)
 {
int cpu;
 
Index: linux/arch/x86/kernel/traps_64.c
===
--- linux.orig/arch/x86/kernel/traps_64.c
+++ linux/arch/x86/kernel/traps_64.c
@@ -598,7 +598,8 @@ void die(const char * str, struct pt_reg
oops_end(flags, regs, SIGSEGV);
 }
 
-void __kprobes die_nmi(char *str, struct pt_regs *regs, int do_panic)
+notrace __kprobes void
+die_nmi(char *str, struct pt_regs *regs, int do_panic)
 {
unsigned long flags = oops_begin();
 
@@ -765,7 +766,7 @@ asmlinkage void __kprobes do_general_pro
die("general protection fault", regs, error_code);
 }
 
-static __kprobes void
+static notrace __kprobes void
 mem_parity_error(unsigned char reason, struct pt_regs * regs)
 {
printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x.\n",
@@ -789,7 +790,7 @@ mem_parity_error(unsigned char reason, s
outb(reason, 0x61);
 }
 
-static __kprobes void
+static notrace __kprobes void
 io_check_error(unsigned char reason, struct pt_regs * regs)
 {
printk("NMI: IOCK error (debug interrupt?)\n");
@@ -803,7 +804,7 @@ io_check_error(unsigned char reason, str
outb(reason, 0x61);
 }
 
-static __kprobes void
+static notrace __kprobes void
 unknown_nmi_error(unsigned char reason, struct pt_regs * regs)
 {
printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x.\n",
@@ -818,7 +819,7 @@ unknown_nmi_error(unsigned char reason, 
 
 /* Runs on IST

[07/19] x86: add notrace annotations to vsyscall.

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/vsyscall_64.c  |3 ++-
 arch/x86/vdso/vclock_gettime.c |   15 ---
 arch/x86/vdso/vgetcpu.c|3 ++-
 include/asm-x86/vsyscall.h |3 ++-
 4 files changed, 14 insertions(+), 10 deletions(-)

Index: linux/arch/x86/kernel/vsyscall_64.c
===
--- linux.orig/arch/x86/kernel/vsyscall_64.c
+++ linux/arch/x86/kernel/vsyscall_64.c
@@ -42,7 +42,8 @@
 #include 
 #include 
 
-#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr)))
+#define __vsyscall(nr) \
+   __attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace
 #define __syscall_clobber "r11","cx","memory"
 #define __pa_vsymbol(x)\
({unsigned long v;  \
Index: linux/arch/x86/vdso/vclock_gettime.c
===
--- linux.orig/arch/x86/vdso/vclock_gettime.c
+++ linux/arch/x86/vdso/vclock_gettime.c
@@ -23,7 +23,7 @@
 
 #define gtod vdso_vsyscall_gtod_data
 
-static long vdso_fallback_gettime(long clock, struct timespec *ts)
+notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
 {
long ret;
asm("syscall" : "=a" (ret) :
@@ -31,7 +31,7 @@ static long vdso_fallback_gettime(long c
return ret;
 }
 
-static inline long vgetns(void)
+notrace static inline long vgetns(void)
 {
long v;
cycles_t (*vread)(void);
@@ -40,7 +40,7 @@ static inline long vgetns(void)
return (v * gtod->clock.mult) >> gtod->clock.shift;
 }
 
-static noinline int do_realtime(struct timespec *ts)
+notrace static noinline int do_realtime(struct timespec *ts)
 {
unsigned long seq, ns;
do {
@@ -54,7 +54,8 @@ static noinline int do_realtime(struct t
 }
 
 /* Copy of the version in kernel/time.c which we cannot directly access */
-static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
+notrace static void
+vset_normalized_timespec(struct timespec *ts, long sec, long nsec)
 {
while (nsec >= NSEC_PER_SEC) {
nsec -= NSEC_PER_SEC;
@@ -68,7 +69,7 @@ static void vset_normalized_timespec(str
ts->tv_nsec = nsec;
 }
 
-static noinline int do_monotonic(struct timespec *ts)
+notrace static noinline int do_monotonic(struct timespec *ts)
 {
unsigned long seq, ns, secs;
do {
@@ -82,7 +83,7 @@ static noinline int do_monotonic(struct 
return 0;
 }
 
-int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
+notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
 {
if (likely(gtod->sysctl_enabled && gtod->clock.vread))
switch (clock) {
@@ -96,7 +97,7 @@ int __vdso_clock_gettime(clockid_t clock
 int clock_gettime(clockid_t, struct timespec *)
__attribute__((weak, alias("__vdso_clock_gettime")));
 
-int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
+notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
 {
long ret;
if (likely(gtod->sysctl_enabled && gtod->clock.vread)) {
Index: linux/arch/x86/vdso/vgetcpu.c
===
--- linux.orig/arch/x86/vdso/vgetcpu.c
+++ linux/arch/x86/vdso/vgetcpu.c
@@ -13,7 +13,8 @@
 #include 
 #include "vextern.h"
 
-long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
+notrace long
+__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
 {
unsigned int p;
 
Index: linux/include/asm-x86/vsyscall.h
===
--- linux.orig/include/asm-x86/vsyscall.h
+++ linux/include/asm-x86/vsyscall.h
@@ -24,7 +24,8 @@ enum vsyscall_num {
((unused, __section__ (".vsyscall_gtod_data"),aligned(16)))
 #define __section_vsyscall_clock __attribute__ \
((unused, __section__ (".vsyscall_clock"),aligned(16)))
-#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn")))
+#define __vsyscall_fn \
+   __attribute__ ((unused, __section__(".vsyscall_fn"))) notrace
 
 #define VGETCPU_RDTSCP 1
 #define VGETCPU_LSL2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[08/19] ftrace: annotate core code that should not be traced

2008-02-09 Thread Ingo Molnar

From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Mark with "notrace" the functions in core code that should not be
traced.  The "notrace" attribute will prevent gcc from adding
a call to ftrace on the annotated functions.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 lib/smp_processor_id.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/lib/smp_processor_id.c
===
--- linux.orig/lib/smp_processor_id.c
+++ linux/lib/smp_processor_id.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-unsigned int debug_smp_processor_id(void)
+notrace unsigned int debug_smp_processor_id(void)
 {
unsigned long preempt_count = preempt_count();
int this_cpu = raw_smp_processor_id();

Re: [PATCH] Change pci_raw_ops to pci_raw_read/write

2008-02-09 Thread Greg KH

On Sat, Feb 09, 2008 at 10:25:23PM -0800, Yinghai Lu wrote:
> On Feb 9, 2008 4:41 AM, Matthew Wilcox <[EMAIL PROTECTED]> wrote:
> > On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote:
> > > Matthew,
> > >
> > > Perhaps I missed it, but did you address Yinghai's concerns?
> >
> > No, I was on holiday.
> >
> > > Yinghai Lu wrote:
> > > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>-int pci_conf1_write(unsigned int seg, unsigned int bus,
> > > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus,
> > > >>   unsigned int devfn, int reg, int len, u32
> > > >>   value)
> > > >
> > > >any reason to change pci_conf1_read/write to static?
> >
> > Yes -- it no longer needs to be called from outside this file.
> >
> > > >>+config ATA_RAM
> > > >>+   tristate "ATA RAM driver"
> > > >>+
> > > >
> > > >related?
> >
> 
> looks good. it should get into -mm or x86/mm for some testing

Can I get a revised version of this, without the incorrect hunk?

thanks,

greg k-h

[06/19] tracing: add notrace to linkage.h

2008-02-09 Thread Ingo Molnar

From: Ingo Molnar <[EMAIL PROTECTED]>

notrace signals that a function should not be traced. Most of the
time this is used by tracers to annotate code that cannot be
traced - it's in a volatile state (such as in user vdso context
or NMI context) or it's in the tracer internals.
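
A minimal user-space sketch of what the attribute does, assuming gcc (or clang) and a build with -finstrument-functions. That flag is a user-space analogue chosen for illustration, not the kernel's own tracing hook-up. The counter and the helper names are hypothetical. Functions marked notrace get no entry/exit hook calls; the hooks themselves must be notrace too, or they would recurse.

```c
#include <assert.h>

/* Same definition the patch adds to include/linux/linkage.h. */
#define notrace __attribute__((no_instrument_function))

/* Hypothetical counter, only for this demo. */
static unsigned long hook_calls;

/*
 * gcc calls these on function entry/exit when the file is built with
 * -finstrument-functions. They must be notrace themselves.
 */
notrace void __cyg_profile_func_enter(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
	hook_calls++;
}

notrace void __cyg_profile_func_exit(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
}

/* Instrumented when the instrumentation flag is given. */
int traced_add(int a, int b)
{
	return a + b;
}

/* Never instrumented, with or without the flag. */
notrace int untraced_add(int a, int b)
{
	return a + b;
}

/* Number of hook calls seen while running only notrace code. */
notrace unsigned long hook_calls_during_untraced(void)
{
	unsigned long before = hook_calls;
	int sum = untraced_add(2, 3);

	assert(sum == 5);
	return hook_calls - before;
}
```

Without the flag no hooks are emitted at all, so hook_calls_during_untraced() reports zero either way; the difference only shows up for traced_add().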

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/linkage.h |2 ++
 1 file changed, 2 insertions(+)

Index: linux/include/linux/linkage.h
===
--- linux.orig/include/linux/linkage.h
+++ linux/include/linux/linkage.h
@@ -3,6 +3,8 @@
 
 #include 
 
+#define notrace __attribute__((no_instrument_function))
+
 #ifdef __cplusplus
 #define CPP_ASMLINKAGE extern "C"
 #else

[05/19] ftrace: make the task state char-string visible to all

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

The tracer wants to be able to convert the state number
into a user-visible character. This patch pulls that conversion
string out of the scheduler into the header. This way, if it were
ever to change, other parts of the kernel will know.
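
A rough sketch of how a tracer might use the shared string, assuming each task state is a single bit and that state 0 means running. The helper name task_state_to_char() and the '?' fallback are hypothetical, not part of this patch.

```c
#include <stddef.h>

#define TASK_STATE_TO_CHAR_STR "RSDTtZX"

static const char stat_nam[] = TASK_STATE_TO_CHAR_STR;

/*
 * Hypothetical helper: map a task state bitmask to its display
 * character. Index 0 is the running state (no bit set); otherwise
 * the index is 1 + the position of the lowest set bit.
 */
char task_state_to_char(unsigned long state)
{
	size_t idx = 0;

	if (state) {
		idx = 1;
		while (!(state & 1UL)) {
			state >>= 1;
			idx++;
		}
	}

	/* Unknown bits beyond the table get a placeholder. */
	return idx < sizeof(stat_nam) - 1 ? stat_nam[idx] : '?';
}
```

Because the table lives in one macro, a tracer built this way follows any future change to the state letters without a second edit.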

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h |2 ++
 kernel/sched.c|2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -2117,6 +2117,8 @@ static inline void migration_init(void)
 #define TASK_SIZE_OF(tsk)  TASK_SIZE
 #endif
 
+#define TASK_STATE_TO_CHAR_STR "RSDTtZX"
+
 #endif /* __KERNEL__ */
 
 #endif
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -5154,7 +5154,7 @@ out_unlock:
return retval;
 }
 
-static const char stat_nam[] = "RSDTtZX";
+static const char stat_nam[] = TASK_STATE_TO_CHAR_STR;
 
 void sched_show_task(struct task_struct *p)
 {

[04/19] ftrace: add preempt_enable/disable notrace macros

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

The tracer may need to call preempt_enable and disable functions
for time keeping and such. The trace gets ugly when we see these
functions show up for all traces. To make the output cleaner
this patch adds preempt_enable_notrace and preempt_disable_notrace
to be used by tracer (and debugging) functions.
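
The idea can be sketched in user space as follows. This is only an illustration under assumptions: the demo_ names are hypothetical, the "traced" flavour is modelled as a helper that bumps a fake event counter, and the barrier uses gcc/clang inline asm. The notrace flavour touches the counter directly, so it leaves no trace records behind.

```c
#include <assert.h>

/* Compiler barrier in the style of the kernel's barrier() (gcc/clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int demo_preempt_count;	/* hypothetical stand-in for preempt_count() */
static int demo_trace_events;	/* records a fake tracer would have emitted */

/* Traced flavour: every counter change also produces a trace record. */
static void demo_add_preempt_count(int val)
{
	demo_preempt_count += val;
	demo_trace_events++;
}

#define demo_preempt_disable() \
do { \
	demo_add_preempt_count(1); \
	barrier(); \
} while (0)

#define demo_preempt_enable_no_resched() \
do { \
	barrier(); \
	demo_add_preempt_count(-1); \
} while (0)

/* notrace flavour: modify the counter in place, no function call. */
#define demo_preempt_disable_notrace() \
do { \
	demo_preempt_count += 1; \
	barrier(); \
} while (0)

#define demo_preempt_enable_no_resched_notrace() \
do { \
	barrier(); \
	demo_preempt_count -= 1; \
} while (0)

/* Trace records produced by one traced disable/enable pair. */
int demo_traced_pair(void)
{
	int before = demo_trace_events;

	demo_preempt_disable();
	demo_preempt_enable_no_resched();
	assert(demo_preempt_count == 0);
	return demo_trace_events - before;
}

/* Trace records produced by one notrace disable/enable pair. */
int demo_notrace_pair(void)
{
	int before = demo_trace_events;

	demo_preempt_disable_notrace();
	demo_preempt_enable_no_resched_notrace();
	assert(demo_preempt_count == 0);
	return demo_trace_events - before;
}
```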

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/preempt.h |   32 
 1 file changed, 32 insertions(+)

Index: linux/include/linux/preempt.h
===
--- linux.orig/include/linux/preempt.h
+++ linux/include/linux/preempt.h
@@ -52,6 +52,34 @@ do { \
preempt_check_resched(); \
 } while (0)
 
+/* For debugging and tracer internals only! */
+#define add_preempt_count_notrace(val) \
+   do { preempt_count() += (val); } while (0)
+#define sub_preempt_count_notrace(val) \
+   do { preempt_count() -= (val); } while (0)
+#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
+#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
+
+#define preempt_disable_notrace() \
+do { \
+   inc_preempt_count_notrace(); \
+   barrier(); \
+} while (0)
+
+#define preempt_enable_no_resched_notrace() \
+do { \
+   barrier(); \
+   dec_preempt_count_notrace(); \
+} while (0)
+
+/* preempt_check_resched is OK to trace */
+#define preempt_enable_notrace() \
+do { \
+   preempt_enable_no_resched_notrace(); \
+   barrier(); \
+   preempt_check_resched(); \
+} while (0)
+
 #else
 
 #define preempt_disable()  do { } while (0)
@@ -59,6 +87,10 @@ do { \
 #define preempt_enable()   do { } while (0)
 #define preempt_check_resched()do { } while (0)
 
+#define preempt_disable_notrace()  do { } while (0)
+#define preempt_enable_no_resched_notrace()do { } while (0)
+#define preempt_enable_notrace()   do { } while (0)
+
 #endif
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS

[03/19] printk: dont wake up klogd with the rq locked

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

It is not wise to place a printk where the runqueue lock is held.

I just spent two hours debugging why some of my code was locking up,
to find that the lockup was caused by some debugging printk's that
I had in the scheduler. The printk's were only in rare paths so
they shouldn't be too much of a problem, but after I hit the printk
the system locked up.

Thinking that it was locking up on my code I went looking down the
wrong path. I finally found (after examining an NMI dump) that
the lockup happened because printk was trying to wake up the klogd
daemon, which caused a deadlock when the try_to_wake_up code tries
to grab the runqueue lock.

This patch adds a runqueue_is_locked interface in sched.c for other
files to see if the current runqueue lock is held. This is used
in printk to determine whether it is safe or not to wake up the klogd.
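
A single-threaded sketch of the deadlock and of the check that avoids it. Everything here is a stand-in: a plain flag plays the role of the non-recursive runqueue spinlock, and the demo_ helpers are hypothetical. Only runqueue_is_locked() and wake_up_klogd() mirror names from the patch.

```c
#include <assert.h>
#include <stdbool.h>

static bool rq_lock_held;	/* stand-in for the runqueue spinlock */
static int klogd_wakeups;	/* how often the logger was woken */

static void demo_rq_lock(void)
{
	/* Non-recursive: taking it twice on one CPU would spin forever. */
	assert(!rq_lock_held);
	rq_lock_held = true;
}

static void demo_rq_unlock(void)
{
	assert(rq_lock_held);
	rq_lock_held = false;
}

static int runqueue_is_locked(void)
{
	return rq_lock_held;
}

/* Waking a task needs the runqueue lock, as try_to_wake_up() does. */
static void wake_up_klogd(void)
{
	demo_rq_lock();
	klogd_wakeups++;
	demo_rq_unlock();
}

/* Tail of the print path: returns 1 if klogd was woken, 0 if skipped. */
static int demo_release_console(int wake_klogd)
{
	if (wake_klogd && !runqueue_is_locked()) {
		wake_up_klogd();
		return 1;
	}
	return 0;
}

/* printk from inside the scheduler, runqueue lock already held. */
int demo_print_from_scheduler(void)
{
	int woke;

	demo_rq_lock();
	woke = demo_release_console(1);
	demo_rq_unlock();
	return woke;
}

/* printk from ordinary code, runqueue lock not held. */
int demo_print_from_elsewhere(void)
{
	return demo_release_console(1);
}
```

Without the runqueue_is_locked() test, demo_print_from_scheduler() would trip the assertion in demo_rq_lock(), which is the sketch's equivalent of the hang described above.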

And with this patch, my code ran fine ;-)

[ [EMAIL PROTECTED]: we also want this to be able to printk something in
 case the scheduler crashes. ]

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h |2 ++
 kernel/printk.c   |   14 ++
 kernel/sched.c|   18 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -245,6 +245,8 @@ extern void sched_init_smp(void);
 extern void init_idle(struct task_struct *idle, int cpu);
 extern void init_idle_bootup_task(struct task_struct *idle);
 
+extern int runqueue_is_locked(void);
+
 extern cpumask_t nohz_cpu_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern int select_nohz_load_balancer(int cpu);
Index: linux/kernel/printk.c
===
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -583,9 +583,11 @@ static int have_callable_console(void)
  * @fmt: format string
  *
  * This is printk().  It can be called from any context.  We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
+ *
+ * Note: if printk() is called with the runqueue lock held, it will not wake
+ * up the klogd. This is to avoid a deadlock from calling printk() in schedule
+ * with the runqueue lock held and having the wake_up grab the runqueue lock
+ * as well.
  *
  * We try to grab the console_sem.  If we succeed, it's easy - we log the output and
  * call the console drivers.  If we fail to get the semaphore we place the output
@@ -994,7 +996,11 @@ void release_console_sem(void)
console_locked = 0;
up(&console_sem);
spin_unlock_irqrestore(&logbuf_lock, flags);
-   if (wake_klogd)
+   /*
+* If we try to wake up klogd while printing with the runqueue lock
+* held, this will deadlock.
+*/
+   if (wake_klogd && !runqueue_is_locked())
wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -621,6 +621,24 @@ unsigned long rt_needs_cpu(int cpu)
 # define const_debug static const
 #endif
 
+/**
+ * runqueue_is_locked
+ *
+ * Returns true if the current cpu runqueue is locked.
+ * This interface allows printk to be called with the runqueue lock
+ * held and know whether or not it is OK to wake up the klogd.
+ */
+int runqueue_is_locked(void)
+{
+   int cpu = get_cpu();
+   struct rq *rq = cpu_rq(cpu);
+   int ret;
+
+   ret = spin_is_locked(&rq->lock);
+   put_cpu();
+   return ret;
+}
+
 /*
  * Debugging: various feature bits
  */

Re: acpi dsts loading and populate_rootfs

2008-02-09 Thread Christoph Hellwig

On Sun, Feb 10, 2008 at 08:12:26AM +0100, Christoph Hellwig wrote:
> Folks, moving this call around hidden behind completely unreviewed
> acpi junk is not acceptable.
> 
> Either populate_rootfs _is_ safe to be called earlier and then we should
> do it always or it's not.  Either way such a change should be posted
> separately and reviewed on lkml.
> 
> Len, can you please revert "ACPI: basic initramfs DSDT override support"
> aka commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 until we've sorted
> this out properly?  Thanks.

And while we're at it the file reading thing in there is utter crap
as well.  You really should be using the firmware loader which works
perfectly fine if your initramfs is set up for it.  So please folks,
back to the drawing board, do it properly and send it out to lkml
for review please.

[01/19] rcu: add support for dynamic ticks and preempt rcu

2008-02-09 Thread Ingo Molnar

From: Steven Rostedt <[EMAIL PROTECTED]>

The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The
idle CPU will not progress the RCU through its grace period and a
synchronize_rcu may get stuck. Without this patch I have a box that will
not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine
with this patch.

This patch comes from the -rt kernel where it has been tested for
several months.
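
The parity convention behind dynticks_progress_counter can be sketched like this. It is a simplified single-CPU model under stated assumptions: memory barriers are left out, interrupts taken while idle are ignored, and cpu_in_nohz(), cpu_was_idle_since() and demo_idle_round_trip() are hypothetical helpers rather than functions from the patch. The counter starts odd (CPU running); each entry to or exit from dynticks idle increments it, so an even value means the CPU is idle.

```c
#include <assert.h>

/* Starts odd: the CPU is running, not in dynticks idle. */
static long dynticks_progress_counter = 1;

static void rcu_enter_nohz(void)
{
	dynticks_progress_counter++;
	/* Must be even once the CPU is idle. */
	assert(!(dynticks_progress_counter & 0x1));
}

static void rcu_exit_nohz(void)
{
	dynticks_progress_counter++;
	/* Must be odd again once the CPU is running. */
	assert(dynticks_progress_counter & 0x1);
}

/* Even counter value: the CPU is in dynticks idle. */
int cpu_in_nohz(long counter)
{
	return !(counter & 0x1);
}

/*
 * Simplified check: was the CPU idle at any point between taking the
 * snapshot and now? True if it was idle at snapshot time, or if the
 * counter has moved at all since then.
 */
int cpu_was_idle_since(long snap, long curr)
{
	return cpu_in_nohz(snap) || curr != snap;
}

/* Go idle and wake up again; returns how far the counter advanced. */
long demo_idle_round_trip(void)
{
	long snap = dynticks_progress_counter;

	rcu_enter_nohz();
	rcu_exit_nohz();
	return dynticks_progress_counter - snap;
}
```

A CPU that was idle during the interval cannot be inside a read-side critical section that started before it, which is why the grace-period machinery does not need to wait for it.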

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/hardirq.h|   10 ++
 include/linux/rcuclassic.h |3
 include/linux/rcupreempt.h |   22 
 kernel/rcupreempt.c|  224 -
 kernel/softirq.c   |1
 kernel/time/tick-sched.c   |3
 6 files changed, 259 insertions(+), 4 deletions(-)

Index: linux/include/linux/hardirq.h
===
--- linux.orig/include/linux/hardirq.h
+++ linux/include/linux/hardirq.h
@@ -109,6 +109,14 @@ static inline void account_system_vtime(
 }
 #endif
 
+#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+extern void rcu_irq_enter(void);
+extern void rcu_irq_exit(void);
+#else
+# define rcu_irq_enter() do { } while (0)
+# define rcu_irq_exit() do { } while (0)
+#endif /* CONFIG_PREEMPT_RCU */
+
 /*
  * It is safe to do non-atomic ops on ->hardirq_context,
  * because NMI handlers may not preempt and the ops are
@@ -117,6 +125,7 @@ static inline void account_system_vtime(
  */
 #define __irq_enter()  \
do {\
+   rcu_irq_enter();\
account_system_vtime(current);  \
add_preempt_count(HARDIRQ_OFFSET);  \
trace_hardirq_enter();  \
@@ -135,6 +144,7 @@ extern void irq_enter(void);
trace_hardirq_exit();   \
account_system_vtime(current);  \
sub_preempt_count(HARDIRQ_OFFSET);  \
+   rcu_irq_exit(); \
} while (0)
 
 /*
Index: linux/include/linux/rcuclassic.h
===
--- linux.orig/include/linux/rcuclassic.h
+++ linux/include/linux/rcuclassic.h
@@ -160,5 +160,8 @@ extern void rcu_restart_cpu(int cpu);
 extern long rcu_batches_completed(void);
 extern long rcu_batches_completed_bh(void);
 
+#define rcu_enter_nohz()   do { } while (0)
+#define rcu_exit_nohz()do { } while (0)
+
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUCLASSIC_H */
Index: linux/include/linux/rcupreempt.h
===
--- linux.orig/include/linux/rcupreempt.h
+++ linux/include/linux/rcupreempt.h
@@ -82,5 +82,27 @@ extern struct rcupreempt_trace *rcupreem
 
 struct softirq_action;
 
+#ifdef CONFIG_NO_HZ
+DECLARE_PER_CPU(long, dynticks_progress_counter);
+
+static inline void rcu_enter_nohz(void)
+{
+   __get_cpu_var(dynticks_progress_counter)++;
+   WARN_ON(__get_cpu_var(dynticks_progress_counter) & 0x1);
+   mb();
+}
+
+static inline void rcu_exit_nohz(void)
+{
+   mb();
+   __get_cpu_var(dynticks_progress_counter)++;
+   WARN_ON(!(__get_cpu_var(dynticks_progress_counter) & 0x1));
+}
+
+#else /* CONFIG_NO_HZ */
+#define rcu_enter_nohz()   do { } while (0)
+#define rcu_exit_nohz()do { } while (0)
+#endif /* CONFIG_NO_HZ */
+
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUPREEMPT_H */
Index: linux/kernel/rcupreempt.c
===
--- linux.orig/kernel/rcupreempt.c
+++ linux/kernel/rcupreempt.c
@@ -23,6 +23,10 @@
  * to Suparna Bhattacharya for pushing me completely away
  * from atomic instructions on the read side.
  *
+ *  - Added handling of Dynamic Ticks
+ *  Copyright 2007 - Paul E. Mckenney <[EMAIL PROTECTED]>
+ * - Steven Rostedt <[EMAIL PROTECTED]>
+ *
  * Papers:  http://www.rdrop.com/users/paulmck/RCU
  *
  * Design Document: http://lwn.net/Articles/253651/
@@ -409,6 +413,212 @@ static void __rcu_advance_callbacks(stru
}
 }
 
+#ifdef CONFIG_NO_HZ
+
+DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
+static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
+static DEFINE_PER_CPU(int, rcu_update_flag);
+
+/**
+ * rcu_irq_enter - Called from Hard irq handlers and NMI/SMI.
+ *
+ * If the CPU was idle with dynamic ticks active, this updates the
+ * dynticks_progress_counter to let the RCU handling know that the
+

[02/19] sched: add latency tracer callbacks to the scheduler

2008-02-09 Thread Ingo Molnar

From: Ingo Molnar <[EMAIL PROTECTED]>

add 3 lightweight callbacks to the tracer backend.

zero impact if tracing is turned off.
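
The "zero impact" claim rests on a common pattern: with the config option off, each callback is an empty static inline that the compiler drops. A small sketch, assuming a hypothetical DEMO_CONTEXT_SWITCH_TRACER switch and demo_ names that are not in the patch:

```c
struct demo_task {
	int pid;
};

/* Only ever changes when the demo tracer is compiled in. */
static int demo_trace_records;

#ifdef DEMO_CONTEXT_SWITCH_TRACER
/* Tracer built in: the hook records the switch. */
static void demo_ctx_switch_hook(struct demo_task *prev,
				 struct demo_task *next)
{
	(void)prev;
	(void)next;
	demo_trace_records++;
}
#else
/* Tracer compiled out: an empty inline, so the call costs nothing. */
static inline void demo_ctx_switch_hook(struct demo_task *prev,
					struct demo_task *next)
{
	(void)prev;
	(void)next;
}
#endif

/* Caller looks identical in both configurations; returns the new pid. */
int demo_context_switch(int prev_pid, int next_pid)
{
	struct demo_task prev = { prev_pid };
	struct demo_task next = { next_pid };

	demo_ctx_switch_hook(&prev, &next);
	return next.pid;
}
```

The call site needs no #ifdef of its own, which keeps the scheduler code readable while the cost stays at zero in the default build.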

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h |   26 ++
 kernel/sched.c|3 +++
 2 files changed, 29 insertions(+)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -2027,6 +2027,32 @@ extern int sched_mc_power_savings, sched
 
 extern void normalize_rt_tasks(void);
 
+#ifdef CONFIG_CONTEXT_SWITCH_TRACER
+extern void
+ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next);
+#else
+static inline void
+ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next)
+{
+}
+#endif
+
+#ifdef CONFIG_SCHED_TRACER
+extern void
+ftrace_wake_up_task(struct task_struct *wakee, struct task_struct *curr);
+extern void
+ftrace_wake_up_new_task(struct task_struct *wakee, struct task_struct *curr);
+#else
+static inline void
+ftrace_wake_up_task(struct task_struct *wakee, struct task_struct *curr)
+{
+}
+static inline void
+ftrace_wake_up_new_task(struct task_struct *wakee, struct task_struct *curr)
+{
+}
+#endif
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
 extern struct task_group init_task_group;
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1867,6 +1867,7 @@ static int try_to_wake_up(struct task_st
 
 out_activate:
 #endif /* CONFIG_SMP */
+   ftrace_wake_up_task(p, rq->curr);
schedstat_inc(p, se.nr_wakeups);
if (sync)
schedstat_inc(p, se.nr_wakeups_sync);
@@ -2007,6 +2008,7 @@ void wake_up_new_task(struct task_struct
p->sched_class->task_new(rq, p);
inc_nr_running(rq);
}
+   ftrace_wake_up_new_task(p, rq->curr);
check_preempt_curr(rq, p);
 #ifdef CONFIG_SMP
if (p->sched_class->task_wake_up)
@@ -2179,6 +2181,7 @@ context_switch(struct rq *rq, struct tas
struct mm_struct *mm, *oldmm;
 
prepare_task_switch(rq, prev, next);
+   ftrace_ctx_switch(prev, next);
mm = next->mm;
oldmm = prev->active_mm;
/*

[00/19] latency tracer

2008-02-09 Thread Ingo Molnar


this is the latency tracer that has been also posted at:

http://lkml.org/lkml/2008/2/8/435
http://lkml.org/lkml/2008/2/9/127

the tree can be pulled from:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

about 9 iterations of this have been posted to lkml in the past month,
this is the most recent iteration.

Ingo

KVM is not seen under X86 config with latest git (32 bit compile)

2008-02-09 Thread Balbir Singh


The KVM configuration is no longer visible in the latest git tree on a
32-bit build. It looks like HAVE_KVM is now selected only from
HAVE_SETUP_PER_CPU_AREA, which is enabled only on X86_64. I've moved the
"select HAVE_KVM" line under config X86. Hopefully, this is the right fix.

Comments?

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 arch/x86/Kconfig |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86/Kconfig~fix-kvm-build arch/x86/Kconfig
--- linux-2.6-git/arch/x86/Kconfig~fix-kvm-build   2008-02-10 12:41:18.0 +0530
+++ linux-2.6-git-balbir/arch/x86/Kconfig   2008-02-10 12:41:37.0 +0530
@@ -20,6 +20,8 @@ config X86
def_bool y
select HAVE_OPROFILE
select HAVE_KPROBES
+   select HAVE_KVM
+
 
 config GENERIC_LOCKBREAK
def_bool n
@@ -108,8 +110,6 @@ config GENERIC_TIME_VSYSCALL
 config HAVE_SETUP_PER_CPU_AREA
def_bool X86_64
 
-select HAVE_KVM
-
 config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on !SMP || !X86_VOYAGER
_
-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

[5/6] x86: kgdb support

2008-02-09 Thread Ingo Molnar

From: Ingo Molnar <[EMAIL PROTECTED]>

simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
based on patch from:

  Subject: kgdb: core-lite
  From: Jason Wessel <[EMAIL PROTECTED]>

[ and countless other authors - see the patch for details. ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 arch/x86/Kconfig |4 
 arch/x86/kernel/Makefile |1 
 arch/x86/kernel/kgdb.c   |  550 +++
 include/asm-x86/kgdb.h   |   87 +++
 4 files changed, 642 insertions(+)

Index: linux-kgdb.q/arch/x86/Kconfig
===
--- linux-kgdb.q.orig/arch/x86/Kconfig
+++ linux-kgdb.q/arch/x86/Kconfig
@@ -14,6 +14,7 @@ config X86_32
 
 config X86_64
def_bool 64BIT
+   select KGDB_ARCH_HAS_SHADOW_INFO
 
 ### Arch settings
 config X86
@@ -139,6 +140,9 @@ config AUDIT_ARCH
 config ARCH_SUPPORTS_AOUT
def_bool y
 
+config ARCH_SUPPORTS_KGDB
+   def_bool y
+
 # Use the generic interrupt handling code in kernel/irq/:
 config GENERIC_HARDIRQS
bool
Index: linux-kgdb.q/arch/x86/kernel/Makefile
===
--- linux-kgdb.q.orig/arch/x86/kernel/Makefile
+++ linux-kgdb.q/arch/x86/kernel/Makefile
@@ -58,6 +58,7 @@ obj-$(CONFIG_MODULES) += module_$(BITS)
 obj-$(CONFIG_ACPI_SRAT)+= srat_32.o
 obj-$(CONFIG_EFI)  += efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault_32.o
+obj-$(CONFIG_KGDB) += kgdb.o
 obj-$(CONFIG_VM86) += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 
Index: linux-kgdb.q/arch/x86/kernel/kgdb.c
===
--- /dev/null
+++ linux-kgdb.q/arch/x86/kernel/kgdb.c
@@ -0,0 +1,550 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ */
+
+/*
+ * Copyright (C) 2004 Amit S. Kale <[EMAIL PROTECTED]>
+ * Copyright (C) 2000-2001 VERITAS Software Corporation.
+ * Copyright (C) 2002 Andi Kleen, SuSE Labs
+ * Copyright (C) 2004 LinSysSoft Technologies Pvt. Ltd.
+ * Copyright (C) 2007 MontaVista Software, Inc.
+ * Copyright (C) 2007-2008 Jason Wessel, Wind River Systems, Inc.
+ */
+/*
+ *  Contributor: Lake Stevens Instrument Division$
+ *  Written by:  Glenn Engel $
+ *  Updated by: Amit Kale<[EMAIL PROTECTED]>
+ *  Updated by: Tom Rini <[EMAIL PROTECTED]>
+ *  Updated by: Jason Wessel <[EMAIL PROTECTED]>
+ *  Modified for 386 by Jim Kingdon, Cygnus Support.
+ *  Original kgdb, compatibility with 2.1.xx kernel by
+ *  David Grothe <[EMAIL PROTECTED]>
+ *  Integrated into 2.2.5 kernel by Tigran Aivazian <[EMAIL PROTECTED]>
+ *  X86_64 changes from Andi Kleen's patch merged by Jim Houston
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#ifdef CONFIG_X86_32
+# include 
+#else
+# include 
+#endif
+
+/*
+ * Put the error code here just in case the user cares:
+ */
+static int gdb_x86errcode;
+
+/*
+ * Likewise, the vector number here (since GDB only gets the signal
+ * number through the usual means, and that's not very specific):
+ */
+static int gdb_x86vector = -1;
+
+void pt_regs_to_gdb_regs(unsigned long *gdb_regs, struct pt_regs *regs)
+{
+   gdb_regs[GDB_AX]= regs->ax;
+   gdb_regs[GDB_BX]= regs->bx;
+   gdb_regs[GDB_CX]= regs->cx;
+   gdb_regs[GDB_DX]= regs->dx;
+   gdb_regs[GDB_SI]= regs->si;
+   gdb_regs[GDB_DI]= regs->di;
+   gdb_regs[GDB_BP]= regs->bp;
+   gdb_regs[GDB_PS]= regs->flags;
+   gdb_regs[GDB_PC]= regs->ip;
+#ifdef CONFIG_X86_32
+   gdb_regs[GDB_DS]= regs->ds;
+   gdb_regs[GDB_ES]= regs->es;
+   gdb_regs[GDB_CS]= regs->cs;
+   gdb_regs[GDB_SS]= __KERNEL_DS;
+   gdb_regs[GDB_FS]= 0x;
+   gdb_regs[GDB_GS]= 0x;
+#else
+   gdb_regs[GDB_R8]= regs->r8;
+   gdb_regs[GDB_R9]= regs->r9;
+   gdb_regs[GDB_R10]   = regs->r10;
+   gdb_regs[GDB_R11]   = regs->r11;
+   gdb_regs[GDB_R12]   = regs->r12;
+   gdb_regs[GDB_R13]   = regs->r13;
+   gdb_regs[GDB_R14]   = regs->r14;
+   gdb_regs[GDB_R15]   = regs->r15;

[6/6] kgdb: document parameters

2008-02-09 Thread Ingo Molnar

From: Jason Wessel <[EMAIL PROTECTED]>

document the kgdboc module/boot parameter.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |5 +
 1 file changed, 5 insertions(+)

Index: linux-kgdb.q/Documentation/kernel-parameters.txt
===
--- linux-kgdb.q.orig/Documentation/kernel-parameters.txt
+++ linux-kgdb.q/Documentation/kernel-parameters.txt
@@ -930,6 +930,11 @@ and is between 256 and 4096 characters. 
kstack=N[X86-32,X86-64] Print N words from the kernel stack
in oops dumps.
 
+   kgdboc= [HW] kgdb over consoles.
+   Requires a tty driver that supports console polling.
+   (only serial supported for now)
+   Format: [,baud]
+
l2cr=   [PPC]
 
lapic   [X86-32,APIC] Enable the local APIC even if BIOS

[4/6] consoles: polling support, kgdboc

2008-02-09 Thread Ingo Molnar

From: Jan Kiszka <[EMAIL PROTECTED]>

polled console handling support, to access a console in an irq-less
way while in debug or irq context.

absolutely zero impact as long as CONFIG_CONSOLE_POLL is disabled.
(which is the default)
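
A rough illustration of irq-less polling against a simulated UART. The struct, the DEMO_ status bits and all function names are hypothetical; a real driver would read hardware registers instead of a struct field. The put path busy-waits on a status bit before each byte and sends a carriage return after a line feed, similar in spirit to the 8250 poll routines in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LSR_DR	0x01	/* a received byte is waiting */
#define DEMO_LSR_THRE	0x20	/* transmitter can accept a byte */

/* Simulated UART: a status register, one rx byte and a tx log. */
struct fake_uart {
	unsigned char lsr;
	unsigned char rx;
	char tx_log[16];
	size_t tx_len;
};

/* The fake device is always ready, so the polling loops exit at once. */
static struct fake_uart demo_uart = {
	.lsr = DEMO_LSR_DR | DEMO_LSR_THRE,
	.rx = 'g',
};

/* Busy-wait until all requested status bits are set; no interrupts. */
static void wait_for_bits(const struct fake_uart *u, unsigned char bits)
{
	const volatile unsigned char *lsr = &u->lsr;

	while ((*lsr & bits) != bits)
		;
}

static void fake_uart_tx(struct fake_uart *u, char c)
{
	assert(u->tx_len < sizeof(u->tx_log));
	u->tx_log[u->tx_len++] = c;
}

int poll_get_char(struct fake_uart *u)
{
	wait_for_bits(u, DEMO_LSR_DR);
	return u->rx;
}

void poll_put_char(struct fake_uart *u, char c)
{
	wait_for_bits(u, DEMO_LSR_THRE);
	fake_uart_tx(u, c);
	if (c == '\n') {
		/* After a line feed, also send a carriage return. */
		wait_for_bits(u, DEMO_LSR_THRE);
		fake_uart_tx(u, '\r');
	}
}

/* Number of bytes that went out on the wire for one character. */
size_t demo_bytes_sent(char c)
{
	size_t before = demo_uart.tx_len;

	poll_put_char(&demo_uart, c);
	return demo_uart.tx_len - before;
}
```

Polling like this burns CPU while waiting, which is acceptable in a debugger context where interrupts cannot be relied on.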

kgdb over consoles support from:

   Jason Wessel <[EMAIL PROTECTED]>

[ [EMAIL PROTECTED]: redesign, splitups, cleanups. ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 drivers/char/tty_io.c|   47 
 drivers/serial/8250.c|   62 
 drivers/serial/Kconfig   |3 
 drivers/serial/Makefile  |1 
 drivers/serial/kgdboc.c  |  164 +++
 drivers/serial/serial_core.c |   67 +
 include/linux/serial_core.h  |4 +
 include/linux/tty_driver.h   |   12 +++
 8 files changed, 359 insertions(+), 1 deletion(-)

Index: linux-kgdb.q/drivers/char/tty_io.c
===
--- linux-kgdb.q.orig/drivers/char/tty_io.c
+++ linux-kgdb.q/drivers/char/tty_io.c
@@ -1155,6 +1155,48 @@ static struct tty_driver *get_tty_driver
return NULL;
 }
 
+#ifdef CONFIG_CONSOLE_POLL
+
+/**
+ * tty_find_polling_driver -   find device of a polled tty
+ * @name: name string to match
+ * @line: pointer to resulting tty line nr
+ *
+ * This routine returns a tty driver structure, given a name
+ * and the condition that the tty driver is capable of polled
+ * operation.
+ */
+struct tty_driver *tty_find_polling_driver(char *name, int *line)
+{
+   struct tty_driver *p, *res = NULL;
+   int tty_line = 0;
+   char *str;
+
+   mutex_lock(&tty_mutex);
+   /* Search through the tty devices to look for a match */
+   list_for_each_entry(p, &tty_drivers, tty_drivers) {
+   str = name + strlen(p->name);
+   tty_line = simple_strtoul(str, &str, 10);
+   if (*str == ',')
+   str++;
+   if (*str == '\0')
+   str = 0;
+
+   if (tty_line >= 0 && tty_line <= p->num && p->poll_init &&
+   !p->poll_init(p, tty_line, str)) {
+
+   res = p;
+   *line = tty_line;
+   break;
+   }
+   }
+   mutex_unlock(&tty_mutex);
+
+   return res;
+}
+EXPORT_SYMBOL_GPL(tty_find_polling_driver);
+#endif
+
 /**
  * tty_check_change-   check for POSIX terminal changes
  * @tty: tty to check
@@ -3850,6 +3892,11 @@ void tty_set_operations(struct tty_drive
driver->write_proc = op->write_proc;
driver->tiocmget = op->tiocmget;
driver->tiocmset = op->tiocmset;
+#ifdef CONFIG_CONSOLE_POLL
+   driver->poll_init = op->poll_init;
+   driver->poll_get_char = op->poll_get_char;
+   driver->poll_put_char = op->poll_put_char;
+#endif
 }
 
 
Index: linux-kgdb.q/drivers/serial/8250.c
===
--- linux-kgdb.q.orig/drivers/serial/8250.c
+++ linux-kgdb.q/drivers/serial/8250.c
@@ -1740,6 +1740,64 @@ static inline void wait_for_xmitr(struct
}
 }
 
+#ifdef CONFIG_CONSOLE_POLL
+/*
+ * Console polling routines for writing and reading from the uart while
+ * in an interrupt or debug context.
+ */
+
+static int serial8250_get_poll_char(struct uart_port *port)
+{
+   struct uart_8250_port *up = (struct uart_8250_port *)port;
+   unsigned char lsr = serial_inp(up, UART_LSR);
+
+   while (!(lsr & UART_LSR_DR))
+   lsr = serial_inp(up, UART_LSR);
+
+   return serial_inp(up, UART_RX);
+}
+
+
+static void serial8250_put_poll_char(struct uart_port *port,
+unsigned char c)
+{
+   unsigned int ier;
+   struct uart_8250_port *up = (struct uart_8250_port *)port;
+
+   /*
+*  First save the IER then disable the interrupts
+*/
+   ier = serial_in(up, UART_IER);
+#ifdef UART_CAP_UUE
+   if (up->capabilities & UART_CAP_UUE)
+#else
+   if (up->port.type == PORT_XSCALE)
+#endif
+   serial_out(up, UART_IER, UART_IER_UUE);
+   else
+   serial_out(up, UART_IER, 0);
+
+   wait_for_xmitr(up, BOTH_EMPTY);
+   /*
+*  Send the character out.
+*  If a LF, also do CR...
+*/
+   serial_out(up, UART_TX, c);
+   if (c == 10) {
+   wait_for_xmitr(up, BOTH_EMPTY);
+   serial_out(up, UART_TX, 13);
+   }
+
+   /*
+*  Finally, wait for transmitter to become empty
+*  and restore the IER
+*/
+   wait_for_xmitr(up, BOTH_EMPTY);
+   serial_out(up, UART_IER, ier);
+}
+
+#endif /* CONFIG_CONSOLE_POLL */
+
 static int serial8250_startup(struct uart_port *port)
 {
struct uart_8250_port *up = (struct uart_8250_port *)port;
@@ -2386,6 +2444,10 @@ static struct uart_ops serial8250_pops =

[1/6] pids: add pid_max prototype

2008-02-09 Thread Ingo Molnar

From: Ingo Molnar <[EMAIL PROTECTED]>

add pid_max prototype - used by sysctl and will be used by kgdb as well.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 include/linux/pid.h |2 ++
 kernel/sysctl.c |2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-kgdb.q/include/linux/pid.h
===
--- linux-kgdb.q.orig/include/linux/pid.h
+++ linux-kgdb.q/include/linux/pid.h
@@ -86,6 +86,8 @@ extern struct task_struct *FASTCALL(get_
 
 extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
 
+extern int pid_max;
+
 /*
  * attach_pid() and detach_pid() must be called with the tasklist_lock
  * write-held.
Index: linux-kgdb.q/kernel/sysctl.c
===
--- linux-kgdb.q.orig/kernel/sysctl.c
+++ linux-kgdb.q/kernel/sysctl.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -71,7 +72,6 @@ extern int max_threads;
 extern int core_uses_pid;
 extern int suid_dumpable;
 extern char core_pattern[];
-extern int pid_max;
 extern int min_free_kbytes;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;

[2/6] uaccess: add probe_kernel_write()

2008-02-09 Thread Ingo Molnar

From: Ingo Molnar <[EMAIL PROTECTED]>

add probe_kernel_write() - copy & paste of the existing
probe_kernel_address(), extended to writes.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 include/linux/uaccess.h |   22 ++
 1 file changed, 22 insertions(+)

Index: linux-kgdb.q/include/linux/uaccess.h
===
--- linux-kgdb.q.orig/include/linux/uaccess.h
+++ linux-kgdb.q/include/linux/uaccess.h
@@ -84,4 +84,26 @@ static inline unsigned long __copy_from_
ret;\
})
 
+/**
+ * probe_kernel_write(): safely attempt to write to a location
+ * @addr: address to write to - its type is typeof(rdval)*
+ * @rdval: variable holding the value to write
+ *
+ * Safely write to address @addr from variable @rdval.  If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+#define probe_kernel_write(addr, rdval) \
+	({ \
+		long ret; \
+		mm_segment_t old_fs = get_fs(); \
+ \
+		set_fs(KERNEL_DS); \
+		pagefault_disable(); \
+		ret = __put_user(rdval, \
+			(__force typeof(rdval) __user *)(addr)); \
+		pagefault_enable(); \
+		set_fs(old_fs); \
+		ret; \
+	})
+
 #endif /* __LINUX_UACCESS_H__ */
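
For readers without a kernel tree at hand, the contract of probe_kernel_write() (attempt the store, and report -EFAULT instead of taking a fatal fault) can be illustrated with a rough user-space analogue. This is only a sketch and not part of the patch: probe_user_write_int() is a hypothetical helper, and it assumes Linux behaviour where read() reports EFAULT when the destination buffer is not writable.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * probe_user_write_int() - hypothetical user-space analogue, not kernel code.
 * The value is pushed through a pipe and read() copies it to @addr, so a bad
 * address is reported by the kernel as EFAULT rather than raising SIGSEGV.
 * Returns 0 on success or a negative errno value.
 */
static int probe_user_write_int(int *addr, int val)
{
	int fds[2];
	int ret = 0;
	ssize_t n;

	if (pipe(fds) != 0)
		return -errno;

	/* A tiny write to a fresh pipe completes in one piece. */
	n = write(fds[1], &val, sizeof(val));
	assert(n == (ssize_t)sizeof(val));

	/* The kernel performs the store on our behalf and checks the address. */
	n = read(fds[0], addr, sizeof(val));
	if (n < 0)
		ret = -errno;		/* captured before close() can change errno */
	else if (n != (ssize_t)sizeof(val))
		ret = -EIO;

	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The shape is the same as in the macro: the caller gets an error code back and decides what to do, and no fault handler has to be hooked.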

[0/6] kgdb light

2008-02-09 Thread Ingo Molnar


this is the "kgdb light" tree that has been also posted at:

   http://lkml.org/lkml/2008/2/9/236

it is available at:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git

See the shortlog below.

various iterations of this have also been included in x86.git for the 
past 3 months.

This is a slimmed-down and cleaned up version of KGDB that i've created 
out of the original patches that we submitted two weeks ago. I went over 
the kgdb patches with Thomas and we cut out everything that we did not 
like, and cleaned up the result.

KGDB is still just as functional as it was before (i tested it on 32-bit 
and 64-bit x86) - and any desired extra capability or complexity should 
be added as a delta improvement, not in this initial merge.

The difference between the original kgdb submission and this submission 
is best visible in the diffstat:

 before: 41 files changed, 4007 insertions(+), 33 deletions(-)
 after:  22 files changed, 3448 insertions(+),  2 deletions(-)

what got removed:

 - removed _all_ critical path impact, even if KGDB is enabled and
   active. The only notifier list it is registered in is the die
   notifiers, but even there it has the minimum priority of -INT_MAX, to
   be called as the last one of the die notifiers. I removed the 'early
   trap hook', the trap handler tweaks, everything. KGDB's only impact
   now are the arch details it implements in arch/x86/kernel/kgdb.c,
   nothing else.

 - removed all the lowlevel serial drivers. KGDB should not be in the 
   business of writing special-purpose Linux drivers. In fact i found a 
   testsystem where the KGDB 8250 driver would not work - it's simply 
   reimplementing the wheel that drivers/serial already implements, and 
   poorly so. Any "early debugging" functionality should be done via 
   extending the early-console concept, not via special-purpose KGDB 
   drivers.

 - I have added a redesigned and cleaned up version of the "KGDB over 
   polled consoles" approach (KGDBOC) - i believe this should be the 
   only IO transport for KGDB: it is an extension of the "console" 
   concept - nothing more, nothing less. Netconsole fits this concept 
   quite nicely as well. The moment a console driver is extended with 
   polling functionality, KGDB will be usable through that IO transport, 
   without having to know about hardware details.

 - I have removed the longjump code. That code was ugly beyond belief, 
   it tried to fix up KGDB's own faults and needed to hook into all the 
   fault handlers. It is totally, utterly wrong to do it like that. The 
   code now uses pure probe_kernel_address() accesses.

 - removed the module symbol hacks - those need a clean solution.

 - removed the GTOD/clocksource hacks. If a user uses kgdb for extended
   periods of time then GTOD clocksources can get out of sync and we
   might fall back to other clocksources. That is the _right_ thing to 
   do for the kernel, hacking it around to avoid kernel messages was
   wrong.

 - i have removed the softlockup hacks as well.

 - removed the toplevel Makefile changes - if any change is needed in 
   that area (i'm not convinced there is), then those changes need to go
   through Sam & the kbuild folks.

 - removed the might_sleep scheduler hack as well, and the thread_return
   hack.

 - [ and did lots of other cleanups and rewrites as well. ]

as a result, this kgdb series has _obviously_ zero impact on the kernel, 
because it just does not touch any dangerous codepath. From this point 
on KGDB can evolve in small, well-controlled baby steps, as all other 
kernel code as well.
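
The "called as the last one of the die notifiers" point above follows from how a priority-sorted chain behaves: a handler registered with the lowest priority runs after everything else, no matter when it was registered. A toy model of that ordering is sketched below. All toy_ names are made up for illustration, the priorities 10 and 0 are arbitrary, and this is not the kernel's notifier code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a priority-sorted notifier chain (illustrative names only). */
struct toy_notifier {
	int (*call)(void *data);
	int priority;			/* higher priority runs earlier */
	struct toy_notifier *next;
};

/* Insert @n so the list stays sorted by descending priority. */
static void toy_chain_register(struct toy_notifier **head, struct toy_notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call every handler in list order. */
static void toy_chain_call(struct toy_notifier *head, void *data)
{
	for (; head; head = head->next)
		head->call(data);
}

/* Handlers append a tag to a trace so the call order becomes visible. */
static char toy_trace[8];
static size_t toy_trace_len;

static int toy_note(char tag)
{
	assert(toy_trace_len < sizeof(toy_trace) - 1);
	toy_trace[toy_trace_len++] = tag;
	return 0;
}

static int toy_early(void *data)   { (void)data; return toy_note('E'); }
static int toy_default(void *data) { (void)data; return toy_note('D'); }
static int toy_kgdb(void *data)    { (void)data; return toy_note('K'); }

/* Register the -INT_MAX handler first and show that it still runs last. */
static const char *toy_demo_order(void)
{
	static struct toy_notifier kgdb_like = { toy_kgdb, -INT_MAX, NULL };
	static struct toy_notifier dflt = { toy_default, 0, NULL };
	static struct toy_notifier early = { toy_early, 10, NULL };
	struct toy_notifier *chain = NULL;

	memset(toy_trace, 0, sizeof(toy_trace));
	toy_trace_len = 0;

	toy_chain_register(&chain, &kgdb_like);
	toy_chain_register(&chain, &dflt);
	toy_chain_register(&chain, &early);
	toy_chain_call(chain, NULL);
	return toy_trace;
}
```

In this model toy_demo_order() yields the order E, D, K: the -INT_MAX entry is reached only after every other handler has had its turn.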

and the resulting kgdb is still very functional: it can still break into 
a kernel (via SysRq-G), can catch crashes, can single-step, etc. It's 
already a quite usable first step.

I have tested this tree on x86 32-bit and 64-bit. Other architectures 
are not expected to be impacted.

Ingo

-->
Ingo Molnar (3):
  pids: add pid_max prototype
  uaccess: add probe_kernel_write()
  x86: kgdb support

Jan Kiszka (1):
  consoles: polling support, kgdboc

Jason Wessel (2):
  kgdb: core
  kgdb: document parameters

 Documentation/kernel-parameters.txt |    5 +
 arch/x86/Kconfig                    |    4 +
 arch/x86/kernel/Makefile            |    1 +
 arch/x86/kernel/kgdb.c              |  550 ++
 drivers/char/tty_io.c               |   47 +
 drivers/serial/8250.c               |   62 ++
 drivers/serial/Kconfig              |    3 +
 drivers/serial/Makefile             |    1 +
 drivers/serial/kgdboc.c             |  164 +++
 drivers/serial/serial_core.c        |   67 ++-
 include/asm-generic/kgdb.h          |   93 ++
 include/asm-x86/kgdb.h              |   87 ++
 include/linux/kgdb.h                |  264 +
 include/linux/pid.h                 |    2 +
 include/linux/serial_core.h         |    4 +
 include/linux/tty_driver.h          |   12 +
 include/linux/uaccess.h             |   22 +

acpi dsts loading and populate_rootfs

2008-02-09 Thread Christoph Hellwig

Folks, moving this call around, hidden in completely unreviewed
acpi junk, is not acceptable.

Either populate_rootfs _is_ safe to be called earlier and then we should
do it always or it's not.  Either way such a change should be posted
separately and reviewed on lkml.

Len, can you please revert "ACPI: basic initramfs DSDT override support"
aka commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 until we've sorted
this out properly?  Thanks.

Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

2008-02-09 Thread Olof Johansson

On Sun, Feb 10, 2008 at 07:15:58AM +0100, Willy Tarreau wrote:
> On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote:
> > 40M:
> > > 2.6.22        time 94315 ms
> > > 2.6.23        time 107930 ms
> > > 2.6.24        time 113291 ms
> > > 2.6.24-git19  time 110360 ms
> > 
> > So with more work per thread, the differences become less but they're
> > still there. At the 40M loop, with 500 threads it's quite a bit of
> > runtime per thread.
> 
> No, it's really nothing. I had to push the loop to 1 billion to make the load
> noticeable. You don't have 500 threads, you have 2 threads and that load is
> repeated 500 times. And if we look at the numbers, let's take the worst one :
> > 40M:
> > > 2.6.24        time 113291 ms
> 113291/500 = 227 microseconds/loop. This is still very low compared to the
> smallest timeslice you would have (1 ms at HZ=1000).
> 
> So your threads are still completing *before* the scheduler has to preempt
> them.

Hmm? I get that to be 227ms per loop, which is way more than a full
timeslice. Running the program took in the range of 2 minutes, so it's
11 milliseconds, not microseconds.

> > It seems generally unfortunate that it takes longer for a new thread to
> > move over to the second cpu even when the first is busy with the original
> > thread. I can certainly see cases where this causes suboptimal overall
> > system behaviour.
> 
> In fact, I don't think it takes longer, I think it does not do it at their
> creation, but will do it immediately after the first slice is consumed. This
> would explain the important differences here. I don't know how we could ensure
> that the new thread is created on the second CPU from the start, though.

The math doesn't add up for me. Even if it rebalanced at the end of
the first slice (i.e. after 1ms), that would be a 1ms penalty per
iteration. With 500 threads that'd be a total penalty of 500ms.
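
To put numbers on that, using the 40M rows quoted above (94315 ms on 2.6.22, 113291 ms on 2.6.24) and assuming 500 iterations and a 1 ms slice: the average is about 226.6 ms per iteration, the gap between the two kernels is 18976 ms, and a one-slice delay per iteration would explain at most 500 ms of it. A small sketch of the arithmetic, with helper names that are purely illustrative:

```c
#include <assert.h>

/* Average wall-clock time per iteration, in milliseconds. */
static double per_iteration_ms(double total_ms, int iterations)
{
	assert(iterations > 0);
	return total_ms / iterations;
}

/* Total cost if every iteration loses exactly one slice before migrating. */
static double late_rebalance_penalty_ms(double slice_ms, int iterations)
{
	assert(iterations > 0);
	return slice_ms * iterations;
}

/* Wall-clock difference between two kernels for the same loop size. */
static double observed_gap_ms(double slower_ms, double faster_ms)
{
	return slower_ms - faster_ms;
}
```

With those inputs the observed gap is roughly 38 times larger than the 500 ms that a late first-slice rebalance could account for, so something else has to be contributing.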

> I tried inserting a sched_yield() at the top of the busy loop (1M loops).
> By default, it did not change a thing. Then I simply set sched_compat_yield
> to 1, and the two threads then ran simultaneously with a stable low time
> (2700 ms instead of 10-12 seconds).
> 
> Doing so with 10k loops (initial test) shows times in the range 240-300 ms
> only instead of 2200-6500 ms.

Right, likely because the long-running cases got stuck in the busy loop
at the end, which would abort sooner if the other thread got scheduled
for just a bit. It was a mistake to post that variant of the testcase:
it's not as relevant, and it mimics the original workload less well than
a variant with a larger first loop would.

> Ingo, would it be possible (and wise) to ensure that a new thread being
> created gets immediately rebalanced in order to emulate what is done here
> with sched_compat_yield=1 and sched_yield() in both threads just after the
> thread creation ? I don't expect any performance difference doing this,
> but maybe some shell scripts relying on short-lived pipes would get faster
> on SMP.

There's always the tradeoff of losing cache warmth whenever a thread is
moved, so I'm not sure if it's a good idea to always migrate it at
creation time. It's not a simple problem, really.

> > I agree that the testcase is highly artificial. Unfortunately, it's
> > not uncommon to see these kind of weird testcases from customers trying
> > to evaluate new hardware. :( They tend to be pared-down versions of
> > whatever their real workload is (the real workload is doing things more
> > appropriately, but the smaller version is used for testing). I was lucky
> > enough to get source snippets to base a standalone reproduction case on
> > for this, normally we wouldn't even get copies of their binaries.
> 
> I'm well aware of that. What's important is to be able to explain what is
> causing the difference and why the test case does not represent anything
> related to performance. Maybe the code author wanted to get 500 parallel
> threads and got his code wrong ?

I believe it started out as a simple attempt to parallelize a workload
that sliced the problem too low, instead of slicing it in larger chunks
and have each thread do more work at a time. It did well on 2.6.22 with
almost a 2x speedup, but did worse than the single-threaded testcase on a
2.6.24 kernel.

So yes, it can clearly be handled through explanations and education
and fixing the broken testcase, but I'm still not sure the new behaviour
is desired.


-Olof

Re: One minute delay when booting 2.6.24.1

2008-02-09 Thread Tvrtko A. Ursulin

On Saturday 09 February 2008 22:01:44 Jan Engelhardt wrote:
> On Feb 9 2008 13:29, Tvrtko A. Ursulin wrote:
> >Hi all,
> >
> >As the subject says I get ~1 minute delay when booting 2.6.24.1
> >pretty reliably. It is possible it is not new to 2.6.24.1 but I
> >can't tell due to recent hardware changes.
> >
> >dmesg excerpt where it happens looks like this (full one attached):
>
> Do you really experience a 1 minute wait, or is this perhaps
> just the clock skipping?

It is a real delay.

Tvrtko

Re: kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit

2008-02-09 Thread Niki Denev

On Feb 10, 2008 8:32 AM, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> On Sun, Feb 10, 2008 at 08:04:35AM +0200, Niki Denev wrote:
> > Hi,
> >
> > As the subject says the 2.6.24.1 is still vulnerable to the vmsplice
> > local root exploit.
>
> Yes indeed, that's quite bad. 2.6.24-git is still vulnerable too, and
> also contains the fix :-(
>
> CC'd Jens as he worked on the fix.
>
> Willy
>
>

I was unable to gain root on 2.6.24-git20,
but after several segfaults when executing the exploit continuously
the machine crashes.

--Niki

[GIT PULL] ext4 update

2008-02-09 Thread Theodore Ts'o

Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

These are mostly bug fixes that we've found since the last pull request.
The one non-bugfix change is that I've added a sanity check to assure
that production ext3 filesystems don't get mounted with ext4dev
accidentally.  The need for this was discovered when Eric Sandeen
started putting ext4 into Fedora's Rawhide release for initial testing.

Thanks,

- Ted

Aneesh Kumar K.V (5):
  jbd2: Fix reference counting on the journal commit block's buffer head
  JBD2: Use the incompat macro for testing the incompat feature.
  ext4: Fix null bh pointer dereference in mballoc
  ext4: Fix circular locking dependency with migrate and rm.
  ext4: Don't panic in case of corrupt bitmap

Dave Kleikamp (1):
  JBD2:  Clear buffer_ordered flag for barried IO request on success

Eric Sandeen (2):
  allow in-inode EAs on ext4 root inode
  ext4: allocate struct ext4_allocation_context from a kmem cache

Jan Kara (2):
  jbd: Remove useless loop when writing commit record
  ext4: Fix Direct I/O locking

Mingming Cao (1):
  jbd2: Add error check to journal_wait_on_commit_record to avoid oops

Theodore Tso (1):
  ext4: Add new "development flag" to the ext4 filesystem

Valerie Clement (1):
  ext4: Don't set EXTENTS_FL flag for fast symlinks

 fs/ext4/inode.c         |  115 +++-
 fs/ext4/mballoc.c       |  164 ++-
 fs/ext4/migrate.c       |  123 +++
 fs/ext4/namei.c         |    1 +
 fs/ext4/super.c         |   11 +++
 fs/jbd/commit.c         |   14 ++--
 fs/jbd2/commit.c        |   10 ++-
 fs/jbd2/recovery.c      |    2 +-
 include/linux/ext4_fs.h |    7 ++
 9 files changed, 270 insertions(+), 177 deletions(-)


Re: kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit

2008-02-09 Thread Willy Tarreau

On Sun, Feb 10, 2008 at 08:04:35AM +0200, Niki Denev wrote:
> Hi,
> 
> As the subject says the 2.6.24.1 is still vulnerable to the vmsplice
> local root exploit.

Yes indeed, that's quite bad. 2.6.24-git is still vulnerable too, and
also contains the fix :-(

CC'd Jens as he worked on the fix.

Willy


Re: [PATCH] Change pci_raw_ops to pci_raw_read/write

2008-02-09 Thread Yinghai Lu

On Feb 9, 2008 4:41 AM, Matthew Wilcox <[EMAIL PROTECTED]> wrote:
> On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote:
> > Matthew,
> >
> > Perhaps I missed it, but did you address Yinghai's concerns?
>
> No, I was on holiday.
>
> > Yinghai Lu wrote:
> > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote:
> > >>
> > >>-int pci_conf1_write(unsigned int seg, unsigned int bus,
> > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus,
> > >>   unsigned int devfn, int reg, int len, u32
> > >>   value)
> > >
> > >any reason to change pci_conf1_read/write to static?
>
> Yes -- it no longer needs to be called from outside this file.
>
> > >>+config ATA_RAM
> > >>+   tristate "ATA RAM driver"
> > >>+
> > >
> > >related?
>

looks good. it should get into -mm or x86/mm for some testing

YH

Re: [PATCH] USB: mark USB drivers as being GPL only

2008-02-09 Thread Daniel Hazelton

On Sunday 10 February 2008 00:43:49 Marcel Holtmann wrote:
> Hi Daniel,
>
> > > > > It makes no difference if you
> > > > > distribute the GPL library with it or not.
> > > >
> > > > If you do not distribute the GPL library, the library is simply being
> > > > used in the intended, ordinary way. You do not need to agree to, nor
> > > > can you violate, the GPL simply by using a work in its ordinary
> > > > intended way.
> > > >
> > > > If the application contains insufficient copyrightable expression
> > > > from the library to be considered a derivative work (and purely
> > > > functional things do not count), then it cannot be a derivative work.
> > > > The library is not being copied or distributed. So how can its
> > > > copyright be infringed?
> > >
> > > go ahead and create an application that uses a GPL only library. Then
> > > ask a lawyer if it is okay to distribute your application in binary
> > > only form without making the source code available (according to the
> > > GPL).
> > >
> > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL
> > >
> > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL
> >
> > In the US, at least, the belief that "Linking", in *ANY* form, with a GPL
> > library creates a derivative work, is fallacious.
>
> that is how FSF states it and it seems that most legal departments of
> big companies (US and EU based) are not taking any risk on this. So it
> seems that someone actually has to prove in court that these assumptions
> for the GPL case are wrong.

The FSF is making a claim that can be traced back to the beliefs of one
person - RMS - and is propagating his views. As I stated in the original,
this is not just my opinion, but that of two different lawyers I've spoken to
and also the stated belief of numerous people on LKML.

The fact is that the GPL only affects a "derivative work" in a viral manner.
Merely using a GPL'd library's API is not enough to make a program
a "derivative work".

> > Were I to create an
> > application that uses, say, GTK for the interface the protected
> > expression is my "unique and creative" use of the GTK API for creating
> > the specific interface and any other code I have written using the API. I
> > hold sole license to the copyright on that code and am able to license
> > said code under the specific license of my choice.
>
> Not even getting into this one since GTK+ is a LGPL based library. Get
> your examples straight.

And the LGPL was created because of the FSF propagated belief that using a 
GPL'd library means your application is automatically a "derivative work" and 
hence must be released under the GPL. So the LGPL was created with 
the "automatic" 'linking' exemption. It is not necessary and never has been.

This is why, even if the FSF claims what I've said above (that linking code 
with the GPL doesn't propagate the GPL into the non-GPL code) most companies 
won't risk it... Because the FSF has taken actions that are the exact 
opposite of their words.

> > Why? Because the pre-processor is what is including any GPL'd code in my
> > application and expanding any macros. That is a purely mechanical process
> > and hence the output is not able to be separately copyrighted - if it
> > could be, then the copyright would be held by the *COMPILER*, and I am
> > *NOT* bound by the license on that code. The same applies if GPL'd code
> > is included in my application during the linking process. QED: The
> > "Linking" argument used by most people is wholly fallacious in at least
> > one major country - and if I'm not mistaken, the output from an automated
> > process is similarly not considered as carrying a separate copyright in
> > all nations that are signatories of or follow the Berne Convention.
>
> The GPL is a license. Nobody is talking about the copyright of your code
> here. You always have the copyright on your code. The point is that you
> have to license your code under GPL (when using a GPL library) and you
> are distributing your code.

Yes, it is "my" code and "my" copyright. However, by the absolutely *common*
belief that "linking to GPL libraries makes a program a derivative work" it 
would mean that I no longer have the freedom to license my code under the 
license of my choosing, because the *mechanical* process of linking has 
caused the GPL's "viral" clause to spread to cover my code.

And you're absolutely wrong. It doesn't matter that the library is GPL'd at 
all. My code *cannot*, under any circumstances, be affected by the GPL 
license on the library. Because the library's API *cannot* be copyrighted and
any GPL'd code which winds up in the final binary got there via a "mechanical 
process" and doesn't affect my right to release the code under a license of 
my choosing.

Any other belief is fallacious. Claiming otherwise would mean that any program 
that uses any library on a windows system makes an application a derivative 
work of that

/bin/sh: -c: line 0: syntax error near unexpected token `;'

2008-02-09 Thread Mr. James W. Laferriere

	Hello All ,  In a recent pull of linus's tree (*) today @ 2008-02-10 
02:49 UTC  ,  Using a previously well behaving .config I now get ...

Tia ,  JimL


make -f scripts/Makefile.clean obj=sound/usb/usx2y
make -f scripts/Makefile.clean obj=usr
  rm -rf .tmp_versions
  rm -f arch/x86/boot/fdimage arch/x86/boot/image.iso arch/x86/boot/mtools.conf 
vmlinux System.map .tmp_kallsyms* .tmp_version .tmp_vmlinux* .tmp_System.map
rm -f include/config/kernel.release
echo 2.6.24 > include/config/kernel.release
set -e; ; mkdir -p include/linux/;  (echo \#define LINUX_VERSION_CODE 132632; echo '#define 
KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';) < 
/usr/src/linux-2.6.25-git-20080209/Makefile > include/linux/version.h.tmp; if [ -r 
include/linux/version.h ] && cmp -s include/linux/version.h include/linux/version.h.tmp; then rm 
-f include/linux/version.h.tmp; else ; mv -f include/linux/version.h.tmp include/linux/version.h; fi
/bin/sh: -c: line 0: syntax error near unexpected token `;'
/bin/sh: -c: line 0: `set -e; ; mkdir -p include/linux/;(echo \#define 
LINUX_VERSION_CODE 132632; echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';) < /usr/src/linux-2.6.25-git-20080209/Makefile > include/linux/version.h.tmp; if [ -r include/linux/version.h ] && cmp -s include/linux/version.h include/linux/version.h.tmp; then rm -f include/linux/version.h.tmp; else ; mv -f include/linux/version.h.tmp include/linux/version.h; fi'

make: *** [include/linux/version.h] Error 2
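
As a side check, the LINUX_VERSION_CODE in the failing command is consistent with the 2.6.24 written to kernel.release just before it: with the KERNEL_VERSION(a,b,c) packing echoed in the log, 2*65536 + 6*256 + 24 = 132632. So the version values themselves look sane, and the syntax error comes from the empty command between the two semicolons in "set -e; ;". A small sketch of the packing follows; the toy_ names are illustrative and not taken from the kernel tree.

```c
#include <assert.h>

/* Same packing as the KERNEL_VERSION(a,b,c) macro echoed in the log. */
#define TOY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

struct toy_version {
	unsigned int major;
	unsigned int minor;
	unsigned int patch;
};

/* Pack a version triple; minor and patch must each fit in 8 bits. */
static unsigned int toy_encode_version(unsigned int a, unsigned int b, unsigned int c)
{
	assert(b <= 255 && c <= 255);
	return TOY_KERNEL_VERSION(a, b, c);
}

/* Unpack a version code produced by the macro above. */
static struct toy_version toy_decode_version(unsigned int code)
{
	struct toy_version v;

	v.major = code >> 16;
	v.minor = (code >> 8) & 0xff;
	v.patch = code & 0xff;
	return v;
}
```

Decoding 132632 this way gives major 2, minor 6, patch 24.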


(*)git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

--
+--+
| James   W.   Laferriere | SystemTechniques | Give me VMS |
| Network Engineer | 2133McCullam Ave |  Give me Linux  |
| [EMAIL PROTECTED] | Fairbanks, AK. 99701 |   only  on  AXP |
+--+

Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

2008-02-09 Thread Willy Tarreau

On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote:
> On Sat, Feb 09, 2008 at 05:19:57PM +0100, Willy Tarreau wrote:
> > On Sat, Feb 09, 2008 at 02:37:39PM +0100, Mike Galbraith wrote:
> > > 
> > > On Sat, 2008-02-09 at 12:40 +0100, Willy Tarreau wrote: 
> > > > On Sat, Feb 09, 2008 at 11:58:25AM +0100, Mike Galbraith wrote:
> > > > > 
> > > > > On Sat, 2008-02-09 at 09:03 +0100, Willy Tarreau wrote:
> > > > > 
> > > > > > How many CPUs do you have ?
> > > > > 
> > > > > It's a P4/HT, so 1 plus $CHUMP_CHANGE_MAYBE
> > > > > 
> > > > > > > 2.6.25-smp (git today)
> > > > > > > time 29 ms
> > > > > > > time 61 ms
> > > > > > > time 72 ms
> > > > > > 
> > > > > > > These ones look rather strange. What type of workload is it ? Can
> > > > > > > you publish the program for others to test it ?
> > > > > 
> > > > > It's the proglet posted in this thread.
> > > > 
> > > > OK sorry, I did not notice it when I first read the report.
> > > 
> > > Hm.  The 2.6.25-smp kernel is the only one that looks like it's doing
> > > what proggy wants to do, massive context switching.  Bump threads to
> > > larger number so you can watch: the supposedly good kernel (22) is doing
> > > everything on one CPU.  Everybody else sucks differently (idleness), and
> > > the clear throughput winner, via mad over-schedule (!?!), is git today.
> > 
> > For me, 2.6.25-smp gives pretty irregular results :
> > 
> > time 6548 ms
> > time 7272 ms
> > time 1188 ms
> > time 3772 ms
> > 
> > The CPU usage is quite irregular too and never goes beyond 50% (this is a
> > dual-athlon). If I start two of these processes, 100% of the CPU is used,
> > the context switch rate is more regular (about 700/s) and the total time
> > is more regular too (between 14.8 and 18.5 seconds).
> > 
> > Increasing the parallel run time of the two threads by changing the upper
> > limit of the for(j) loop correctly saturates both processors. I think that
> > this program simply does not have enough work to do for each thread to run
> > for a full timeslice, thus showing a random behaviour.
> 
> Right. I should have tinkered a bit more with it before I posted it, the
> version posted had too little going on in the first loop and thus got
> hung up on the second busywait loop instead.
> 
> I did a bunch of runs with various loop sizes. Basically, what seems to
> happen is that the older kernels are quicker at rebalancing a new thread
> over to the other cpu, while newer kernels let them share the same cpu
> longer (and thus increases wall clock runtime).
> 
> All of these are built with gcc without optimization, larger loop size
> and an added sched_yield() in the busy-wait loop at the end to take that
> out as a factor. As you've seen yourself, runtimes can be quite noisy
> but the trends are quite clear anyway. All of these numbers were
> collected with default scheduler runtime options, same kernels and
> configs as previously posted.
> 
> Loop to 1M:
> 2.6.22        time 4015 ms
> 2.6.23        time 4581 ms
> 2.6.24        time 10765 ms
> 2.6.24-git19  time 8286 ms
> 
> 2M:
> 2.6.22        time 7574 ms
> 2.6.23        time 9031 ms
> 2.6.24        time 12844 ms
> 2.6.24-git19  time 10959 ms
> 
> 3M:
> 2.6.22        time 8015 ms
> 2.6.23        time 13053 ms
> 2.6.24        time 16204 ms
> 2.6.24-git19  time 14984 ms
> 
> 4M:
> 2.6.22        time 10045 ms
> 2.6.23        time 16642 ms
> 2.6.24        time 16910 ms
> 2.6.24-git19  time 16468 ms
> 
> 5M:
> 2.6.22        time 12055 ms
> 2.6.23        time 21024 ms
> 
> 2.6.24-git19  time 16040 ms
> 
> 10M:
> 2.6.22        time 24030 ms
> 2.6.23        time 33082 ms
> 2.6.24        time 34139 ms
> 2.6.24-git19  time 33724 ms
> 
> 20M:
> 2.6.22        time 50015 ms
> 2.6.23        time 63963 ms
> 2.6.24        time 65100 ms
> 2.6.24-git19  time 63092 ms
> 
> 40M:
> 2.6.22        time 94315 ms
> 2.6.23        time 107930 ms
> 2.6.24        time 113291 ms
> 2.6.24-git19  time 110360 ms
> 
> So with more work per thread, the differences become less but they're
> still there. At the 40M loop, with 500 threads it's quite a bit of
> runtime per thread.

No, it's really nothing. I had to push the loop to 1 billion to make the load
noticeable. You don't have 500 threads, you have 2 threads and that load is
repeated 500 times. And if we look at the numbers, let's take the worst one :
> 40M:
> 2.6.24        time 113291 ms
113291/500 = 227 microseconds/loop. This is still very low compared to the
smallest timeslice you would have (1 ms at HZ=1000).

So your threads are still completing *before* the scheduler has to preempt
them.

> > However, I fail to understand the goal of the reproducer. Granted it shows
> > irregularities in the scheduler under such conditions, but what *real*
> > workload would spend its time sequentially

kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit

2008-02-09 Thread Niki Denev

Hi,

As the subject says the 2.6.24.1 is still vulnerable to the vmsplice
local root exploit.

[EMAIL PROTECTED] tmp]$ uname -a
Linux tester 2.6.24.1 #1 Sun Feb 10 00:06:49 EST 2008 i686 unknown
[EMAIL PROTECTED] tmp]$ ./vms

---
 Linux vmsplice Local Root Exploit
 By qaaz
---
[+] mmap: 0x0 .. 0x1000
[+] page: 0x0
[+] page: 0x20
[+] mmap: 0x4000 .. 0x5000
[+] page: 0x4000
[+] page: 0x4020
[+] mmap: 0x1000 .. 0x2000
[+] page: 0x1000
[+] mmap: 0xb7f56000 .. 0xb7f88000
[+] root
[EMAIL PROTECTED] tmp]#
[EMAIL PROTECTED] tmp]# id
uid=0(root) gid=0(root) groups=2033(opa)
[EMAIL PROTECTED] tmp]# uname -a
Linux test 2.6.24.1 #1 Sun Feb 10 00:06:49 EST 2008 i686 unknown

Is there any known fix/patch for this?

Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3

2008-02-09 Thread Greg KH

On Sun, Feb 10, 2008 at 02:29:24PM +0900, KOSAKI Motohiro wrote:
> CC'd Greg KH <[EMAIL PROTECTED]>
> 
> I tested this patch on fujitsu memoryless node.
> (2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 instead of 2.6.24-mm1)
> It seems to work well.
> 
> Tested-by: KOSAKI Motohiro <[EMAIL PROTECTED]>
> 
> 
> Greg, I hope this patch can be merged into the 2.6.24.x stable tree,
> because it fixes a regression.
> Please tell me what I should do for that.

Once the patch goes into Linus's tree, feel free to send it to the
[EMAIL PROTECTED] address so that we can include it in the 2.6.24.x
tree.

thanks,

greg k-h

Re: [PATCH] USB: mark USB drivers as being GPL only

2008-02-09 Thread Marcel Holtmann

Hi Daniel,

> > > > It makes no difference if you
> > > > distribute the GPL library with it or not.
> > >
> > > If you do not distribute the GPL library, the library is simply being
> > > used in the intended, ordinary way. You do not need to agree to, nor can
> > > you violate, the GPL simply by using a work in its ordinary intended way.
> > >
> > > If the application contains insufficient copyrightable expression from
> > > the library to be considered a derivative work (and purely functional
> > > things do not count), then it cannot be a derivative work. The library is
> > > not being copied or distributed. So how can its copyright be infringed?
> >
> > go ahead and create an application that uses a GPL only library. Then
> > ask a lawyer if it is okay to distribute your application in binary only
> > form without making the source code available (according to the GPL).
> >
> > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL
> >
> > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL
>
> In the US, at least, the belief that "Linking", in *ANY* form, with a GPL 
> library creates a derivative work, is fallacious.

that is how FSF states it and it seems that most legal departments of
big companies (US and EU based) are not taking any risk on this. So it
seems that someone actually has to prove in court that these assumptions
for the GPL case are wrong.

> Were I to create an 
> application that uses, say, GTK for the interface the protected expression is 
> my "unique and creative" use of the GTK API for creating the specific 
> interface and any other code I have written using the API. I hold sole 
> license to the copyright on that code and am able to license said code under 
> the specific license of my choice.

Not even getting into this one, since GTK+ is an LGPL-licensed library. Get
your examples straight.

> Why? Because the pre-processor is what is including any GPL'd code in my 
> application and expanding any macros. That is a purely mechanical process and 
> hence the output is not able to be separately copyrighted - if it could be, 
> then the copyright would be held by the *COMPILER*, and I am *NOT* bound by 
> the license on that code. The same applies if GPL'd code is included in my 
> application during the linking process. QED: The "Linking" argument used by 
> most people is wholly fallacious in at least one major country - and if I'm 
> not mistaken, the output from an automated process is similarly not 
> considered as carrying a separate copyright in all nations that are 
> signatories of or follow the Berne Convention.

The GPL is a license. Nobody is talking about the copyright of your code
here. You always have the copyright on your code. The point is that you
have to license your code under GPL (when using a GPL library) and you
are distributing your code.

Regards

Marcel



Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3

2008-02-09 Thread KOSAKI Motohiro

CC'd Greg KH <[EMAIL PROTECTED]>

I tested this patch on a Fujitsu machine with a memoryless node
(2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 instead of 2.6.24-mm1).
It seems to work well.

Tested-by: KOSAKI Motohiro <[EMAIL PROTECTED]>


Greg, I hope this patch can be merged into the 2.6.24.x stable tree,
because it fixes a regression.
Please tell me what I need to do for that.


[intentional full quote]

> Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't
> works on memoryless node."
> 
> [Aside:  I noticed there were two slightly different distributions for
> this topic.  I've unified the distribution lists w/o dropping anyone, I
> think.  Apologies if you'd rather have been dropped...]
> 
> Here's V3 of the patch, accommodating Kosaki Motohiro's suggestion for
> folding contextualize_policy() into mpol_check_policy() [because my
> "was_empty" argument "was ugly" ;-)].  It does seem to clean up the
> code.
> 
> I'm still deferring David Rientjes' suggestion to fold
> mpol_check_policy() into mpol_new().  We need to sort out whether
> mempolicies specified for tmpfs and hugetlbfs mounts always need the
> same "contextualization" as user/application installed policies.  I
> don't want to hold up this bug fix for that discussion.  This is
> something Paul J will need to address with his cpuset/mempolicy rework,
> so we can sort it out in that context.
> 
> Again, tested with "numactl --interleave=all" and memtoy on ia64 using
> mem= command line argument to simulate memoryless node.
> 
> 
> Lee
> 
> 
> [PATCH] 2.6.24-mm1 - mempolicy:  silently restrict nodemask to allowed nodes
> 
> V2 -> V3:
> + As suggested by Kosaki Motohiro, fold the "contextualization"
>   of policy nodemask into mpol_check_policy().  Looks a little
>   cleaner. 
> 
> V1 -> V2:
> + Communicate whether or not incoming node mask was empty to
>   mpol_check_policy() for better error checking.
> + As suggested by David Rientjes, remove the now unused
>cpuset_nodes_subset_current_mems_allowed() from cpuset.h
> 
> Kosaki Motohiro noted that "numactl --interleave=all ..." failed in the
> presence of memoryless nodes.  This patch attempts to fix that problem.
> 
> Some background:  
> 
> numactl --interleave=all calls set_mempolicy(2) with a fully
> populated [out to MAXNUMNODES] nodemask.  set_mempolicy()
> [in do_set_mempolicy()] calls contextualize_policy() which
> requires that the nodemask be a subset of the current task's
> mems_allowed; else EINVAL will be returned.  A task's
> mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
> i.e., nodes with memory.  So, a fully populated nodemask will
> be declared invalid if it includes memoryless nodes.
> 
>   NOTE:  the same thing will occur when running in a cpuset
>  with restricted mem_allowed--for the same reason:
>  node mask contains dis-allowed nodes.
> 
> mbind(2), on the other hand, just masks off any nodes in the 
> nodemask that are not included in the caller's mems_allowed.
> 
> In each case [mbind() and set_mempolicy()], mpol_check_policy()
> will complain [again, resulting in EINVAL] if the nodemask contains 
> any memoryless nodes.  This is somewhat redundant as mpol_new() 
> will remove memoryless nodes for interleave policy, as will 
> bind_zonelist()--called by mpol_new() for BIND policy.
> 
> Proposed fix:
> 
> 1) modify contextualize_policy logic to:
>    a) remember whether the incoming node mask is empty.
>    b) if not, restrict the nodemask to allowed nodes, as is
>       currently done in-line for mbind().  This guarantees
>       that the resulting mask includes only nodes with memory.
> 
>       NOTE:  this is a [benign, IMO] change in behavior for
>              set_mempolicy().  Dis-allowed nodes will be
>              silently ignored, rather than returning an error.
> 
>    c) fold this code into mpol_check_policy(), replace 2 calls to
>       contextualize_policy() to call mpol_check_policy() directly
>       and remove contextualize_policy().
> 
> 2) In existing mpol_check_policy() logic, after "contextualization":
>    a) MPOL_DEFAULT:  require that incoming mask "was_empty"
>    b) MPOL_{BIND|INTERLEAVE}:  require that contextualized nodemask
>       contains at least one node.
>    c) add a case for MPOL_PREFERRED:  if incoming was not empty
>       and resulting mask IS empty, user specified invalid nodes.
>       Return EINVAL.
>    d) remove the now redundant check for memoryless nodes
> 
> 3) remove the now redundant masking of policy nodes for interleave
>    policy from mpol_new().
> 
> 4) Now that mpol_check_policy() contextualizes the nodemask, remove
>    the in-line nodes_and() from sys_mbind().  I believe that this
>    restores mbind() to the behavior before the memoryless-nodes
>    patch series.  E.g., we'll no longer treat an invalid nodemask
>    with MPOL_PREFERRED as local allocation.
> 
> Signed-off-by:  Lee Schermerhorn <[EMAIL PROTECTED]>
> 
>
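
The checks in steps 1 and 2 of the proposed fix can be modelled in user space. The following is only a sketch of the described logic, not the kernel patch itself: the names `policy_mode`, `POL_*` and `check_policy()` are hypothetical, and a 64-bit integer stands in for the kernel's nodemask type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MPOL_* policy modes. */
enum policy_mode { POL_DEFAULT, POL_PREFERRED, POL_BIND, POL_INTERLEAVE };

/*
 * Model of the proposed mpol_check_policy() behaviour.
 * *nodes is the caller's nodemask (bit n = node n); allowed models the
 * task's mems_allowed. Returns 0 or -EINVAL and leaves the restricted
 * mask in *nodes.
 */
static int check_policy(enum policy_mode mode, uint64_t *nodes, uint64_t allowed)
{
	int was_empty;

	assert(nodes != NULL);
	was_empty = (*nodes == 0);      /* step 1a: remember emptiness */
	*nodes &= allowed;              /* step 1b: silently drop disallowed nodes */

	switch (mode) {
	case POL_DEFAULT:               /* step 2a: no nodemask may be given */
		return was_empty ? 0 : -EINVAL;
	case POL_BIND:
	case POL_INTERLEAVE:            /* step 2b: at least one usable node */
		return *nodes ? 0 : -EINVAL;
	case POL_PREFERRED:             /* step 2c: non-empty request, nothing left */
		return (!was_empty && *nodes == 0) ? -EINVAL : 0;
	}
	return -EINVAL;
}
```

With `allowed` covering only nodes 0 and 2, a fully populated interleave mask is accepted and reduced to those two nodes, which is the "numactl --interleave=all" case from the report.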

Re: [git pull] kgdb light

2008-02-09 Thread Christoph Hellwig

On Sat, Feb 09, 2008 at 09:42:59PM +0100, Ingo Molnar wrote:
> Linus,
> 
> while this is probably one of the last days of the merge window, please 
> still consider pulling the "kgdb light" git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git

Without posting patches for review first?  You must be kidding.


Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads

2008-02-09 Thread Olof Johansson

On Sat, Feb 09, 2008 at 05:19:57PM +0100, Willy Tarreau wrote:
> On Sat, Feb 09, 2008 at 02:37:39PM +0100, Mike Galbraith wrote:
> > 
> > On Sat, 2008-02-09 at 12:40 +0100, Willy Tarreau wrote: 
> > > On Sat, Feb 09, 2008 at 11:58:25AM +0100, Mike Galbraith wrote:
> > > > 
> > > > On Sat, 2008-02-09 at 09:03 +0100, Willy Tarreau wrote:
> > > > 
> > > > > How many CPUs do you have ?
> > > > 
> > > > It's a P4/HT, so 1 plus $CHUMP_CHANGE_MAYBE
> > > > 
> > > > > > 2.6.25-smp (git today)
> > > > > > time 29 ms
> > > > > > time 61 ms
> > > > > > time 72 ms
> > > > > 
> > > > > These ones look rather strange. What type of workload is it ? Can you
> > > > > publish the program for others to test it ?
> > > > 
> > > > It's the proglet posted in this thread.
> > > 
> > > OK sorry, I did not notice it when I first read the report.
> > 
> > Hm.  The 2.6.25-smp kernel is the only one that looks like it's doing
> > what proggy wants to do, massive context switching.  Bump threads to
> > larger number so you can watch: the supposedly good kernel (22) is doing
> > everything on one CPU.  Everybody else sucks differently (idleness), and
> > the clear throughput winner, via mad over-schedule (!?!), is git today.
> 
> For me, 2.6.25-smp gives pretty irregular results :
> 
> time 6548 ms
> time 7272 ms
> time 1188 ms
> time 3772 ms
> 
> The CPU usage is quite irregular too and never goes beyond 50% (this is a
> dual-athlon). If I start two of these processes, 100% of the CPU is used,
> the context switch rate is more regular (about 700/s) and the total time
> is more regular too (between 14.8 and 18.5 seconds).
> 
> Increasing the parallel run time of the two threads by changing the upper
> limit of the for(j) loop correctly saturates both processors. I think that
> this program simply does not have enough work to do for each thread to run
> for a full timeslice, thus showing a random behaviour.

Right. I should have tinkered a bit more with it before I posted it, the
version posted had too little going on in the first loop and thus got
hung up on the second busywait loop instead.

I did a bunch of runs with various loop sizes. Basically, what seems to
happen is that the older kernels are quicker at rebalancing a new thread
over to the other cpu, while newer kernels let them share the same cpu
longer (and thus increase wall clock runtime).

All of these are built with gcc without optimization, larger loop size
and an added sched_yield() in the busy-wait loop at the end to take that
out as a factor. As you've seen yourself, runtimes can be quite noisy
but the trends are quite clear anyway. All of these numbers were
collected with default scheduler runtime options, same kernels and
configs as previously posted.
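
The proglet itself is not reproduced in this message. As a rough illustration of the test shape described above (threads created one at a time, a work loop in each, and a sched_yield() in the parent's wait loop), here is a minimal sketch. All names (`spawn_round`, `burn`, `run_rounds`) are hypothetical and the details are assumptions, not the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* State shared between the parent and one short-lived child thread. */
struct spawn_round {
	long loops;       /* iterations of busy work per thread */
	atomic_int done;  /* set by the child once its work is finished */
};

/* Pure CPU work; volatile keeps the loop from being optimized away. */
static void burn(long loops)
{
	volatile long sink = 0;

	for (long j = 0; j < loops; j++)
		sink += j;
}

static void *child(void *arg)
{
	struct spawn_round *r = arg;

	burn(r->loops);
	atomic_store(&r->done, 1);
	return NULL;
}

/*
 * Create `rounds` threads sequentially, never more than two runnable at
 * a time (parent plus one child). Returns elapsed wall-clock time in ms,
 * or -1 if a thread could not be created.
 */
static long run_rounds(int rounds, long loops)
{
	struct timespec t0, t1;

	assert(rounds > 0 && loops >= 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < rounds; i++) {
		struct spawn_round r = { .loops = loops };
		pthread_t tid;

		atomic_init(&r.done, 0);
		if (pthread_create(&tid, NULL, child, &r) != 0)
			return -1;
		burn(loops);                     /* parent works in parallel */
		while (!atomic_load(&r.done))
			sched_yield();           /* yield instead of spinning hard */
		pthread_join(tid, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) * 1000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Calling `run_rounds(500, 1000000)` would correspond loosely to the "Loop to 1M" case with 500 threads; absolute times from this sketch are not comparable to the numbers reported in this thread.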

Loop to 1M:
2.6.22          time 4015 ms
2.6.23          time 4581 ms
2.6.24          time 10765 ms
2.6.24-git19    time 8286 ms

2M:
2.6.22          time 7574 ms
2.6.23          time 9031 ms
2.6.24          time 12844 ms
2.6.24-git19    time 10959 ms

3M:
2.6.22          time 8015 ms
2.6.23          time 13053 ms
2.6.24          time 16204 ms
2.6.24-git19    time 14984 ms

4M:
2.6.22          time 10045 ms
2.6.23          time 16642 ms
2.6.24          time 16910 ms
2.6.24-git19    time 16468 ms

5M:
2.6.22          time 12055 ms
2.6.23          time 21024 ms

2.6.24-git19    time 16040 ms

10M:
2.6.22          time 24030 ms
2.6.23          time 33082 ms
2.6.24          time 34139 ms
2.6.24-git19    time 33724 ms

20M:
2.6.22          time 50015 ms
2.6.23          time 63963 ms
2.6.24          time 65100 ms
2.6.24-git19    time 63092 ms

40M:
2.6.22          time 94315 ms
2.6.23          time 107930 ms
2.6.24          time 113291 ms
2.6.24-git19    time 110360 ms

So with more work per thread, the differences become less but they're
still there. At the 40M loop, with 500 threads it's quite a bit of
runtime per thread.
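
To put numbers on "the differences become less", the relative slowdown and the per-thread runtime can be computed directly from the table. This is just a small arithmetic sketch; the helper names are made up for illustration.

```c
#include <assert.h>

/* Slowdown of measured_ms relative to baseline_ms, in percent. */
static double slowdown_percent(long baseline_ms, long measured_ms)
{
	assert(baseline_ms > 0);
	return 100.0 * (double)(measured_ms - baseline_ms) / (double)baseline_ms;
}

/* Average wall-clock time per thread, in ms. */
static double ms_per_thread(long total_ms, int threads)
{
	assert(threads > 0);
	return (double)total_ms / threads;
}
```

For 2.6.24 against 2.6.22, `slowdown_percent(4015, 10765)` is about 168% at the 1M loop, while `slowdown_percent(94315, 113291)` is about 20% at 40M. `ms_per_thread(113291, 500)` gives roughly 227 ms per thread for the 40M run on 2.6.24.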

> However, I fail to understand the goal of the reproducer. Granted it shows
> irregularities in the scheduler under such conditions, but what *real*
> workload would spend its time sequentially creating then immediately killing
> threads, never using more than 2 at a time ?
>
> If this could be turned into a DoS, I could understand, but here it looks
> a bit pointless :-/

It seems generally unfortunate that it takes longer for a new thread to
move over to the second cpu even when the first is busy with the original
thread. I can certainly see cases where this causes suboptimal overall
system behaviour.

I agree that the testcase is highly artificial. Unfortunately, it's
not uncommon to see these kinds of weird testcases from customers trying
to evaluate new hardware. :( They tend to be pared-down versions of
whatever their real workload is (the real workload is doing things more
appropriately, but the smaller version is used for testing). I was lucky
enough to get source snippets to base a standalone reproduction case on
for this, normally we wouldn't even get copies of their binaries.

-Olof

Re: [PATCH] [resend] 3c509: convert to isa_driver and pnp_driver v4

2008-02-09 Thread Christoph Hellwig

On Sun, Feb 10, 2008 at 01:10:07AM +0100, Ondrej Zary wrote:
> > > +typedef enum { EL3_ISA, EL3_PNP, EL3_MCA, EL3_EISA } el3_cardtype;
> > > +
> >
> > No typedef please (see checkpatch)
> 
> Is there any standard way to solve this without a typedef? I added 
> el3_dev_fill() function which fills that card type value according to a 
> parameter passed to it. "int" could be used instead and "#define EL3_ISA 
> 0", "#define EL3_PNP 1" - but I think that's ugly.

enum el3_cardtype {
	EL3_ISA,
	EL3_PNP,
	EL3_MCA,
	EL3_EISA,
};
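
For illustration, here is how a plain enum like this can be used as a field and parameter type without a typedef. The struct and the fill helper below are hypothetical, cut-down stand-ins (the patch's real el3_dev_fill() and el3_private are not shown in this message), so treat this as a sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Card types as a plain enum, with no typedef. */
enum el3_cardtype {
	EL3_ISA,
	EL3_PNP,
	EL3_MCA,
	EL3_EISA,
};

/* Hypothetical, reduced stand-in for the driver's private data. */
struct el3_private_sketch {
	enum el3_cardtype type;
};

/* Hypothetical analogue of el3_dev_fill(): record the card type. */
static void el3_fill_sketch(struct el3_private_sketch *lp,
			    enum el3_cardtype type)
{
	assert(lp != NULL);
	lp->type = type;
}

/* Map a card type to a printable bus name. */
static const char *el3_cardtype_name(enum el3_cardtype type)
{
	switch (type) {
	case EL3_ISA:
		return "ISA";
	case EL3_PNP:
		return "PnP";
	case EL3_MCA:
		return "MCA";
	case EL3_EISA:
		return "EISA";
	}
	return "unknown";
}
```

The only cost compared with the typedef is writing `enum el3_cardtype` in full at each use.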

> > >  struct el3_private {
> > >   struct net_device_stats stats;
> >
> > Use network device stats in net_device now
> 
> OK, looks like the driver will need some more patches.

While I agree with Stephen's comment that this driver should be using
the stats in net_device, that's totally out of scope for this patch.
As you're the de facto maintainer of this driver now, it would be nice
if you could submit another one for it.

> > > - struct net_device *next_dev;
> > >   spinlock_t lock;
> > >   /* skb send-queue */
> > >   int head, size;
> > >   struct sk_buff *queue[SKB_QUEUE_SIZE];
> >
> > What about sk_buff_head (linked list instead)?
> 
> I don't know anything about this, maybe in next patch.

Yes, separate patch please.


Re: [PATCH] USB: mark USB drivers as being GPL only

2008-02-09 Thread Daniel Hazelton

On Saturday 09 February 2008 23:50:17 Marcel Holtmann wrote:

> > > It makes no difference if you
> > > distribute the GPL library with it or not.
> >
> > If you do not distribute the GPL library, the library is simply being
> > used in the intended, ordinary way. You do not need to agree to, nor can
> > you violate, the GPL simply by using a work in its ordinary intended way.
> >
> > If the application contains insufficient copyrightable expression from
> > the library to be considered a derivative work (and purely functional
> > things do not count), then it cannot be a derivative work. The library is
> > not being copied or distributed. So how can its copyright be infringed?
>
> go ahead and create an application that uses a GPL only library. Then
> ask a lawyer if it is okay to distribute your application in binary only
> form without making the source code available (according to the GPL).
>
> http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL
>
> http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL
>
> Regards
>
> Marcel

In the US, at least, the belief that "Linking", in *ANY* form, with a GPL 
library creates a derivative work, is fallacious. Were I to create an 
application that uses, say, GTK for the interface the protected expression is 
my "unique and creative" use of the GTK API for creating the specific 
interface and any other code I have written using the API. I hold sole 
license to the copyright on that code and am able to license said code under 
the specific license of my choice.

Why? Because the pre-processor is what is including any GPL'd code in my 
application and expanding any macros. That is a purely mechanical process and 
hence the output is not able to be separately copyrighted - if it could be, 
then the copyright would be held by the *COMPILER*, and I am *NOT* bound by 
the license on that code. The same applies if GPL'd code is included in my 
application during the linking process. QED: The "Linking" argument used by 
most people is wholly fallacious in at least one major country - and if I'm 
not mistaken, the output from an automated process is similarly not 
considered as carrying a separate copyright in all nations that are 
signatories of or follow the Berne Convention.

(And yes, this also applies to some GPL'd tools that RMS extended "GPL 
Exemptions" to - such as "Bison". There is, generally, no need for such an 
exemption, because  the process by which the GPL'd code is included in the 
final binary is wholly mechanical.)

DRH
PS: The above information is a very condensed form of the result of several 
past conversations on this list about copyright law and the GPL as well as my 
own, private discussions with lawyers. I'm being lazy here and not searching 
various archives of LKML to give pointers to the past discussions.

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

Re: [RFC] ipvs: Cleanup sync daemon code

2008-02-09 Thread Christoph Hellwig

On Sun, Feb 10, 2008 at 12:38:11AM +0100, Sven Wegener wrote:
>  struct ip_vs_sync_thread_data {
> - struct completion *startup;
> + struct completion *startup; /* set to NULL once completed */

This is not needed anymore.  kthread_run guarantees that the newly
created thread is run before returning to the caller.

> +/* wait queue for master sync daemon */
> +static DECLARE_WAIT_QUEUE_HEAD(sync_master_wait);

I don't think you need this one either.  You can use wake_up_process
on the task_struct pointer instead.

>   spin_lock(&ip_vs_sync_lock);
>   list_add_tail(&sb->list, &ip_vs_sync_queue);
> + if (++ip_vs_sync_count == 10)
> + wake_up_interruptible(&sync_master_wait);
>   spin_unlock(&ip_vs_sync_lock);
>  }

> -static int sync_thread(void *startup)
> +static int sync_thread(void *data)

Btw, it might make sense to remove sync_thread and just call the
master and backup threads directly.

> +void __init ip_vs_sync_init(void)
> +{
> + /* set up multicast address */
> + mcast_addr.sin_family = AF_INET;
> + mcast_addr.sin_port = htons(IP_VS_SYNC_PORT);
> + mcast_addr.sin_addr.s_addr = htonl(IP_VS_SYNC_GROUP);
>  }

Why can't this be initialized at compile time by:

static struct sockaddr_in mcast_addr = {
	.sin_family		= AF_INET,
	.sin_port		= htons(IP_VS_SYNC_PORT),
	.sin_addr.s_addr	= htonl(IP_VS_SYNC_GROUP),
};

(the hton* calls might need to be __constant_hton*; I'm not sure without trying)
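
A user-space sketch of the same idea, for illustration only. A static initializer needs constant expressions, so this version uses its own constant byte-swap macros (`CONST_HTONS`, `CONST_HTONL`, hypothetical names playing the role of __constant_hton*). The port and group values are assumed placeholders, not taken from this thread, and the endianness test relies on the GCC/Clang predefined `__BYTE_ORDER__` macro.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Assumed placeholder values; the real ones live in the IPVS headers. */
#define SKETCH_SYNC_PORT  8848
#define SKETCH_SYNC_GROUP 0xe0000051u  /* 224.0.0.81 */

/* Byte swaps written as constant expressions, usable in initializers. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define CONST_HTONS(x) \
	((uint16_t)((((x) & 0x00ffu) << 8) | (((x) & 0xff00u) >> 8)))
#define CONST_HTONL(x) \
	((uint32_t)((((x) & 0x000000ffu) << 24) | (((x) & 0x0000ff00u) << 8) | \
		    (((x) & 0x00ff0000u) >> 8) | (((x) & 0xff000000u) >> 24)))
#else
#define CONST_HTONS(x) ((uint16_t)(x))
#define CONST_HTONL(x) ((uint32_t)(x))
#endif

/* Fully initialized at compile time; no init function needed. */
static struct sockaddr_in mcast_addr = {
	.sin_family      = AF_INET,
	.sin_port        = CONST_HTONS(SKETCH_SYNC_PORT),
	.sin_addr.s_addr = CONST_HTONL(SKETCH_SYNC_GROUP),
};

/* Sanity check: the constant macros agree with the runtime helpers. */
static void check_mcast_addr(void)
{
	assert(mcast_addr.sin_port == htons(SKETCH_SYNC_PORT));
	assert(mcast_addr.sin_addr.s_addr == htonl(SKETCH_SYNC_GROUP));
}
```

With the address initialized this way, an init function that only fills in these three fields becomes unnecessary.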


RE: [PATCH] USB: mark USB drivers as being GPL only

2008-02-09 Thread Marcel Holtmann

Hi David,

> > Lets phrase this in better words as Valdis pointed out: You can't
> > distribute an application (binary or source form) under anything else
> > than GPL if it uses a GPL library.
> 
> This simply cannot be correct. The only way it could be true is if the work
> was a derivative work of a GPL'd work. There is no other way it could become
> subject to the GPL.
> 
> So this argument reduces to -- any work that uses a library is a derivative
> work of that library. But this is clearly wrong. For work X to be a
> derivative work of work Y, it must contain substantial protected expression
> from work Y, but an application need not have any expression from the
> libraries it uses.
> 
> > It makes no difference if you
> > distribute the GPL library with it or not.
> 
> If you do not distribute the GPL library, the library is simply being used
> in the intended, ordinary way. You do not need to agree to, nor can you
> violate, the GPL simply by using a work in its ordinary intended way.
> 
> If the application contains insufficient copyrightable expression from the
> library to be considered a derivative work (and purely functional things do
> not count), then it cannot be a derivative work. The library is not being
> copied or distributed. So how can its copyright be infringed?

go ahead and create an application that uses a GPL only library. Then
ask a lawyer if it is okay to distribute your application in binary only
form without making the source code available (according to the GPL).

http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL

http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL

Regards

Marcel



Re: [git pull] CPU isolation extensions (updated)

2008-02-09 Thread Max Krasnyansky

Paul Jackson wrote:
> Max wrote:
>> Linus, please pull CPU isolation extensions from
> 
> Did I miss something in this discussion?  I thought
> Ingo was quite clear, and Linus pretty clear too,
> that this patch should bake in *-mm or some such
> place for a bit first.
> 

Andrew said:
> The feature as a whole seems useful, and I don't actually oppose the merge
> based on what I see here.  As long as you're really sure that cpusets are
> inappropriate (and bear in mind that Paul has a track record of being wrong
> on this :)).  But I see a few glitches 

As far as I can understand, Andrew is OK with the merge. And I addressed all
his comments.

Linus said:
> Have these been in -mm and widely discussed etc? I'd like to start more 
> carefully, and (a) have that controversial last patch not merged initially 
> and (b) make sure everybody is on the same page wrt this all..

As far as I can understand, Linus _asked_ whether it was in -mm or not and
whether everybody's on the same page. He did not say "this must be in -mm first".
I explained that it has not been in -mm and who it was discussed with, did a
bunch more testing/investigation on the controversial patch, and explained why I
think it's not that controversial any more.

Ingo said a few different things (a bit too large to quote).
- That it was not discussed. I explained that it was in fact discussed and
  provided a bunch of pointers to the mail threads.
- That he thinks that cpuset is the way to do it. Again I explained why it's
  not.
And at the end he said:
> Also, i'd not mind some test-coverage in sched.git as well.

As far as I know, "do not mind" does not mean "must go to" ;-). Also I replied
that I did not mind either, but I do not think that it has much (if anything)
to do with the scheduler.

Anyway. I think I mentioned that I did not mind -mm either. I think it's ready
for the mainline. But if people still strongly feel that it has to be in -mm,
that's fine. Let's just do s/Linus/Andrew/ on the first line and move on. But if
Linus pulls it now, even better ;-)

Andrew, Linus, I'll let you guys decide which tree it needs to go.

Max

[PATCH] Silent compiler warning introduced by commit 801c135ce73d5df1caf3eca35b66a10824ae0707 (UBI: Unsorted Block Images)

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch silences the

drivers/mtd/ubi/vmt.c: In function `ubi_create_volume':
drivers/mtd/ubi/vmt.c:379: warning: statement with no effect

compiler warning introduced by commit 801c135ce73d5df1caf3eca35b66a10824ae0707 
(UBI: Unsorted Block Images)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/mtd/ubi/vmt.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/ubi/vmt.c b/drivers/mtd/ubi/vmt.c
index a3ca225..eafeaf0 100644
--- a/drivers/mtd/ubi/vmt.c
+++ b/drivers/mtd/ubi/vmt.c
@@ -376,7 +376,7 @@ out_sysfs:
 	get_device(&vol->dev);
 	volume_sysfs_close(vol);
 out_gluebi:
-	ubi_destroy_gluebi(vol);
+	err = ubi_destroy_gluebi(vol);
 out_cdev:
 	cdev_del(&vol->cdev);
 out_mapping:

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

[PATCH] drivers/media/video/em28xx/: Fix undefined symbol error with CONFIG_SND=N

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch fixes these undefined symbol errors with CONFIG_SND=N:

ERROR: "snd_pcm_period_elapsed" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_hw_constraint_integer" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_set_ops" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_lib_ioctl" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_new" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_free" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_register" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_new" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]> 
 
 drivers/media/video/em28xx/Kconfig |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/em28xx/Kconfig b/drivers/media/video/em28xx/Kconfig
index abbd38c..0f7a0bd 100644
--- a/drivers/media/video/em28xx/Kconfig
+++ b/drivers/media/video/em28xx/Kconfig
@@ -13,7 +13,8 @@ config VIDEO_EM28XX
  module will be called em28xx
 
 config VIDEO_EM28XX_ALSA
-   depends on VIDEO_EM28XX
+   depends on VIDEO_EM28XX && SND
+   select SND_PCM
tristate "Empia EM28xx ALSA audio module"
---help---
  This is an ALSA driver for some Empia 28xx based TV cards.

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

Re: [git pull] CPU isolation extensions (updated)

2008-02-09 Thread Paul Jackson

Max wrote:
> Linus, please pull CPU isolation extensions from

Did I miss something in this discussion?  I thought
Ingo was quite clear, and Linus pretty clear too,
that this patch should bake in *-mm or some such
place for a bit first.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214

[PATCH] Update kernel/.gitignore with new generated files

2008-02-09 Thread S.Çağlar Onur

Hi;

Following patch updates kernel/.gitignore with new auto-generated files

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 kernel/.gitignore |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index f2ab700..ab4f109 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -3,3 +3,4 @@
 #
 config_data.h
 config_data.gz
+timeconst.h

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

[PATCH] Update arch/x86/boot/.gitignore with new generated files

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch updates arch/x86/boot/.gitignore with new auto-generated files

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 arch/x86/boot/.gitignore |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/boot/.gitignore b/arch/x86/boot/.gitignore
index 1846514..b1bdc4c 100644
--- a/arch/x86/boot/.gitignore
+++ b/arch/x86/boot/.gitignore
@@ -3,3 +3,5 @@ bzImage
 setup
 setup.bin
 setup.elf
+cpustr.h
+mkcpustr

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

[git pull] CPU isolation extensions (updated)

2008-02-09 Thread Max Krasnyansky

Linus, please pull CPU isolation extensions from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git for-linus
 
Diffstat:
 Documentation/ABI/testing/sysfs-devices-system-cpu |   41 +++
 Documentation/cpu-isolation.txt                    |  113 +
 arch/x86/Kconfig                                   |    1
 arch/x86/kernel/genapic_flat_64.c                  |    4
 drivers/base/cpu.c                                 |   48
 include/linux/cpumask.h                            |    3
 kernel/Kconfig.cpuisol                             |   42 +++
 kernel/Makefile                                    |    4
 kernel/cpu.c                                       |   54 ++
 kernel/sched.c                                     |   36 --
 kernel/stop_machine.c                              |    8 +
 kernel/workqueue.c                                 |   30 -
 12 files changed, 337 insertions(+), 47 deletions(-)

This addresses all Andrew's comments for the last submission. Details here:
   http://marc.info/?l=linux-kernel&m=120236394012766&w=2

There are no code changes since last time, besides a minor fix for moving the
on-stack array to __initdata as suggested by Andrew. Other stuff is just
documentation updates.

List of commits
   cpuisol: Make cpu isolation configrable and export isolated map
   cpuisol: Do not route IRQs to the CPUs isolated at boot
   cpuisol: Do not schedule workqueues on the isolated CPUs
   cpuisol: Move on-stack array used for boot cmd parsing into __initdata
   cpuisol: Documentation updates
   cpuisol: Minor updates to the Kconfig options
   cpuisol: Do not halt isolated CPUs with Stop Machine

As suggested by Ingo, I'm CC'ing everyone who is even remotely
connected/affected ;-)

Ingo, Peter - Scheduler.
   There are _no_ changes in this area besides moving cpu_*_map maps from
   kernel/sched.c to kernel/cpu.c.

Paul - Cpuset
   Again there are _no_ changes in this area.
   For reasons why cpuset is not the right mechanism for cpu isolation see
   this thread
      http://marc.info/?l=linux-kernel&m=120180692331461&w=2

Rusty - Stop machine.
   After doing a bunch of testing over the last three days I actually
   downgraded the stop machine changes from [highly experimental] to simply
   [experimental]. Please see this thread for more info:
      http://marc.info/?l=linux-kernel&m=120243837206248&w=2
   Short story is that I ran several insmod/rmmod workloads on live multi-core
   boxes with stop machine _completely_ disabled and did not see any issues.
   Rusty did not get a chance to reply yet; I'm hoping that we'll be able to
   make "stop machine" completely optional for some configurations.

Greg - ABI documentation.
   Nothing interesting here. I simply added
   Documentation/ABI/testing/sysfs-devices-system-cpu
   and documented some of the attributes exposed in there.
   Suggested by Andrew.

I believe this is ready for inclusion and my impression is that Andrew is
OK with that. Most changes are very simple and do not affect existing
behavior. As I mentioned before, I've been using the Workqueue and StopMachine
changes in production for a couple of years now and have high confidence in
them. Yet they are marked as experimental for now, just to be safe.

My original explanation is included below.

btw I'll be out skiing/snowboarding for the next 4 days and will have sporadic
email access.
Will do my best to address questions/concerns (if any) during that time.

Thanx
Max

--
This patch series extends CPU isolation support. Yes, most people want to virtualize
CPUs these days and I want to isolate them :) .

The primary idea here is to be able to use some CPU cores as dedicated engines for running
user-space code with minimal kernel overhead/intervention; think of it as an SPE in the
Cell processor. I'd like to be able to run a CPU-intensive (100%) RT task on one of the
processors without adversely affecting or being affected by other system activities.
System activities here include _kernel_ activities as well.

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to
achieve single-digit usec worst case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems (vanilla kernel plus these patches) even under extreme system load.
I'm working with legal folks on releasing a hard RT user-space framework for that.
I believe with the current multi-core CPU trend we will see more and more applications that
explore this capability: RT gaming engines, simulators, hard RT apps, etc.

Hence the proposal is to extend the current CPU isolation feature.
The new definition of CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
  Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts
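
Point 1 means that nothing runs on an isolated CPU unless it is bound there explicitly. A minimal user-space sketch of such binding, assuming Linux with glibc's sched_setaffinity(2) wrappers; the helper names first_allowed_cpu() and bind_self_to_cpu() are made up for illustration and are not part of this patch series:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Bind the calling thread to exactly one CPU; 0 on success, -1 with errno set. */
int bind_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}
```

An RT task would call bind_self_to_cpu() with the number of the isolated CPU before entering its main loop; the scheduler then has no reason to move it.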

[PATCH] Silent compiler warning introduced by commit 75b6102257874a4ea796af686de2f72cfa0452f9 (rtc: add support for Epson RTC-9701JE V4)

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch silences

drivers/rtc/rtc-r9701.c: In function `r9701_get_datetime':
drivers/rtc/rtc-r9701.c:74: warning: unused variable `time'

compiler warning introduced by commit 75b6102257874a4ea796af686de2f72cfa0452f9 
(rtc: add support for Epson RTC-9701JE V4)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>
 
 drivers/rtc/rtc-r9701.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/rtc/rtc-r9701.c b/drivers/rtc/rtc-r9701.c
index a64626a..b35f9bf 100644
--- a/drivers/rtc/rtc-r9701.c
+++ b/drivers/rtc/rtc-r9701.c
@@ -71,7 +71,6 @@ static int read_regs(struct device *dev, unsigned char *regs, int no_regs)
 
 static int r9701_get_datetime(struct device *dev, struct rtc_time *dt)
 {
-	unsigned long time;
 	int ret;
 	unsigned char buf[] = { RSECCNT, RMINCNT, RHRCNT,
 				RDAYCNT, RMONCNT, RYRCNT };

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Silent compiler warning introduced by 11b0cc3a4af65413ca3bb5698769e091486e0b22 (x25_asy: Fix ref count rule violation)

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch silences
 
drivers/net/wan/x25_asy.c: In function `x25_asy_open_tty':
drivers/net/wan/x25_asy.c:557: warning: unused variable `ld'

compiler warning introduced by commit 11b0cc3a4af65413ca3bb5698769e091486e0b22 
(x25_asy: Fix ref count rule violation)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/net/wan/x25_asy.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c
index 5e2d763..0f8aca8 100644
--- a/drivers/net/wan/x25_asy.c
+++ b/drivers/net/wan/x25_asy.c
@@ -554,7 +554,6 @@ static void x25_asy_receive_buf(struct tty_struct *tty, const unsigned char *cp,
 static int x25_asy_open_tty(struct tty_struct *tty)
 {
 	struct x25_asy *sl = (struct x25_asy *) tty->disc_data;
-	struct tty_ldisc *ld;
 	int err;
 
 	/* First make sure we're not already connected. */

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

[PATCH] Silent compiler warning introduced by acea6852f32b8805e166d885ed7e9f0c7cd10d41 ([BLUETOOTH]: Move children of connection device to NULL before connection down.)

2008-02-09 Thread S.Çağlar Onur

Hi;

The following patch silences

net/bluetooth/hci_sysfs.c: In function `del_conn':
net/bluetooth/hci_sysfs.c:339: warning: suggest parentheses around assignment used as truth value

compiler warning introduced by commit acea6852f32b8805e166d885ed7e9f0c7cd10d41
([BLUETOOTH]: Move children of connection device to NULL before connection down.)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 net/bluetooth/hci_sysfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
index e13cf5e..d2d1e4f 100644
--- a/net/bluetooth/hci_sysfs.c
+++ b/net/bluetooth/hci_sysfs.c
@@ -336,7 +336,7 @@ static void del_conn(struct work_struct *work)
 	struct device *dev;
 	struct hci_conn *conn = container_of(work, struct hci_conn, work);
 
-	while (dev = device_find_child(&conn->dev, NULL, __match_tty)) {
+	while ((dev = device_find_child(&conn->dev, NULL, __match_tty)) != NULL) {
 		device_move(dev, NULL);
 		put_device(dev);
 	}
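
The idiom behind the change, as a standalone sketch: the extra parentheses plus an explicit comparison tell gcc that the assignment inside the loop condition is intentional. struct node and count_nodes() are hypothetical stand-ins for illustration, not code from hci_sysfs.c:

```c
#include <stddef.h>

/* Hypothetical singly linked list, standing in for the child devices. */
struct node {
	struct node *next;
};

/* Count the nodes of a list using assignment inside the loop condition. */
int count_nodes(struct node *head)
{
	struct node *cur = head;
	struct node *n;
	int count = 0;

	/* "(n = cur) != NULL" spells out that the assignment is deliberate. */
	while ((n = cur) != NULL) {
		count++;
		cur = n->next;
	}
	return count;
}
```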

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!

Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)

2008-02-09 Thread Maximilian Wilhelm

On Friday, 8 February, Maximilian Wilhelm wrote:

> Just noticed that Eric's address was wrong, so resend with corrected Cc.

> Eric, my initial report was http://lkml.org/lkml/2008/2/6/300
> 
> > On Thursday, 7 February, Krzysztof Oledzki wrote:
> > 
> > Hi!
> > 
> > > >While installing my new firewall I got the following kernel panic in
> > > >the MPT SAS driver which I need for the disks.
> > 
> > > >The first kernel I booted was 2.6.23.14 which did panic so I tried a
> > > >2.6.24 which panics, too. Our usual FAI kernel (2.6.23.9) is also
> > > >affected.
> > 
> > > Could you please try 2.6.22-stable?
> > 
> > Yes it works :-/
> > 
> > I've put some things on the web which might be helpful:
> > 
> > dmesg     http://files.rfc2324.org/mptsas_panic/2.6.22-dmesg
> > lspci -v  http://files.rfc2324.org/mptsas_panic/2.6.22-lspci-v
> > .config   http://files.rfc2324.org/mptsas_panic/2.6.22-config
> > 
> > I'll search for the last working kernel and try to break it down to a
> > commit tomorrow when I can get a serial console or direct access.
> > The Java driven console redirection is anything but satisfying :-(
> > 
> > > It looks *very* similar to my problem:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9909
> > 
> > It seems to be the same controller:
> > 
> > 01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
> > 	Subsystem: Dell Unknown device 1f10
> > 	Flags: bus master, fast devsel, latency 0, IRQ 16
> > 	I/O ports at ec00 [size=256]
> > 	Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K]
> > 	Memory at fc8e (64-bit, non-prefetchable) [size=64K]
> > 	Expansion ROM at fc90 [disabled] [size=1M]
> > 	Capabilities: [50] Power Management version 2
> > 	Capabilities: [68] Express Endpoint IRQ 0
> > 	Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
> > 	Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1

I did a git bisect between v2.6.22 and v2.6.23 and it seems that
  6cb8f91320d3e720351c21741da795fed580b21b
introduced some badness.

---snip---
Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SAS Host driver 3.04.05
mptbase: Initiating ioc0 bringup
ioc0: SAS1068E: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1068E, FwRev=00142e00h, Ports=1, MaxQ=511, IRQ=16
scsi 0:0:0:0: Direct-Access SEAGATE  ST973402SS   S207 PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access SEAGATE  ST973402SS   S207 PQ: 0 ANSI: 5
BUG: unable to handle kernel NULL pointer dereference at virtual address 0028
 printing eip:
c014b8ca
*pde = 
Oops:  [#1]
SMP 
Modules linked in:
CPU:6
EIP:0060:[]Not tainted VLI
EFLAGS: 00010046   (2.6.22-g6cb8f913 #13)
EIP is at __kmalloc+0x35/0x5f
eax: 0006   ebx: 0246   ecx: c03fa820   edx: 00d0
esi: 0010   edi:    ebp: c23a4000   esp: c2143dbc
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 1, ti=c2142000 task=c2141670 task.ti=c2142000)
Stack: c22a3e80  c013cba9 c22a3e80 c22a3e80 c2399800  c02bcb67 
   0020 c2399800 00100100 00200200  00200200 fffefe48 c02ba15d 
   c2399800 c219 c2143e1c 023a4000 0001  000a0001 c02b 
Call Trace:
 [] __kzalloc+0xd/0x34
 [] mptsas_sas_expander_pg0+0x110/0x181
 [] mpt_timer_expired+0x0/0x28
 [] megasas_lookup_instance+0x9/0x2e
 [] mptsas_probe_expander_phys+0x42/0x395
 [] mpt_timer_expired+0x0/0x28
 [] mpt_timer_expired+0x0/0x28
 [] mptsas_probe+0x309/0x387
 [] pci_device_probe+0x36/0x57
 [] driver_probe_device+0xe1/0x15f
 [] klist_next+0x4b/0x6b
 [] __driver_attach+0x0/0x79
 [] __driver_attach+0x46/0x79
 [] bus_for_each_dev+0x33/0x55
 [] driver_attach+0x16/0x18
 [] __driver_attach+0x0/0x79
 [] bus_add_driver+0x6d/0x16d
 [] __pci_register_driver+0x48/0x74
 [] kernel_init+0x14a/0x2ac
 [] ret_from_fork+0x6/0x1c
 [] kernel_init+0x0/0x2ac
 [] kernel_init+0x0/0x2ac
 [] kernel_thread_helper+0x7/0x10
 ===
Code: 3f c0 85 c0 75 05 eb 1a 83 c1 0c 3b 01 77 f9 f6 c2 01 74 05 8b 71 08 eb 03 8b 71 04 31 c0 85 f6 74 30 9c 5b fa 64 a1 08 b0 46 c0 <8b> 0c 86 83 39 00 74 12 c7 41 0c 01 00 00 00 8b 01 48 89 01 8b
EIP: [] __kmalloc+0x35/0x5f SS:ESP 0068:c2143dbc
---snip---

A simple git revert did not work on the current git and I don't want
to fiddle around in this area, so I couldn't test further.

Ciao
Max
-- 
Follow the white penguin.

RE: [PATCH] USB: mark USB drivers as being GPL only

2008-02-09 Thread David Schwartz

Marcel Holtmann wrote:

> Lets phrase this in better words as Valdis pointed out: You can't
> distribute an application (binary or source form) under anything else
> than GPL if it uses a GPL library.

This simply cannot be correct. The only way it could be true is if the work
was a derivative work of a GPL'd work. There is no other way it could become
subject to the GPL.

So this argument reduces to -- any work that uses a library is a derivative
work of that library. But this is clearly wrong. For work X to be a
derivative work of work Y, it must contain substantial protected expression
from work Y, but an application need not have any expression from the
libraries it uses.

> It makes no difference if you
> distribute the GPL library with it or not.

If you do not distribute the GPL library, the library is simply being used
in the intended, ordinary way. You do not need to agree to, nor can you
violate, the GPL simply by using a work in its ordinary intended way.

If the application contains insufficient copyrightable expression from the
library to be considered a derivative work (and purely functional things do
not count), then it cannot be a derivative work. The library is not being
copied or distributed. So how can its copyright be infringed?

> But hey (again), feel free to disagree with me here.

This argument has no basis in law or common sense. It's completely
off-the-wall.

And to Pekka Enberg:

>It doesn't matter how "hard" it was to write that code. What matters
>is whether your code requires enough copyrighted aspects of the
>original work to constitute as derived work. There's a huge difference
>between using kmalloc and spin_lock and writing a driver that is built
>on top of the full USB stack of the Linux kernel, for example.

The legal standard is not whether it "requires" copyrighted aspects but
whether it *contains* them. The driver does not contain the USB stack. The
aspects of the USB stack that the driver must contain are purely
functional -- its API.

You simply can't have it both ways. If the driver must contain X in order to
do its job, then X is functional and cannot make the driver a derivative
work. You cannot protect, by copyright, every way to accomplish a particular
function. Copyright only protects creative choices among millions of (at
least arguably) equally good choices.

DS


Re: DMAR EHCI failures

2008-02-09 Thread David Brownell

On Monday 04 February 2008, Jiri Slaby wrote:
> Hi,
> 
> I have this in dmesg:
> DMAR:[DMA Write] Request device [00:02.0] fault addr ee1512000
> DMAR:[fault reason 05] PTE Write access is not set
> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> DMAR:[DMA Read] Request device [00:1d.7] fault addr 7d5f
> DMAR:[fault reason 06] PTE Read access is not set
> DMAR:[DMA Read] Request device [00:1a.7] fault addr 7d5f1000
> DMAR:[fault reason 06] PTE Read access is not set
> PCI-GART: No AMD northbridge found.
> DMAR:[DMA Read] Request device [00:1a.2] fault addr 7d5f7000
> DMAR:[fault reason 06] PTE Read access is not set
> 
> CONFIG_DMAR=y
> CONFIG_DMAR_GFX_WA=y
> CONFIG_DMAR_FLOPPY_WA=y
> 
> Without the gfx workaround, there is much more output regarding 00:02.0. Is
> there a problem with broken hw, bios or drivers?

No idea.  Someone who knows the DMA Remapping stuff should have
an answer.  Presumably it works with DMAR disabled, yes?  If so,
then just don't use DMAR.  :)


> /sys/firmware/acpi/tables/DMAR:
> http://www.fi.muni.cz/~xslaby/sklad/DMAR.bin
> dmesg:
> http://www.fi.muni.cz/~xslaby/sklad/DMAR.dmesg
> 
> # for a in 00:02.0 00:1d.7 00:1a.7 00:1a.2; do lspci -vxxx -s $a; done
> 00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express 
> Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
>  ... deletia ...
> 



Re: [sample] mem_notify v6: usage example

2008-02-09 Thread Pavel Machek

On Sat 2008-02-09 11:07:09, Jon Masters wrote:
> This really needs to be triggered via a generic kernel 
> event in the  final version - I picture glibc having a 
> reservation API and having  generic support for freeing 
> such reservations.

Not sure what you are talking about. This seems very right to me.

We want memory-low notification, not yet another generic communication
mechanism.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH] Define a NO_GPIO macro to compare against and to use as an invalid GPIO

2008-02-09 Thread David Brownell

On Saturday 09 February 2008, Guennadi Liakhovetski wrote:
> On Fri, 8 Feb 2008, David Brownell wrote:
> 
> > Actually I thought that what you needed was an is_valid_gpio();
> > your motivation was that you needed a predicate.
> > 
> > The problem I have with a #define for a single such invalid GPIO
> > number is that people will inevitably start to assume it's the
> > only such number.  In particular "if (gpio == NO_GPIO) ..."
> > is by definition incorrect.
> > 
> > So I'd really rather see a predicate like is_valid_gpio().
> > 
> > If you want to designate one value for use as an initializer,
> > then I'd rather see a simple
> > 
> > #define NO_GPIO (-EINVAL)
> > 
> > without any option for arch-specific overrides ... along with a
> > comment that this is only *one* of the numerous values which
> > will fail is_valid_gpio().
> 
> I was thinking about irq numbers and trying to avoid as early as possible 
> their problem: namely that each and every platform has its own idea of 
> which irq numbers are valid and which are not, some use 0 as invalid irq, 
> some -1, some 256, etc.

That problem came about mostly because the definition was not
part of the original interface definition.  Not unlike DMA
addressing ... for the longest time it was impossible to
report DMA mapping failures.

Whereas there's *never* been any question about whether
negative numbers are valid GPIO numbers.  (They aren't.)

> And when those platforms share drivers, problems  
> arise. And the simple and efficient NO_IRQ notion, that would fix those 
> problems nicely, cannot seem to establish itself.

Inertia is one of the problems there ... plus, the only
obvious advantage of "#define NO_IRQ 0" is that it makes
it easier to be lazy about initialization.

Plus, changing platforms to use that convention means they
mostly need to adopt an *unnatural* step of mapping from the
hardware IRQ numbers (which often start at zero, as they do
on one system I just ssh'd into) to some "logical" ID.
Even if you believe that's worthwhile, it's work; and it
could easily break something.

> The disadvantages I see in your suggestions are:
> 
> 1. two accessors (is_valid_gpio() and NO_GPIO) instead of one

Neither of those is an "accessor".  One is a "predicate"; and
the other is an "initializer".  (A better initializer name might
be more like INIT_GPIO_INVALID.)

The "accessor" scenario is actually a natural place to rely
on errno values.  Accessors are like

int gpio = foo_get_gpio(foo_ptr);

And the normal kernel convention there is to return negative
errno values that characterize the different fault modes.
(Ditto allocators:  foo_alloc_gpio etc.)
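
A small user-space sketch of how the three roles (predicate, initializer, accessor) fit together under the current rule that no negative number is a valid GPIO. Only the names is_valid_gpio(), NO_GPIO and foo_get_gpio() come from this thread; the struct foo layout, its has_gpio field and the choice of -ENODEV are assumptions made purely for illustration:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Initializer: just one of the many values that fail is_valid_gpio(). */
#define NO_GPIO (-EINVAL)

/* Predicate: every negative number is invalid, not only NO_GPIO. */
bool is_valid_gpio(int gpio)
{
	return gpio >= 0;
}

/* Hypothetical device that may or may not have a GPIO line wired up. */
struct foo {
	bool has_gpio;
	int gpio;
};

/* Accessor: returns the GPIO number, or a negative errno naming the fault. */
int foo_get_gpio(const struct foo *foo)
{
	if (foo == NULL)
		return -EINVAL;
	if (!foo->has_gpio)
		return -ENODEV;
	return foo->gpio;
}
```

A caller tests the result with is_valid_gpio() rather than comparing against NO_GPIO, so every fault code the accessor may return is handled the same way and still carries its meaning.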

> 2. have to include errno.h

Which most code already does.  And you'd certainly want to
do that if you were using an accessor to get GPIOs...

> 3. it doesn't seem very logical to me to define a gpio number in terms of 
>an error code

It's not a GPIO number though; it's specifically designated as
NOT being a GPIO.  So why not have it be a magic number which
has meaning in multiple contexts?  Do you object to "ssize_t",
or in general the "return negative errno on fault" conventions?

> 4. "confusing freedom" - NO_GPIO is the invalid gpio number, but, in fact, 
>you can use just any negative number

I don't see any reason to change the API to disallow using
other negative values there.  What good would come from that?
(Remember, the *CURRENT* definition covers this situation
by saying no negative number is a valid GPIO number.)

At the machine instruction level, comparing against "-1" or
any other single currently-defined-as-invalid number is more
expensive than checking "is it negative".

And at a higher level, you'd prevent normal accessor (or
allocator, etc) idioms.  I can't see any value to preventing
such usage.

> Advantages of my proposal:
> 
> 1. simplicity - only one macro, and "well-definedness" - use this and only 
>this as invalid gpio number. The rest are either valid, or undefined.

It's currently simple and well defined; negative numbers
are not GPIOs.  You want a *different* model, which is in
fact more complex ... it adds that "undefined" notion.

> 2. overridable by platforms - though I don't have any examples at hand, I 
>can imagine, that some platforms would prefer some specific "natural" 
>for them numbers.

They can already pick any positive number.  I don't know
about you, but I *shudder* to think of anyone who's
seriously trying to manage more than 2 Gbits of GPIOs
one bit at a time!

> But, this is not something I would spend too much energy arguing about, 
> and this is your code in the end:-) So, if you still disagree, I'll do it 
> the way you suggest. I might well be wrong too:-)

Well, you've not convinced me there's any reason to change
the current rules to prevent accessor/allocator idioms from
returning fault codes that could be meaningful.

- Dave


Re: [git pull] x86 updates

2008-02-09 Thread Randy Dunlap

On Sun, 10 Feb 2008 00:24:50 +0100 (CET) Thomas Gleixner wrote:

> Linus,
> 
> please pull the pending x86 updates from:
> 
>   ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git master

Hi Thomas,
can we please get diffstats with git pull requests?
(in the future)


> The update contains:
> 
> - a couple of bugfixes
> - CPA and DEBUG_PAGEALLOC improvements
> - x86 power management consolidation 
> - GEODE updates
> - 32bit boot time page table construction rework
> - sparse and compile warning fixes
> - trivial cleanups
> 
> There are two patches out of x86 scope as well:
> 
> - lguest bugfix: x86 broke it, fix is obviously correct and Rusty
>   is away
> 
> - randomization docs: resulted out of a x86 randomization
>   discussion
> 
> Thanks,
> 
>   tglx
> 
> 
> 
> Ahmed S. Darwish (1):
>   lguest: accept guest _PAGE_PWT page table entries
> 
> Andres Salomon (5):
>   x86: GEODE: MFGPT: Minor cleanups
>   x86: GEODE: MFGPT: drop module owner usage from MFGPT API
>   x86: GEODE: MFGPT: replace 'flags' field with 'avail' bit
>   x86: GEODE: MFGPT: make mfgpt_timer_setup available outside of mfgpt_32.c
>   x86: GEODE: MFGPT: fix a potential race when disabling a timer
> 
> Arnd Hannemann (1):
>   x86: GEODE: MFGPT: fix typo in printk in mfgpt_timer_setup
> 
> Denys Vlasenko (1):
>   x86: trivial printk optimizations
> 
> Harvey Harrison (6):
>   x86: fix sparse warning in xen/time.c
>   x86: sparse warning in therm_throt.c
>   x86: sparse warnings in pageattr.c
>   x86: fix sparse warning in topology.c
>   x86: fix sparse warnings in acpi/bus.c
>   x86, core: remove CONFIG_FORCED_INLINING
> 
> Ian Campbell (2):
>   x86: construct 32-bit boot time page tables in native format.
>   x86: fix early_ioremap pagetable ops
> 
> Ingo Molnar (2):
>   x86: fixup more paravirt fallout
>   brk: help text typo fix
> 
> Jiri Kosina (1):
>   brk: document randomize_va_space and CONFIG_COMPAT_BRK (was Re:
> 
> Jordan Crouse (2):
>   x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers
>   x86: GEODE: make sure the right MFGPT timer fired the timer tick
> 
> Rafael J. Wysocki (4):
>   x86 PM: move 64-bit hibernation files to arch/x86/power
>   x86 PM: rename 32-bit files in arch/x86/power
>   x86 PM: consolidate suspend and hibernation code
>   x86 PM: update stale comments
> 
> Thomas Gleixner (6):
>   x86: avoid unused variable warning in mm/init_64.c
>   x86: DEBUG_PAGEALLOC: enable after mem_init()
>   x86: introduce page pool in cpa
>   x86: cpa, use page pool
>   x86: cpa, enable CONFIG_DEBUG_PAGEALLOC on 64-bit
>   x86: cpa, strict range check in try_preserve_large_page()
> 
> Willy Tarreau (1):
>   x86: GEODE fix MFGPT input clock value

---
~Randy

Re: how to tell i386 from x86-64 kernel

2008-02-09 Thread Pavel Machek

On Sat 2008-02-09 14:34:30, Arjan van de Ven wrote:
> On Sat, 9 Feb 2008 21:13:43 +0100 (CET)
> Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Feb 1 2008 12:53, Alejandro Riveira Fernández wrote:
> > >> 
> > >> # uname -m
> > >> I won't tell you.
> > >> # linux32 uname -m
> > >> i686
> > >
> > > Ubuntu 7.10 64 bit userland 2.6.24
> > >
> > >$ uname -m
> > >x86_64
> > >$ linux32 uname -m
> > >i686
> > 
> > What I am saying is that uname(2) does not reliably tell you whether
> > you have a 64-bit kernel underneath unless you have other sources of 
> > information.
> 
> that's sort of a rabbit-and-the-frog problem. The 32 bit emulator tries to
> look EXACTLY like the 32 bit kernel, and it really should.
> If someone wants a method to detect even that... we would really want
> to know the exact usecase.. because very likely it's the wrong answer
> to some other problem ;-)

dmesg should really really tell you 32 vs. 64 bit, at the first line
where it prints versions... so you easily know what you are dealing
with when someone sends a bugreport.
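
For reference, the value discussed in the quoted text is the machine field that uname(2) fills in. A minimal sketch of reading it (kernel_machine() is a made-up helper name): per the transcripts quoted above, the same call reports x86_64 when run normally and i686 when run under linux32, so it cannot settle the question by itself.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Copy the machine string reported by uname(2) into buf.
 * Returns 0 on success, -1 on failure.
 */
int kernel_machine(char *buf, size_t len)
{
	struct utsname u;

	if (len == 0 || uname(&u) != 0)
		return -1;
	snprintf(buf, len, "%s", u.machine);
	return 0;
}
```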

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [RFC] ipvs: Cleanup sync daemon code

2008-02-09 Thread Simon Horman

On Sun, Feb 10, 2008 at 12:38:11AM +0100, Sven Wegener wrote:
> Hi all,
>
> I'd like to get your feedback on this:
>
> - Use kthread_run instead of doing a double-fork via kernel_thread()
>
> - Return proper error codes to user-space on failures
>
> Currently ipvsadm --start-daemon with an invalid --mcast-interface will
> silently succeed. With these changes we get an appropriate "No such
> device" error.
>
> - Use wait queues for both master and backup thread
>
> Instead of doing an endless loop with sleeping for one second, we now use
> wait queues. The master sync daemon has its own wait queue and gets woken
> up when we have enough data to send and also at a regular interval. The
> backup sync daemon sits on the wait queue of the mcast socket and gets
> woken up as soon as we have data to process.

Hi Sven,

This looks good to me, assuming that it's tested and works.

A few minor things:

In sb_queue_tail() the master loop is woken up if
the ip_vs_sync_count reaches 10, which seems a bit arbitrary.

Perhaps it's just my mail reader, but the patch seemed a bit screwy when
I saved it to a file. I think I fixed the problem I was seeing using s/^  / /

Unfortunately/Fortunately I am about to leave for a few days skiing,
so if I am quiet you will know why.

Acked-by: Simon Horman <[EMAIL PROTECTED]>

-- 
Horms


Re: 2.6.24-git20 -- BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:449

2008-02-09 Thread Arjan van de Ven

On Sat, 9 Feb 2008 16:26:43 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> Ugh, how did I let that one through?
> 
> Guys, how often must it be said?  PLEASE always test all code with all
> kernel debug options enabled.

maybe we should make a CONFIG_KERNEL_DEVELOPER option that SELECTs the
various options that really should be on.

If there is a chance of reasonable agreement on what those options are I'll cook
up a patch... (but I don't want to get bogged down in a 500 mail flamewar about
CONFIG_FOO_BAR being right for this or not...)

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: 2.6.24-git20 -- BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:449

2008-02-09 Thread Andrew Morton

On Sat, 9 Feb 2008 14:03:28 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:

> Command run:
>  find /proc | xargs tail
> 
> [ 2710.028219] BUG: sleeping function called from invalid context at
> include/asm/uaccess_32.h:449
> [ 2710.028229] in_atomic():1, irqs_disabled():0
> [ 2710.028232] 1 lock held by head/9380:
> [ 2710.028234]  #0:  (hugetlb_lock){--..}, at: []
> hugetlb_overcommit_handler+0x16/0x3e
> [ 2710.028248] Pid: 9380, comm: head Not tainted 2.6.24-git20 #5
> [ 2710.028260]  [] __might_sleep+0xc2/0xc9
> [ 2710.028267]  [] copy_to_user+0x32/0x49
> [ 2710.028277]  [] do_proc_doulongvec_minmax+0x1df/0x27f
> [ 2710.028289]  [] proc_doulongvec_minmax+0x15/0x17
> [ 2710.028295]  [] hugetlb_overcommit_handler+0x2a/0x3e
> [ 2710.028303]  [] proc_sys_read+0x5e/0x7b
> [ 2710.028311]  [] ? proc_sys_read+0x0/0x7b
> [ 2710.028317]  [] vfs_read+0x8a/0x106
> [ 2710.028325]  [] sys_read+0x3b/0x60
> [ 2710.028331]  [] sysenter_past_esp+0x5f/0xa5

Ugh, how did I let that one through?

Guys, how often must it be said?  PLEASE always test all code with all
kernel debug options enabled.

Re: [RFC] Sectionized printk data

2008-02-09 Thread Arnaldo Carvalho de Melo

On Sun, Feb 10, 2008 at 01:18:18AM +0100, Jan Engelhardt wrote:
> 
> On Feb 9 2008 21:54, Arnaldo Carvalho de Melo wrote:
> >> To drop strings that are only shown once anyway, such as:
> >> 
> >> static int __init ebtables_init(void)
> >> {
> >> 	int ret;
> >> 
> >> 	mutex_lock(&ebt_mutex);
> >> 	list_add(&ebt_standard_target.list, &ebt_targets);
> >> 	mutex_unlock(&ebt_mutex);
> >> 	if ((ret = nf_register_sockopt(&ebt_sockopts)) < 0)
> >> 		return ret;
> >> 
> >> ->	printk(KERN_INFO "Ebtables v2.0 registered\n");
> >> 	return 0;
> >> }
> >> 
> >> >If you say "saving memory" then please let us know with specific examples
> >> >in what area these savings will really pay off.
> >
> >[...]
> >With a tool like this the advantage is that no source code has to be
> >changed, strings in __init functions are automagically moved to
> >.init.data, the disadvantage is that not all strings can be moved to
> >.init.data as there were (are?) subsystems that keep pointers to the
> >string passed and another tool would be involved in the build process.
> 
> There is one corner case to consider:
> 
> 
>   static char abc[] = "foo";
> 
>   int __init init_module(void)
>   {
>   printk(abc);
>   }
> 
> I am not sure if gcc/ld is smart enough to figure out that abc is
> only ever used from within an __init function and that it could hence
> be moved to __initdata.

The initstr tool mentioned doesn't touch this case, as it doesn't
search for specific functions such as printk; it looks for strings inside
__init marked functions. In the above example abc won't be marked as
__initdata.

So if there are two places where the same string is used, with one being
in a __init function one copy goes to .init.data and another to .data.
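
For a file-scope string like the quoted abc, the annotation would therefore have to be written by hand. A minimal user-space sketch of what that looks like, assuming GCC (or clang) on an ELF target; DEMO_INIT, DEMO_INITDATA, the .demo.* section names and demo_init_module() are hypothetical stand-ins for the kernel's __init/__initdata machinery, which additionally frees those sections after boot:

```c
#include <stdio.h>

/* Hypothetical stand-ins for __init and __initdata: place symbols in named sections. */
#define DEMO_INIT     __attribute__((section(".demo.init.text")))
#define DEMO_INITDATA __attribute__((section(".demo.init.data")))

/* Explicitly annotated, so the string lands in the init data section. */
static char abc[] DEMO_INITDATA = "foo";

/* Init-only function using the init-only string; returns printf()'s result. */
int DEMO_INIT demo_init_module(void)
{
	return printf("%s\n", abc);
}
```

Running objdump -h on the resulting object file should list both .demo.init.text and .demo.init.data as separate sections.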

- Arnaldo

Re: mm/slub.c warnings

2008-02-09 Thread Christoph Lameter

On Sat, 9 Feb 2008, Vegard Nossum wrote:

> Hi,
> 
> I get these warnings when compiling mm/slub.c in linux-2.6.git:
> 
> mm/slub.c: In function 'slab_alloc':
> mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
> mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
> mm/slub.c: In function 'slab_free':
> mm/slub.c:1796: warning: assignment makes pointer from integer without a cast
> mm/slub.c:1796: warning: assignment makes pointer from integer without a cast
> 
> The actual lines are calls to cmpxchg_local(). This is probably
> because I'm compiling with M386. I'm guessing the source of the
> warnings is in include/asm-x86/cmpxchg_32.h, lines 283 and 286. Config
> attached.

Hmmm.. That cmpxchg local needs to be fixed? Mathieu?



Re: [RFC] Sectionized printk data

2008-02-09 Thread Jan Engelhardt


On Feb 9 2008 21:54, Arnaldo Carvalho de Melo wrote:
>> To drop strings that are only shown once anyway, such as:
>> 
>> static int __init ebtables_init(void)
>> {
>> int ret;
>> 
>> mutex_lock(_mutex);
>> list_add(_standard_target.list, _targets);
>> mutex_unlock(_mutex);
>> if ((ret = nf_register_sockopt(_sockopts)) < 0)
>> return ret;
>> 
>> ->  printk(KERN_INFO "Ebtables v2.0 registered\n");
>> return 0;
>> }
>> 
>> >If you say "saving memory" then please let us know with specific examples
>> >in what area these savings will really pay off.
>
>[...]
>With a tool like this the advantage is that no source code has to be
>changed, strings in __init functions are automagically moved to
>.init.data, the disadvantage is that not all strings can be moved to
>.init.data as there were (are?) subsystems that keep pointers to the
>string passed and another tool would be involved in the build process.

There is one corner case to consider:


static char abc[] = "foo";

int __init init_module(void)
{
printk(abc);
}

I am not sure if gcc/ld is smart enough to figure out that abc is
only ever used from within an __init function and that it could hence
be moved to __initdata.

Re: [PATCH] Define a NO_GPIO macro to compare against and to use as an invalid GPIO

2008-02-09 Thread Guennadi Liakhovetski

On Fri, 8 Feb 2008, David Brownell wrote:

> On Thursday 31 January 2008, Guennadi Liakhovetski wrote:
> > As discussed on i2c mailing list with David Brownell, any number
> > outside of the 0...MAX_INT range is invalid as a GPIO number.
> > Define a macro, similar to NO_IRQ, to be used as a deliberate
> > invalid GPIO, rather than defining a is_valid_gpio() macro.
> 
> Actually I thought that what you needed was an is_valid_gpio();
> your motivation was that you needed a predicate.
> 
> The problem I have with a #define for a single such invalid GPIO
> number is that people will inevitably start to assume it's the
> only such number.  In particular "if (gpio == NO_GPIO) ..."
> is by definition incorrect.
> 
> So I'd really rather see a predicate like is_valid_gpio().
> 
> If you want to designate one value for use as an initializer,
> then I'd rather see a simple
> 
>   #define NO_GPIO (-EINVAL)
> 
> without any option for arch-specific overrides ... along with a
> comment that this is only *one* of the numerous values which
> will fail is_valid_gpio().

I was thinking about irq numbers and trying to avoid as early as possible
their problem: namely that each and every platform has its own idea of
which irq numbers are valid and which are not, some use 0 as invalid irq,
some -1, some 256, etc. And when those platforms share drivers, problems
arise. And the simple and efficient NO_IRQ notion, that would fix those
problems nicely, cannot seem to establish itself.

The disadvantages I see in your suggestions are:

1. two accessors (is_valid_gpio() and NO_GPIO) instead of one
2. have to include errno.h
3. it doesn't seem very logical to me to define a gpio number in terms of 
   an error code
4. "confusing freedom" - NO_GPIO is the invalid gpio number, but, in fact, 
   you can use just any negative number

Advantages of my proposal:

1. simplicity - only one macro, and "well-definedness" - use this and only 
   this as invalid gpio number. The rest are either valid, or undefined.
2. overridable by platforms - though I don't have any examples at hand, I
   can imagine that some platforms would prefer specific numbers that are
   "natural" for them.

But, this is not something I would spend too much energy arguing about, 
and this is your code in the end:-) So, if you still disagree, I'll do it 
the way you suggest. I might well be wrong too:-)
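
For comparison, the predicate-plus-initializer variant discussed above
might look roughly like the following. This is only a sketch in plain
user-space C: the validity rule (any negative number is invalid) is taken
from the 0...MAX_INT range mentioned in the thread, and the body of
is_valid_gpio() is an assumption, not code from any tree.

```c
#include <errno.h>
#include <stdbool.h>

/* One designated "not a GPIO" initializer value. It is only one of the
 * many numbers that fail is_valid_gpio(). */
#define NO_GPIO (-EINVAL)

/* Predicate: anything outside 0...MAX_INT is invalid, which for an int
 * means any negative value (assumed rule, see lead-in). */
static inline bool is_valid_gpio(int gpio)
{
	return gpio >= 0;
}
```

With this shape a driver would test is_valid_gpio(gpio) rather than
comparing against NO_GPIO, since -1 or any other negative value is
rejected as well.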

Thanks
Guennadi
---
Guennadi Liakhovetski

Re: [PATCH] [resend] 3c509: convert to isa_driver and pnp_driver v4

2008-02-09 Thread Ondrej Zary

On Saturday 09 February 2008 22:48:05 Stephen Hemminger wrote:
> On Sat, 9 Feb 2008 22:33:07 +0100
>
> Ondrej Zary <[EMAIL PROTECTED]> wrote:
> > Hello,
> > this patch converts 3c509 driver to isa_driver and pnp_driver. The result
> > is that autoloading using udev and hibernation works with ISA PnP cards.
> > It also adds hibernation support for non-PnP ISA cards.
> >
> > xcvr module parameter was removed as its value was not used.
> >
> > Tested using 3 ISA cards in various combinations of PnP and non-PnP
> > modes. EISA and MCA only compile-tested.
> >
> > Signed-off-by: Ondrej Zary <[EMAIL PROTECTED]>
> >
> > --- linux-2.6.24-orig/drivers/net/3c509.c   2008-01-27 19:48:19.0
> > +0100 +++ linux-2.6.24-pentium/drivers/net/3c509.c  2008-02-07
> > 17:58:45.0 +0100 @@ -54,25 +54,24 @@
> > v1.19a 28Oct2002 Davud Ruggiero <[EMAIL PROTECTED]>
> > - Increase *read_eeprom udelay to workaround oops with 
> > 2 cards.
> > v1.19b 08Nov2002 Marc Zyngier <[EMAIL PROTECTED]>
> > -   - Introduce driver model for EISA cards.
> > +   - Introduce driver model for EISA cards.
> > +   v1.20  04Feb2008 Ondrej Zary <[EMAIL PROTECTED]>
> > +   - convert to isa_driver and pnp_driver and some cleanups
> >  */
>
> Don't bother with comment, kernel uses git change log to figure out
> who to blame.
>
> >  #define DRV_NAME   "3c509"
> > -#define DRV_VERSION "1.19b"
> > -#define DRV_RELDATE "08Nov2002"
> > +#define DRV_VERSION "1.20"
> > +#define DRV_RELDATE "04Feb2008"
> >
> >  /* A few values that may be tweaked. */
> >
> >  /* Time in jiffies before concluding the transmitter is hung. */
> >  #define TX_TIMEOUT  (400*HZ/1000)
> > -/* Maximum events (Rx packets, etc.) to handle at each interrupt. */
> > -static int max_interrupt_work = 10;
> >
> >  #include 
> > -#ifdef CONFIG_MCA
> >  #include 
> > -#endif
> > -#include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -97,10 +96,6 @@
> >
> >  static char version[] __initdata = DRV_NAME ".c:" DRV_VERSION " "
> > DRV_RELDATE " [EMAIL PROTECTED]";
> >
> > -#if defined(CONFIG_PM) && (defined(CONFIG_MCA) || defined(CONFIG_EISA))
> > -#define EL3_SUSPEND
> > -#endif
> > -
> >  #ifdef EL3_DEBUG
> >  static int el3_debug = EL3_DEBUG;
> >  #else
> > @@ -111,6 +106,7 @@
> >   * a global variable so that the mca/eisa probe routines can increment
> >   * it */
> >  static int el3_cards = 0;
> > +#define EL3_MAX_CARDS 8
> >
> >  /* To minimize the size of the driver source I only define operating
> > constants if they are used several times.  You'll need the manual
> > @@ -119,7 +115,7 @@
> >  #define EL3_DATA 0x00
> >  #define EL3_CMD 0x0e
> >  #define EL3_STATUS 0x0e
> > -#define EEPROM_READ 0x80
> > +#defineEEPROM_READ 0x80
> >
> >  #define EL3_IO_EXTENT  16
> >
> > @@ -168,23 +164,31 @@
> >   */
> >  #define SKB_QUEUE_SIZE 64
> >
> > +typedef enum { EL3_ISA, EL3_PNP, EL3_MCA, EL3_EISA } el3_cardtype;
> > +
>
> No typedef please (see checkpatch)

Is there any standard way to solve this without a typedef? I added 
el3_dev_fill() function which fills that card type value according to a 
parameter passed to it. "int" could be used instead and "#define EL3_ISA 
0", "#define EL3_PNP 1" - but I think that's ugly.
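
One option, sketched here as stand-alone C rather than as driver code: a
named enum can be used as a type directly through the enum keyword, so the
typedef is not needed. The reduced struct el3_private_sketch and the helper
el3_dev_fill_sketch() below are hypothetical stand-ins for the real
el3_private and el3_dev_fill().

```c
/* Named enum: usable as "enum el3_cardtype" without any typedef. */
enum el3_cardtype { EL3_ISA, EL3_PNP, EL3_MCA, EL3_EISA };

/* Hypothetical, reduced stand-in for the driver's private data. */
struct el3_private_sketch {
	int head, size;
	enum el3_cardtype type;	/* type of device */
};

/* Hypothetical helper: record the card type passed in by the probe code. */
static void el3_dev_fill_sketch(struct el3_private_sketch *lp,
				enum el3_cardtype type)
{
	lp->head = 0;
	lp->size = 0;
	lp->type = type;
}
```

This keeps the symbolic names and the type checking of the enum while
avoiding both the typedef and a set of #define constants.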

>
> >  struct el3_private {
> > struct net_device_stats stats;
>
> Use network device stats in net_device now

OK, looks like the driver will need some more patches.

> > -   struct net_device *next_dev;
> > spinlock_t lock;
> > /* skb send-queue */
> > int head, size;
> > struct sk_buff *queue[SKB_QUEUE_SIZE];
>
> What about sk_buff_head (linked list instead)?

I don't know anything about this, maybe in next patch.

>
> > -   enum {
> > -   EL3_MCA,
> > -   EL3_PNP,
> > -   EL3_EISA,
> > -   } type; /* type of device */
> > -   struct device *dev;
> > +   el3_cardtype type;
> >  };
> > -static int id_port __initdata = 0x110; /* Start with 0x110 to avoid new
> > sound cards.*/ -static struct net_device *el3_root_dev;
> > +static int id_port;
> > +static int current_tag;
> > +static struct net_device *el3_devs[EL3_MAX_CARDS];
>
> I know is only ISA, but having a limit seems silly, can't the device just
> use allocated space like other drivers.

EL3_MAX_CARDS is also used as a parameter to isa_register_driver(). The irq[] 
array (see below) is limited to 8 devices too. And finally, the card itself 
can use one of 8 different IRQs (3,5,7,2/9,10,11,12,15). So I think that it's 
not worth adding more code to support more cards.
The original driver will do bad things with more than 8 cards too - read 
beyond the end of irq[] array.

> > +
> > +/* Parameters that may be passed into the module. */
> > +static int debug = -1;
> > +static int irq[] = {-1, -1, -1, -1, -1, -1, -1, -1};
> > +/* Maximum events (Rx packets,

Re: Query about set_pages_* API

2008-02-09 Thread Arjan van de Ven

On Sat, 09 Feb 2008 15:40:12 -0700
Larry Finger <[EMAIL PROTECTED]> wrote:

> Is the set_pages_* API that replaces change_page_attr described
> somewhere? I have been unable to find it with Google.
> 
> I'm trying to modify the VirtualBox kernel module to work with
> 2.6.24-git (and 2.6.25) on x86_64 architecture. The current code has
> a value of the third argument of the call (prot) with 3 variants. All
> variations have the following bits set: _PAGE_PRESENT, _PAGE_RW,
> _PAGE_DIRTY, and _PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above,
> and number 3 adds _PAGE_GLOBAL to the bits in variation 1.
> 
>  From the code in arch/x86/mm/pageattr.c, I figured I need to call
> set_pages_wb() unconditionally, and set_pages_nx() if _PAGE_NX is
> set. Will these calls be sufficient? I thought about calling
> set_pages_rw(), but that entry is not exported.
> 

OK, looking at the actual code: it seems to only care about making a piece of
memory executable (and then clearing it before freeing), so all you need is
set_memory_x() and set_memory_nx().



-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

Re: Query about set_pages_* API

2008-02-09 Thread Arjan van de Ven

On Sat, 09 Feb 2008 15:40:12 -0700
Larry Finger <[EMAIL PROTECTED]> wrote:

> I'm trying to modify the VirtualBox kernel module to work with
> 2.6.24-git (and 2.6.25) on x86_64 architecture. The current code has
> a value of the third argument of the call (prot) with 3 variants. All
> variations have the following bits set: _PAGE_PRESENT, _PAGE_RW,
> _PAGE_DIRTY, and _PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above,
> and number 3 adds _PAGE_GLOBAL to the bits in variation 1.
> 
>  From the code in arch/x86/mm/pageattr.c, I figured I need to call
> set_pages_wb() unconditionally, and set_pages_nx() if _PAGE_NX is
> set. Will these calls be sufficient? I thought about calling
> set_pages_rw(), but that entry is not exported.

it depends on what the code is trying to achieve.
(this makes it not a trivial 1:1 scripted replacement ;-)

Which attribute is the code trying to change? Is it trying to make
a piece of code (non) cachable? or executable? You need to figure out what
the intent is..


Re: [RFC] Sectionized printk data

2008-02-09 Thread Arnaldo Carvalho de Melo

Em Sat, Feb 09, 2008 at 11:08:45PM +0100, Jan Engelhardt escreveu:
> 
> On Feb 4 2008 19:07, Sam Ravnborg wrote:
> >> The attached patch allows something along the lines:
> >> 
> >> int __init some_function(void)
> >> {
> >> [...]
> >> pr_init(KERN_WARNING "failure %s in %s\n", ...);
> >> [...]
> >> }
> >> 
> >> Another idea I had was to make printk a macro that figures out the
> >> section of the surrounding function and then moves the data
> >> automatically when it is a literal, but I couldn't find mechanisms that
> >> allow this.  Anyone of you got an idea?
> >> 
> >> What do you think in general?
> >
> >What is the rationale behind this?
> 
> To drop strings that are only shown once anyway, such as:
> 
> static int __init ebtables_init(void)
> {
> int ret;
> 
> mutex_lock(_mutex);
> list_add(_standard_target.list, _targets);
> mutex_unlock(_mutex);
> if ((ret = nf_register_sockopt(_sockopts)) < 0)
> return ret;
> 
> ->  printk(KERN_INFO "Ebtables v2.0 registered\n");
> return 0;
> }
> 
> >If you say "saving memory" then please let us know with specific examples
> >in what area these savings will really pay off.

A long time ago I played with this, using a sparse based tool that was
inserted as the compiler and modified the code before passing to gcc,
i.e. a pre-pre-processor:

http://www.kernel.org/pub/linux/kernel/people/acme/sparse/initstr.c

I couldn't find it in the archives, but IIRC some extra pages were freed
after boot, i.e. strings moved from .data to .init.data.

With a tool like this the advantage is that no source code has to be
changed, strings in __init functions are automagically moved to
.init.data, the disadvantage is that not all strings can be moved to
.init.data as there were (are?) subsystems that keep pointers to the
string passed and another tool would be involved in the build process.

- Arnaldo

Re: Fwd: Re: e1000 1sec latency problem

2008-02-09 Thread Kok, Auke

Ray Lee wrote:
> On Feb 9, 2008 1:51 PM, Kok, Auke <[EMAIL PROTECTED]> wrote:
>> Martin Rogge wrote:
>>> On Saturday 09 February 2008 11:07:26 Martin Rogge wrote:
 Hi,

 I am not so familiar with the various mailing lists and missed out on
 [EMAIL PROTECTED] the first time. Please cc me on any
 replies.

 I am looking for help with either making the e1000e driver work on my
 Thinkpad T60 or fixing the 1s latency issue with e1000.

 To be honest, I do not understand why the e1000e driver failed to recognize
 the NIC when I tried. At least, I noticed the correct device ID is defined
 in drivers/net/e1000e/hw.h:

 #define E1000_DEV_ID_82573L 0x109A

 Any help is appreciated.

 Thanks,

 Martin

 --  Forwarded Message  --

 Subject: Re: e1000 1sec latency problem
 Date: Thursday 07 February 2008
 From: Martin Rogge <[EMAIL PROTECTED]>
 To: linux-kernel@vger.kernel.org

 Pavel Machek wrote:
> Hi!
>
> I have the famous e1000 latency problems:
 Hi, I have the same problem with my Thinkpad T60.

 [EMAIL PROTECTED]:~# ping arnold
 PING arnold (192.168.158.6) 56(84) bytes of data.
 64 bytes from arnold (192.168.158.6): icmp_seq=1 ttl=64 time=49.7 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=2 ttl=64 time=0.438 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=3 ttl=64 time=1000 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=4 ttl=64 time=0.970 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=5 ttl=64 time=885 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=6 ttl=64 time=0.484 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=7 ttl=64 time=529 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=8 ttl=64 time=1.02 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=9 ttl=64 time=149 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=10 ttl=64 time=0.549 ms
 64 bytes from arnold (192.168.158.6): icmp_seq=11 ttl=64 time=0.829 ms

 --- arnold ping statistics ---
 11 packets transmitted, 11 received, 0% packet loss, time ms
 rtt min/avg/max/mdev = 0.438/238.113/1000.967/365.279 ms, pipe 2
 [EMAIL PROTECTED]:~# uname -a
 Linux zorro 2.6.24 #6 SMP PREEMPT Sun Feb 3 18:27:48 CET 2008 i686 Intel(R)
 Core(TM)2 CPU T7200  @ 2.00GHz GenuineIntel GNU/Linux
 [EMAIL PROTECTED]:~# lspci -vvv
>>> [stuff deleted]
>>>
 Unfortunately the e1000e driver is not an option as it will not detect the
 NIC:

 from dmesg with e1000 compiled in:
 Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
 Copyright (c) 1999-2006 Intel Corporation.
 ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
 PCI: Setting latency timer of device :02:00.0 to 64
 e1000: :02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1)
 00:15:58:c3:3a:71
 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

 from dmesg with e1000e compiled in:
 e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0
 e1000e: Copyright (c) 1999-2007 Intel Corporation.

 Any pointers?

 Thanks,

 Martin



 ---
>>> Just for the records, I googled the following solution for the Lenovo T60:
>>>
>>> (a) use the e1000 driver
>>> (b) if compiling as a module, add the following parameter to modprobe.conf:
>>> options e1000 RxIntDelay=5
>>> (c) if compiling a static driver, use the following patch (based on 2.6.24):
>>>
>>> --- e1000_param.c.orig2008-01-24 23:58:37.0 +0100
>>> +++ e1000_param.c 2008-02-09 20:42:23.0 +0100
>>> @@ -158,7 +158,7 @@
>>>   * Valid Range: 0-65535
>>>   */
>>>  E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
>>> -#define DEFAULT_RDTR   0
>>> +#define DEFAULT_RDTR   5
>>>  #define MAX_RXDELAY   0x
>>>  #define MIN_RXDELAY    0
>>>
>>> After reboot, the average ping time is still factor 10 worse than it should
>>> be, but it stays below 2 ms (which is a remarkable improvement compared to
>>> 1000 ms).
>> correct, this was a workaround which improved things for most people, but did not
>> *fix* it.
>>
>> the real fix is to disable L1 ASPM altogether at the cost of more power
>> consumption, which is what is in e1000e in 2.6.25-git.
> 
> e1000e doesn't recognize his NIC. Will you be adding this to the e1000
> driver as well?


no, from 2.6.25 onwards e1000e will support 82573 nics, so you'll have to migrate
drivers, and you will get the fix automatically that way.

after 2.6.25 releases, support for all pci-e nics will be removed from the e1000
driver.

Cheers

Auke

[RFC] ipvs: Cleanup sync daemon code

2008-02-09 Thread Sven Wegener


Hi all,

I'd like to get your feedback on this:

- Use kthread_run instead of doing a double-fork via kernel_thread()

- Return proper error codes to user-space on failures

Currently ipvsadm --start-daemon with an invalid --mcast-interface will
silently succeed. With these changes we get an appropriate "No such device"
error.


- Use wait queues for both master and backup thread

Instead of doing an endless loop with sleeping for one second, we now use
wait queues. The master sync daemon has its own wait queue and gets woken
up when we have enough data to send and also at a regular interval. The
backup sync daemon sits on the wait queue of the mcast socket and gets
woken up as soon as we have data to process.
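
The error-code change in the second item relies on returning an encoded
errno instead of NULL. The sketch below models that idea in user space: it
reimplements ERR_PTR(), PTR_ERR() and IS_ERR() locally (the kernel provides
its own versions in linux/err.h), and struct sock_sketch and
make_send_sock_sketch() are hypothetical stand-ins for the real socket code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct sock_sketch { int fd; };

/* Hypothetical stand-in for make_send_sock(): 'create_result' plays the
 * role of the value returned by sock_create_kern(). */
static struct sock_sketch *make_send_sock_sketch(int create_result)
{
	static struct sock_sketch sock;

	if (create_result < 0)
		return ERR_PTR(create_result);	/* previously: return NULL */
	sock.fd = create_result;
	return &sock;
}
```

A caller can then pass PTR_ERR() of the result back to user space, which
is how an error such as -ENODEV ("No such device") would reach ipvsadm
instead of being collapsed into a generic failure.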


diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 56f3c94..519bd96 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -890,6 +890,7 @@ extern char ip_vs_backup_mcast_ifn[IP_VS_IFNAME_MAXLEN];
 extern int start_sync_thread(int state, char *mcast_ifn, __u8 syncid);
 extern int stop_sync_thread(int state);
 extern void ip_vs_sync_conn(struct ip_vs_conn *cp);
+extern void ip_vs_sync_init(void);


 /*
diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 963981a..0ccee4b 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -1071,6 +1071,8 @@ static int __init ip_vs_init(void)
 {
int ret;

+   ip_vs_sync_init();
+
ret = ip_vs_control_init();
if (ret < 0) {
IP_VS_ERR("can't setup control.\n");
diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c
index 948378d..36063d3 100644
--- a/net/ipv4/ipvs/ip_vs_sync.c
+++ b/net/ipv4/ipvs/ip_vs_sync.c
@@ -29,6 +29,9 @@
 #include 
 #include  /* for ip_mc_join_group */
 #include 
+#include 
+#include 
+#include 

 #include 
 #include 
@@ -68,7 +71,8 @@ struct ip_vs_sync_conn_options {
 };

 struct ip_vs_sync_thread_data {
-   struct completion *startup;
+   struct completion *startup; /* set to NULL once completed */
+   int *retval; /* only valid until startup is completed */
int state;
 };

@@ -123,9 +127,10 @@ struct ip_vs_sync_buff {
 };


-/* the sync_buff list head and the lock */
+/* the sync_buff list head, the lock and the counter */
 static LIST_HEAD(ip_vs_sync_queue);
 static DEFINE_SPINLOCK(ip_vs_sync_lock);
+static unsigned int ip_vs_sync_count;

 /* current sync_buff for accepting new conn entries */
 static struct ip_vs_sync_buff   *curr_sb = NULL;
@@ -140,6 +145,13 @@ volatile int ip_vs_backup_syncid = 0;
 char ip_vs_master_mcast_ifn[IP_VS_IFNAME_MAXLEN];
 char ip_vs_backup_mcast_ifn[IP_VS_IFNAME_MAXLEN];

+/* sync daemon tasks */
+static struct task_struct *sync_master_thread;
+static struct task_struct *sync_backup_thread;
+
+/* wait queue for master sync daemon */
+static DECLARE_WAIT_QUEUE_HEAD(sync_master_wait);
+
 /* multicast addr */
 static struct sockaddr_in mcast_addr;

@@ -148,6 +160,8 @@ static inline void sb_queue_tail(struct ip_vs_sync_buff *sb)
 {
spin_lock(_vs_sync_lock);
list_add_tail(>list, _vs_sync_queue);
+   if (++ip_vs_sync_count == 10)
+   wake_up_interruptible(_master_wait);
spin_unlock(_vs_sync_lock);
 }

@@ -163,6 +177,7 @@ static inline struct ip_vs_sync_buff * sb_dequeue(void)
struct ip_vs_sync_buff,
list);
list_del(>list);
+   ip_vs_sync_count--;
}
spin_unlock_bh(_vs_sync_lock);

@@ -536,14 +551,17 @@ static int bind_mcastif_addr(struct socket *sock, char *ifname)
 static struct socket * make_send_sock(void)
 {
struct socket *sock;
+   int result;

/* First create a socket */
-   if (sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, ) < 0) {
+   result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, );
+   if (result < 0) {
IP_VS_ERR("Error during creation of socket; terminating\n");
-   return NULL;
+   return ERR_PTR(result);
}

-   if (set_mcast_if(sock->sk, ip_vs_master_mcast_ifn) < 0) {
+   result = set_mcast_if(sock->sk, ip_vs_master_mcast_ifn);
+   if (result < 0) {
IP_VS_ERR("Error setting outbound mcast interface\n");
goto error;
}
@@ -551,14 +569,16 @@ static struct socket * make_send_sock(void)
set_mcast_loop(sock->sk, 0);
set_mcast_ttl(sock->sk, 1);

-   if (bind_mcastif_addr(sock, ip_vs_master_mcast_ifn) < 0) {
+   result = bind_mcastif_addr(sock, ip_vs_master_mcast_ifn);
+   if (result < 0) {
IP_VS_ERR("Error binding address of the mcast interface\n");
goto error;
}

-   if (sock->ops->connect(sock,
-  (struct sockaddr*)_addr,
-  sizeof(struct sockaddr), 0) < 0) {
+   result = sock->ops->connect(sock,
+   (struct

[git pull] x86 updates

2008-02-09 Thread Thomas Gleixner

Linus,

please pull the pending x86 updates from:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git master

The update contains:

- a couple of bugfixes
- CPA and DEBUG_PAGEALLOC improvements
- x86 power management consolidation 
- GEODE updates
- 32bit boot time page table construction rework
- sparse and compile warning fixes
- trivial cleanups

There are two patches out of x86 scope as well:

- lguest bugfix: x86 broke it, fix is obviously correct and Rusty
  is away

- randomization docs: resulted out of a x86 randomization
  discussion

Thanks,

tglx



Ahmed S. Darwish (1):
  lguest: accept guest _PAGE_PWT page table entries

Andres Salomon (5):
  x86: GEODE: MFGPT: Minor cleanups
  x86: GEODE: MFGPT: drop module owner usage from MFGPT API
  x86: GEODE: MFGPT: replace 'flags' field with 'avail' bit
  x86: GEODE: MFGPT: make mfgpt_timer_setup available outside of mfgpt_32.c
  x86: GEODE: MFGPT: fix a potential race when disabling a timer

Arnd Hannemann (1):
  x86: GEODE: MFGPT: fix typo in printk in mfgpt_timer_setup

Denys Vlasenko (1):
  x86: trivial printk optimizations

Harvey Harrison (6):
  x86: fix sparse warning in xen/time.c
  x86: sparse warning in therm_throt.c
  x86: sparse warnings in pageattr.c
  x86: fix sparse warning in topology.c
  x86: fix sparse warnings in acpi/bus.c
  x86, core: remove CONFIG_FORCED_INLINING

Ian Campbell (2):
  x86: construct 32-bit boot time page tables in native format.
  x86: fix early_ioremap pagetable ops

Ingo Molnar (2):
  x86: fixup more paravirt fallout
  brk: help text typo fix

Jiri Kosina (1):
  brk: document randomize_va_space and CONFIG_COMPAT_BRK (was Re:

Jordan Crouse (2):
  x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers
  x86: GEODE: make sure the right MFGPT timer fired the timer tick

Rafael J. Wysocki (4):
  x86 PM: move 64-bit hibernation files to arch/x86/power
  x86 PM: rename 32-bit files in arch/x86/power
  x86 PM: consolidate suspend and hibernation code
  x86 PM: update stale comments

Thomas Gleixner (6):
  x86: avoid unused variable warning in mm/init_64.c
  x86: DEBUG_PAGEALLOC: enable after mem_init()
  x86: introduce page pool in cpa
  x86: cpa, use page pool
  x86: cpa, enable CONFIG_DEBUG_PAGEALLOC on 64-bit
  x86: cpa, strict range check in try_preserve_large_page()

Willy Tarreau (1):
  x86: GEODE fix MFGPT input clock value


Re: HPET timer broken using 2.6.23.13 / nanosleep() hangs

2008-02-09 Thread Andrew Paprocki

Thomas,

I haven't found a good way to capture the SysRq output for this. I
found that when it locks up at boot time, even SysRq is unresponsive.
I don't have another way of getting console on the machine right now
to get the output off of it. I have since upgraded to 2.6.24 and the
problem still persists.

Another interesting twist though.. I just rebuilt my kernel with
ARCH=x86_64 and HPET works perfectly. So this only appears to break
when in 32-bit mode. For some reason it picks tsc at boot time, but if
I install hpet afterwards under x86_64 it no longer hangs when I run
'sleep 1'. Does that shed any more light on the problem?

Thanks,
-Andrew

# uname -a
Linux am2 2.6.24 #7 Sat Feb 9 18:06:50 EST 2008 x86_64 GNU/Linux
# dmesg | egrep -i clock\|hpet
ACPI: HPET 3DFE7780, 0038 (r1 RS690  AWRDACPI 42302E31 AWRD   98)
ACPI: HPET id: 0x10b9a201 base: 0xfed0
hpet clockevent registered
TSC calibrated against HPET
hpet0: at MMIO 0xfed0, IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
Time: tsc clocksource has been installed.
Real Time Clock Driver v1.12ac
hpet_resources: 0xfed0 is busy
# echo -n hpet >
/sys/devices/system/clocksource/clocksource0/current_clocksource
# dmesg | tail -1
Time: hpet clocksource has been installed.
# time sleep 1

real    0m1.001s
user    0m0.000s
sys     0m0.000s

On Jan 18, 2008 5:26 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> On Wed, 16 Jan 2008, Andrew Paprocki wrote:
>
> > I applied the patch and I am still locking up after
> > Time: hpet clocksource has been installed.
>
> That was expected :)
>
> > I rebooted with "clocksource=tsc" to get the logs of the trace which
> > was added. I'm assuming the grep below gets all the interesting parts.
> > I enabled the HPET character device as mentioned before, which is why
> > the hpet0 lines appear now.
> >
> > # dmesg | egrep -i "(hpet|time|clock)"
> > ACPI: HPET 37FE7400, 0038 (r1 RS690  AWRDACPI 42302E31 AWRD   98)
> > ATI board detected. Disabling timer routing over 8254.
> > ACPI: PM-Timer IO Port: 0x4008
> > ACPI: HPET id: 0x10b9a201 base: 0xfed0
> > Kernel command line: vga=0x31a root=/dev/sda1 ro clocksource=tsc
> > HPET check: t1=5 t2=1139 s=56226339975 n=56226539985
>
> Ok, the counter works when we initialize the HPET.
>
> t2-t1 = 1134 ticks ~= 79us
> s-n = 200010 ~= 2525MHz --> That should be the frequency of your CPU.
>
> > Jan 16 14:44:43 am2 kernel: Call Trace:
> > Jan 16 14:44:48 am2 kernel:  [] enqueue_hrtimer+0xd7/0xe2
> > Jan 16 14:44:48 am2 kernel:  [] hrtimer_start+0xe8/0xf4
> > Jan 16 14:44:48 am2 kernel:  [] do_nanosleep+0x48/0x73
> > Jan 16 14:44:48 am2 kernel:  [] 
> > hrtimer_nanosleep_restart+0x34/0xa1
> > Jan 16 14:44:48 am2 kernel:  [] hrtimer_wakeup+0x0/0x18
> > Jan 16 14:44:48 am2 kernel:  [] sys_restart_syscall+0xe/0xf
> > Jan 16 14:44:48 am2 kernel:  [] sysenter_past_esp+0x5f/0x85
>
> When the system is hung, can you please hit SysRq-Q wait a bit and hit
> SysRq-Q again. Please provide the output.
>
> Thanks,
> tglx
>
>
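
The estimate in the quoted "HPET check" analysis can be reproduced with a
few lines of C. This is only a worked version of the arithmetic: the HPET
rate of 14318180 Hz is the one reported by the hpet0 line in the dmesg
output, the TSC difference is taken as n minus s, and the function names
are made up for the example.

```c
#include <stdint.h>

/* HPET frequency as reported in dmesg: 14318180 Hz. */
#define HPET_HZ 14318180.0

/* Elapsed time in microseconds for a number of HPET ticks. */
static double hpet_ticks_to_us(uint64_t ticks)
{
	return (double)ticks * 1e6 / HPET_HZ;
}

/* Estimated CPU frequency in MHz: TSC cycles per elapsed microsecond. */
static double tsc_mhz(uint64_t tsc_cycles, uint64_t hpet_ticks)
{
	return (double)tsc_cycles / hpet_ticks_to_us(hpet_ticks);
}
```

With t2 - t1 = 1134 ticks this gives about 79.2 us, and 200010 TSC cycles
over that interval come out at roughly 2525 MHz, matching the figures in
the quoted mail.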

[PATCH] scsi: ses fix mem leaking when fail to add intf

2008-02-09 Thread Yinghai Lu

[PATCH] scsi: ses fix mem leaking when fail to add intf

Fix the scomp memory leak on the failure path of ses_intf_add().
Also remove one extra space.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/scsi/ses.c
===
--- linux-2.6.orig/drivers/scsi/ses.c
+++ linux-2.6/drivers/scsi/ses.c
@@ -416,11 +416,11 @@ static int ses_intf_add(struct class_dev
int i, j, types, len, components = 0;
int err = -ENOMEM;
struct enclosure_device *edev;
-   struct ses_component *scomp;
+   struct ses_component *scomp = NULL;
 
if (!scsi_device_enclosure(sdev)) {
/* not an enclosure, but might be in one */
-   edev =  enclosure_find(>host->shost_gendev);
+   edev = enclosure_find(>host->shost_gendev);
if (edev) {
ses_match_to_enclosure(edev, sdev);
class_device_put(>cdev);
@@ -456,9 +456,6 @@ static int ses_intf_add(struct class_dev
if (!buf)
goto err_free;
 
-   ses_dev->page1 = buf;
-   ses_dev->page1_len = len;
-
result = ses_recv_diag(sdev, 1, buf, len);
if (result)
goto recv_failed;
@@ -473,6 +470,9 @@ static int ses_intf_add(struct class_dev
type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
components += type_ptr[1];
}
+   ses_dev->page1 = buf;
+   ses_dev->page1_len = len;
+   buf = NULL;
 
result = ses_recv_diag(sdev, 2, hdr_buf, INIT_ALLOC_SIZE);
if (result)
@@ -489,6 +489,7 @@ static int ses_intf_add(struct class_dev
goto recv_failed;
ses_dev->page2 = buf;
ses_dev->page2_len = len;
+   buf = NULL;
 
/* The additional information page --- allows us
 * to match up the devices */
@@ -506,11 +507,26 @@ static int ses_intf_add(struct class_dev
goto recv_failed;
ses_dev->page10 = buf;
ses_dev->page10_len = len;
+   buf = NULL;
 
  no_page10:
-   scomp = kmalloc(sizeof(struct ses_component) * components, GFP_KERNEL);
+
+   /* Page 7 for the descriptors is optional */
+   result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE);
+   if (result)
+   goto simple_populate;
+
+   len = (hdr_buf[2] << 8) + hdr_buf[3] + 4;
+   /* add 1 for trailing '\0' we'll use */
+   buf = kzalloc(len + 1, GFP_KERNEL);
+   if (!buf)
+   goto err_free;
+   result = ses_recv_diag(sdev, 7, buf, len);
+
+ simple_populate:
+   scomp = kzalloc(sizeof(struct ses_component) * components, GFP_KERNEL);
if (!scomp)
-   goto  err_free;
+   goto err_free;
 
edev = enclosure_register(cdev->dev, sdev->sdev_gendev.bus_id,
  components, &ses_enclosure_callbacks);
@@ -521,20 +537,10 @@ static int ses_intf_add(struct class_dev
 
edev->scratch = ses_dev;
for (i = 0; i < components; i++)
-   edev->component[i].scratch = scomp++;
+   edev->component[i].scratch = scomp + i;
 
-   /* Page 7 for the descriptors is optional */
-   buf = NULL;
-   result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE);
-   if (result)
-   goto simple_populate;
-
-   len = (hdr_buf[2] << 8) + hdr_buf[3] + 4;
-   /* add 1 for trailing '\0' we'll use */
-   buf = kzalloc(len + 1, GFP_KERNEL);
-   result = ses_recv_diag(sdev, 7, buf, len);
+   /* result and buf from page 7 check */
if (result) {
- simple_populate:
kfree(buf);
buf = NULL;
desc_ptr = NULL;
@@ -598,6 +604,7 @@ static int ses_intf_add(struct class_dev
err = -ENODEV;
  err_free:
kfree(buf);
+   kfree(scomp);
kfree(ses_dev->page10);
kfree(ses_dev->page2);
kfree(ses_dev->page1);
@@ -630,6 +637,7 @@ static void ses_intf_remove(struct class
ses_dev = edev->scratch;
edev->scratch = NULL;
 
+   kfree(ses_dev->page10);
kfree(ses_dev->page1);
kfree(ses_dev->page2);
kfree(ses_dev);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Query about set_pages_* API

2008-02-09 Thread Larry Finger

Is the set_pages_* API that replaces change_page_attr described somewhere? I have been unable to 
find it with Google.


I'm trying to modify the VirtualBox kernel module to work with 2.6.24-git (and 2.6.25) on x86_64 
architecture. The current code has a value of the third argument of the call (prot) with 3 variants. 
All variations have the following bits set: _PAGE_PRESENT, _PAGE_RW, _PAGE_DIRTY, and 
_PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above, and number 3 adds _PAGE_GLOBAL to the bits in 
variation 1.


From the code in arch/x86/mm/pageattr.c, I figured I need to call set_pages_wb() unconditionally, 
and set_pages_nx() if _PAGE_NX is set. Will these calls be sufficient? I thought about calling 
set_pages_rw(), but that entry is not exported.


Thanks,

Larry

Re: how to tell i386 from x86-64 kernel

2008-02-09 Thread Arjan van de Ven

On Sat, 9 Feb 2008 21:13:43 +0100 (CET)
Jan Engelhardt <[EMAIL PROTECTED]> wrote:

> 
> On Feb 1 2008 12:53, Alejandro Riveira Fernández wrote:
> >> 
> >> # uname -m
> >> I won't tell you.
> >> # linux32 uname -m
> >> i686
> >
> > Ubuntu 7.10 64 bit userland 2.6.24
> >
> >$ uname -m
> >x86_64
> >$ linux32 uname -m
> >i686
> 
> What I am saying is that uname(2) does not reliably tell you whether
> you have a 64-bit kernel underneath unless you have other sources of 
> information.

that's sort of a rabbit-and-the-frog problem. The 32 bit emulator tries to
look EXACTLY like the 32 bit kernel, and it really should.
If someone wants a method to detect even that... we would really want
to know the exact usecase.. because very likely it's the wrong answer
to some other problem ;-)

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org
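
The exchange above turns on what `uname -m` actually reports. A minimal sketch, assuming a Linux or other POSIX system: the helper below (`reported_machine` is a name invented here) simply copies the `machine` field that uname(2) returns for the calling process. On x86-64 kernels the same binary is expected to report a different string when started under `linux32`, which is why this field alone cannot prove which kernel is underneath.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Copy the "machine" string that uname(2) reports for this process.
 * Returns 0 on success, -1 if the system call fails.
 * The value depends on the process personality, not only on the kernel.
 */
int reported_machine(char *out, size_t outlen)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	snprintf(out, outlen, "%s", u.machine);
	return 0;
}
```

Calling `reported_machine(buf, sizeof(buf))` from a small `main()` and running the result both directly and under `linux32` is one way to see the effect described in the thread.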

Re: [PATCH] time: Fix constant size in kernel/timeconst.h

2008-02-09 Thread Johann Felix Soden

H. Peter Anvin wrote:
> Johann Felix Soden wrote:
> > kernel/timeconst.pl generates only long sized constants in timeconst.pl
> > which gives this warning:
> > 
> > kernel/time.c: In function 'msecs_to_jiffies':
> > kernel/time.c:472: warning: integer constant is too large for 'long' type
> > 
> > unsigned long long is needed.
> Hm, you've just taken a warning and elevated it to a bug.
> 
> According to the C standard, a constant has the shortest type (>= int) 
> needed to hold the constant, and the warning above is somewhat bogus in 
> that context (what version of gcc is that, anyway?)
> 
> ULL is only appropriate to 32-bit machines, or there will be other 
> issues downstream.  The Right Way[TM] to do this would be to get Linux 
> to have the [U]INTxx_C() macros from C99.
> 
>   -hpa

Sorry for this. Thanks for teaching about the C standard.
About your question: gcc 4.2.3 gave me this warning.
And I'm a little bit surprised because the kernel code is full of
constants with ULL. Is kernel/time.c a special case?

J. F. Soden
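
For reference, a small userspace sketch of the C99 `UINT64_C()` macro mentioned above. It assumes a hosted C99 compiler with `<stdint.h>`; `EXAMPLE_MUL` and `scale_example` are hypothetical names, and the constant is not taken from kernel/timeconst.h. The macro lets the library choose the suffix for the target, so the source does not hard-code `ULL`.

```c
#include <stdint.h>

/*
 * Hypothetical 33-bit multiplier, chosen only because it does not fit
 * in 32 bits. UINT64_C() gives it type uint_least64_t on every target.
 */
#define EXAMPLE_MUL UINT64_C(0x100000001)

/* Widen x to 64 bits before multiplying so the product cannot wrap at 32 bits. */
uint64_t scale_example(uint32_t x)
{
	return (uint64_t)x * EXAMPLE_MUL;
}
```

With these definitions `scale_example(2)` evaluates to 0x200000002 regardless of whether `long` is 32 or 64 bits wide.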



Re: [PATCH] scsi: ses fix for len and mem leaking when fail to add intf

2008-02-09 Thread Yinghai Lu

On Feb 9, 2008 7:00 AM, James Bottomley
<[EMAIL PROTECTED]> wrote:
>
> On Sat, 2008-02-09 at 04:13 -0800, Yinghai Lu wrote:
> > [PATCH] scsi: ses fix for len and mem leaking when fail to add intf
> >
> > change to u32 before left shifting char
>
> This one is a bit unnecessary; C promotion rules guarantee that
> everything is promoted to int (or above) before doing arithmetic.  Since
> it's only ever done on 16 bits, signed or unsigned int is adequate for
> the conversion.

Thanks, I just learned that.

[EMAIL PROTECTED]:~/xx/xx/notes> cat ctest.c
#include <stdio.h>
int main(int argc, char *argv[])
{
unsigned char buf[20];
int len;

buf[2] = 0x02;
buf[3] = 0x03;

len = (buf[2] << 8) + buf[3];

printf("len = %x\n", len);
return 0;
}
[EMAIL PROTECTED]:~/xx/xx/notes> gcc -o ctest ctest.c
[EMAIL PROTECTED]:~/xx/xx/notes> ./ctest
len = 203
[EMAIL PROTECTED]:~/xx/xx/notes>


>
> > also fix leaking with scomp leaking when failing.
>
> Yes, I see that, thanks!  There's also the kmalloc of scomp which should
> be kzalloc if you care to fix that up in the resend.
>
> > - edev =  enclosure_find(&sdev->host->shost_gendev);
> > + edev = enclosure_find(&sdev->host->shost_gendev);
>
> Space cleanups also need mention in the changelog.
>
> > - ses_dev->page1 = buf;
> > - ses_dev->page1_len = len;
> > -
> >   result = ses_recv_diag(sdev, 1, buf, len);
> >   if (result)
> >   goto recv_failed;
> >
> > + ses_dev->page1 = buf;
> > + ses_dev->page1_len = len;
> > +
>
> Neither of us gets this right.  By removing the kfree(buf) from the
> err_free path, you cause a leak here.  I cause a double free.  I think
> putting back the kfree(buf) and keeping this hunk is the fix.

By then buf has already become ses_dev->page1, ses_dev->page10 or ses_dev->page2,
so it will be freed via them.

>
> >   types = buf[10];
> >   len = buf[11];
> >
> > @@ -474,11 +474,12 @@ static int ses_intf_add(struct class_dev
> >   components += type_ptr[1];
> >   }
> >
> > + buf = NULL;
>
> Yes, prevents double free (but only if buf is freed).

It has already become ses_dev->page1.

>
> >   result = ses_recv_diag(sdev, 2, hdr_buf, INIT_ALLOC_SIZE);
> >   if (result)
> >   goto recv_failed;
> >
> > @@ -492,11 +493,12 @@ static int ses_intf_add(struct class_dev
> >
> >   /* The additional information page --- allows us
> >* to match up the devices */
> > + buf = NULL;
>
> It's probably better to move these closer to the statements that make
> them necessary (in this case above the comment).

OK

>
> >   if (IS_ERR(edev)) {
> >   err = PTR_ERR(edev);
> > + kfree(scomp);
> >   goto err_free;
> >   }
>
> kfree(scomp) should be in the err_free path just in case someone else
> adds something to this.

ok.

>
> >   /* add 1 for trailing '\0' we'll use */
> >   buf = kzalloc(len + 1, GFP_KERNEL);
> > - result = ses_recv_diag(sdev, 7, buf, len);
> > - if (result) {
> > + if (buf)
> > + result = ses_recv_diag(sdev, 7, buf, len);
> > + else
> > + result = 7;
> > +
>
> What exactly is this supposed to be doing, and why 7?  If you're
> thinking of conditioning the page 7 receive on the success of the
> allocation, we really need the allocation failure report more than we
> need the driver to attach.

I want to move the label out of the if block later.

>
> > - addl_desc_ptr += addl_desc_ptr[1] + 2;
> > + addl_desc_ptr += 2 + addl_desc_ptr[1];
>
> This is rather pointless, isn't it?
>
> >   err_free:
> > - kfree(buf);
>
> You can't remove this.  Also add kfree(scomp) here.
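
The cleanup rule discussed in this thread can be sketched in userspace C. This is an illustration under stated assumptions, not the driver code: `struct pages`, `pages_setup`, `pages_release` and the `fail_step` knob are invented here, with `malloc`/`free` standing in for `kmalloc`/`kfree`. The idea is that a buffer is set to NULL as soon as the structure owns it, so a single `err_free` label can free everything without leaking or double freeing.

```c
#include <stdlib.h>

struct pages {
	unsigned char *page1;
	unsigned char *page2;
	unsigned char *comp;
};

/* Free everything the structure owns and reset the pointers. */
void pages_release(struct pages *p)
{
	free(p->comp);
	free(p->page2);
	free(p->page1);
	p->comp = p->page2 = p->page1 = NULL;
}

/*
 * fail_step is a hypothetical test knob standing in for a failed
 * receive: 0 means succeed, 1..3 force a failure at that step.
 */
int pages_setup(struct pages *p, int fail_step)
{
	unsigned char *buf = NULL;
	unsigned char *scomp = NULL;

	p->page1 = p->page2 = p->comp = NULL;

	buf = malloc(32);
	if (!buf || fail_step == 1)
		goto err_free;		/* buf is still local: freed below */
	p->page1 = buf;
	buf = NULL;			/* ownership moved to p->page1 */

	buf = malloc(32);
	if (!buf || fail_step == 2)
		goto err_free;
	p->page2 = buf;
	buf = NULL;			/* ownership moved to p->page2 */

	scomp = calloc(4, sizeof(long));
	if (!scomp || fail_step == 3)
		goto err_free;
	p->comp = scomp;
	return 0;

err_free:
	free(buf);			/* NULL once handed over, so no double free */
	free(scomp);
	pages_release(p);
	return -1;
}
```

Because `free(NULL)` is a no-op, the error path stays correct no matter which step failed, which is the property both the `buf = NULL` assignments and the `kfree(scomp)` in `err_free` are meant to provide.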

Re: [patch 0/4] make pr_debug() dynamic

2008-02-09 Thread Jan Engelhardt


On Feb 8 2008 10:52, Jason Baron wrote:
>On Thu, Feb 07, 2008 at 02:42:14PM -0800, Joe Perches wrote:
>> On Thu, 2008-02-07 at 16:03 -0500, Jason Baron wrote:
>> > make the pr_debug() function dependent upon the new immediate
>> > infrastructure.
>> 
>> What's wrong with klogd -c 8 or equivalent?
>
>Setting the loglevel higher will not make pr_debug() calls visible. The only
>way to make them visible right now is by re-compiling the kernel.

pr_debug() was IMHO meant to be a compile-time optimization
to throw out debug messages most people do not want.

If you want to switch on/off debugging messages, use
printk(KERN_DEBUG) [with klogd -c something] and not pr_debug!
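
The distinction drawn above can be illustrated with a userspace sketch. It is only an analogy, not the kernel's definitions: `emit`, `log_debug`, `runtime_debug` and `demo` are names invented here, with `runtime_debug` standing in for the console log level. `pr_debug()` disappears at compile time unless `DEBUG` is defined, while the run-time variant is always compiled in and filtered on each call.

```c
#include <stdarg.h>
#include <stdio.h>

static int emitted;		/* counts messages that were actually printed */
static int runtime_debug;	/* hypothetical run-time switch */

static int emit(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	emitted++;
	return n;
}

/* Compile-time switch: without -DDEBUG the call and its arguments vanish. */
#ifdef DEBUG
#define pr_debug(...) emit(__VA_ARGS__)
#else
#define pr_debug(...) do { } while (0)
#endif

/* Run-time switch: always compiled in, checked on every call. */
#define log_debug(...) do { if (runtime_debug) emit(__VA_ARGS__); } while (0)

/* Returns how many of the two messages were printed. */
int demo(int enable)
{
	runtime_debug = enable;
	emitted = 0;
	pr_debug("compile-time message %d\n", 1);
	log_debug("run-time message %d\n", 2);
	return emitted;
}
```

Built without `-DDEBUG`, `demo(1)` prints only the run-time message and `demo(0)` prints nothing; no run-time setting can bring the `pr_debug()` text back, which is the limitation the patch series wants to lift.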

Re: wrong cylinders of kingston usb pendrive [intel 82801DB]

2008-02-09 Thread Alan Stern

On Mon, 4 Feb 2008, Patrick Ringl wrote:

> Hello,
> 
> I am suffering from the following (usb-related?) problem:
> 
> I have several different machines - all x86 architecture - let's just
> call them machineA, machineB and machineC.
> Anyway, machineA has a severe problem with a
> Kingston USB pendrive (2 GB). I simply can't install anything on it - the
> kernel usually complains with errors like "attempt to access beyond end of
> device" - while it works fine with several no-name USB pendrives of
> the same size.
> Now, I just tested that Kingston pendrive on machineB and machineC,
> where it runs fine. I can install Debian on it (same installation
> media) without any problems or kernel errors.
> 
> I compared the output of dmesg and fdisk from machineA with machineB and
> machineC, and the difference is simple: machineA always shows 248 cylinders,
> while all the other machines show 228 cylinders.

The number of cylinders is meaningless.  What matters is the number of 
sectors.  What does "fdisk -l /dev/sdX" (substitute the appropriate 
letter for X) display for the pendrive on each of the machines?

What messages show up in the dmesg log when you plug in the pendrive?

What version of the Linux kernel are you using?

Alan Stern


Re: [RFC] Sectionized printk data

2008-02-09 Thread Jan Engelhardt


On Feb 4 2008 19:07, Sam Ravnborg wrote:
>> The attached patch allows something along the lines:
>> 
>> int __init some_function(void)
>> {
>> [...]
>> pr_init(KERN_WARNING "failure %s in %s\n", ...);
>> [...]
>> }
>> 
>> Another idea I had was to make printk a macro that figures out the
>> section of the surrounding function and then moves the data
>> automatically when it is a literal, but I couldn't find mechanisms that
>> allow this.  Anyone of you got an idea?
>> 
>> What do you think in general?
>
>What is the rationale behind this?

To drop strings that are only shown once anyway, such as:

static int __init ebtables_init(void)
{
int ret;

mutex_lock(&ebt_mutex);
list_add(&ebt_standard_target.list, &ebt_targets);
mutex_unlock(&ebt_mutex);
if ((ret = nf_register_sockopt(&ebt_sockopts)) < 0)
return ret;

->  printk(KERN_INFO "Ebtables v2.0 registered\n");
return 0;
}

>If you say "saving memory" then please let us know with specific examples
>in what area these savings will really pay off.
