Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads
On Sun, Feb 10, 2008 at 01:00:56AM -0600, Olof Johansson wrote:
> On Sun, Feb 10, 2008 at 07:15:58AM +0100, Willy Tarreau wrote:
> > On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote:
> > > 40M:
> > > 2.6.22         time 94315 ms
> > > 2.6.23         time 107930 ms
> > > 2.6.24         time 113291 ms
> > > 2.6.24-git19   time 110360 ms
> > >
> > > So with more work per thread, the differences become less but they're
> > > still there. At the 40M loop, with 500 threads it's quite a bit of
> > > runtime per thread.
> >
> > No, it's really nothing. I had to push the loop to 1 billion to make the
> > load noticeable. You don't have 500 threads, you have 2 threads and that
> > load is repeated 500 times. And if we look at the numbers, let's take the
> > worst one:
> > > 40M:
> > > 2.6.24         time 113291 ms
> > 113291/500 = 227 microseconds/loop. This is still very low compared to the
> > smallest timeslice you would have (1 ms at HZ=1000).
> >
> > So your threads are still completing *before* the scheduler has to preempt
> > them.
>
> Hmm? I get that to be 227ms per loop, which is way more than a full
> timeslice. Running the program took in the range of 2 minutes, so it's
> 110000 milliseconds, not microseconds.

Damn you're right! I don't know why I assumed that the reported time was
in microseconds. Nevermind.

> > > It seems generally unfortunate that it takes longer for a new thread to
> > > move over to the second cpu even when the first is busy with the original
> > > thread. I can certainly see cases where this causes suboptimal overall
> > > system behaviour.
> >
> > In fact, I don't think it takes longer, I think it does not do it at their
> > creation, but will do it immediately after the first slice is consumed. This
> > would explain the important differences here. I don't know how we could
> > ensure that the new thread is created on the second CPU from the start,
> > though.
>
> The math doesn't add up for me. Even if it rebalanced at the end of
> the first slice (i.e. after 1ms), that would be a 1ms penalty per
> iteration. With 500 threads that'd be a total penalty of 500ms.

yes you're right.

> > I tried inserting a sched_yield() at the top of the busy loop (1M loops).
> > By default, it did not change a thing. Then I simply set sched_compat_yield
> > to 1, and the two threads then ran simultaneously with a stable low time
> > (2700 ms instead of 10-12 seconds).
> >
> > Doing so with 10k loops (initial test) shows times in the range 240-300 ms
> > only instead of 2200-6500 ms.
>
> Right, likely because the long-running cases got stuck at the busy loop
> at the end, which would end up aborting quicker if the other thread got
> scheduled for just a bit. It was a mistake to post that variant of the
> testcase, it's not as relevant and doesn't mimic the original workload I
> was trying to mimic as well as if the first loop was made larger.

agreed, but what's important is not to change the workload, but to see
what changes induce a different behaviour.

> > Ingo, would it be possible (and wise) to ensure that a new thread being
> > created gets immediately rebalanced in order to emulate what is done here
> > with sched_compat_yield=1 and sched_yield() in both threads just after the
> > thread creation ? I don't expect any performance difference doing this,
> > but maybe some shell scripts relying on short-lived pipes would get faster
> > on SMP.
>
> There's always the tradeoff of losing cache warmth whenever a thread is
> moved, so I'm not sure if it's a good idea to always migrate it at
> creation time. It's not a simple problem, really.

yes I know. That should not prevent us from experimenting though. If
thread-CPU affinity is too strong and causes the second CPU to be rarely
used, there's something wrong waiting for a fix.

> > > I agree that the testcase is highly artificial. Unfortunately, it's
> > > not uncommon to see these kind of weird testcases from customers trying
> > > to evaluate new hardware. :( They tend to be pared-down versions of
> > > whatever their real workload is (the real workload is doing things more
> > > appropriately, but the smaller version is used for testing). I was lucky
> > > enough to get source snippets to base a standalone reproduction case on
> > > for this, normally we wouldn't even get copies of their binaries.
> >
> > I'm well aware of that. What's important is to be able to explain what is
> > causing the difference and why the test case does not represent anything
> > related to performance. Maybe the code author wanted to get 500 parallel
> > threads and got his code wrong ?
>
> I believe it started out as a simple attempt to parallelize a workload
> that sliced the problem too low, instead of slicing it in larger chunks
> and have each thread do more work at a time. It did well on 2.6.22 with
> almost a 2x speedup, but did worse than the single-threaded testcase on a
> 2.6.24 kernel.
>
> So yes,
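The arithmetic argued over in this message can be checked with a small C sketch. The helper names below are hypothetical and are not part of the testcase discussed in the thread; the figures are the 40M-loop timings quoted above, and HZ=1000 is the value assumed in the discussion.

```c
#include <assert.h>

/* Average wall-clock time per iteration in ms, rounded to nearest.
 * Hypothetical helper, not taken from the original testcase. */
static long per_iteration_ms(long total_ms, long iterations)
{
	assert(iterations > 0);
	return (total_ms + iterations / 2) / iterations;
}

/* Length of one scheduler tick in ms for a given HZ (1 ms at HZ=1000). */
static long tick_ms(long hz)
{
	assert(hz > 0 && hz <= 1000);
	return 1000 / hz;
}

/* Upper bound on the extra time if every iteration lost exactly one
 * tick before the new thread was moved to the idle CPU. */
static long migration_penalty_bound_ms(long iterations, long hz)
{
	return iterations * tick_ms(hz);
}
```

With the quoted numbers, 113291 ms over 500 iterations is about 227 ms per iteration, far above a 1 ms tick, and the 2.6.22 to 2.6.24 gap (113291 - 94315 = 18976 ms) is much larger than the 500 ms that a one-tick delay per iteration would explain.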
Re: [3/6] kgdb: core
On Sun, Feb 10, 2008 at 08:43:52AM +0100, Ingo Molnar wrote:
>
> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> > This still doesn't address a lot of the review comments from Jason's
> > last posting.
>
> sorry, which mails are those?

It's all in the thread starting with '[PATCH 0/8] kgdb 2.6.25 version',
msgid [EMAIL PROTECTED] or at http://lkml.org/lkml/2008/2/9/104
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
panic about sysfs with adm1026
Calling initcall 0x80c4b575: sm_adm1026_init+0x0/0xe()
i2c-adapter i2c-1: : Unrecognized stepping 0x45. Defaulting to ADM1026.
general protection fault: [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-09379-g0cf975e-dirty #34
RIP: 0010:[] [] sysfs_add_file+0x16/0x81
RSP: :81040503dd50 EFLAGS: 00010286
RAX: RBX: fffe002e002d002c RCX: 48d9
RDX: 0002 RSI: fffe002e002d002c RDI: 810202c4fb90
RBP: R08: 810202c4fb90 R09:
R10: 0002 R11: 0002 R12: fff4
R13: 810202c4fb90 R14: 000c R15: 810202c4fb90
FS: () GS:80bde000() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 7fff94de3470 CR3: 00201000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 81040503c000, task 81020504)
Stack: 810202c4fb90 0001 810202c4e000 80b87850 810202c4e118 808e0cc0 802d933c 81040503dd55 810202c17878
Call Trace:
 [] sysfs_create_group+0xa2/0x106
 [] adm1026_detect+0x4b3/0x522
 [] adm1026_detect+0x0/0x522
 [] i2c_probe_address+0xb9/0xfc
 [] i2c_probe+0x162/0x175
 [] adm1026_detect+0x0/0x522
 [] i2c_register_driver+0x9a/0xea
 [] kernel_init+0x15d/0x2c9
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x2c9
 [] child_rip+0x0/0x12

Code: c0 84 c0 74 0c 41 58 48 89 df 5b 5d e9 2a 07 00 00 5e 5b 5d c3 41 55 49 89 fd 41 54 41 bc f4 ff ff ff 55 53 48 89 f3 48 83 ec 28 <8b> 76 10 48 8b 3b 66 81 e6 ff 0f 66 81 ce 00 80 0f b7 f6 e8 fd
RIP [] sysfs_add_file+0x16/0x81
RSP
---[ end trace b23a825db37d3043 ]---
Re: [3/6] kgdb: core
* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> This still doesn't address a lot of the review comments from Jason's
> last posting.

sorry, which mails are those?

	Ingo
Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
On Sat, 9 Feb 2008, Greg KH wrote:
>
> Once the patch goes into Linus's tree, feel free to send it to the
> [EMAIL PROTECTED] address so that we can include it in the 2.6.24.x
> tree.

I've been ignoring the patches because they say "PATCH 2.6.24-mm1", and so
I simply don't know whether it's supposed to go into *my* kernel or just
-mm.

There's also been several versions and discussions, so I'd really like to
have somebody send me a final patch with all the acks etc.. One that is
clearly for me, not for -mm.

		Linus
Re: [5/6] x86: kgdb support
* Sam Ravnborg <[EMAIL PROTECTED]> wrote:

> >  config X86_64
> >  	def_bool 64BIT
> > +	select KGDB_ARCH_HAS_SHADOW_INFO
> >
> >  ### Arch settings
> >  config X86
> > @@ -139,6 +140,9 @@ config AUDIT_ARCH
> >  config ARCH_SUPPORTS_AOUT
> >  	def_bool y
> >
> > +config ARCH_SUPPORTS_KGDB
> > +	def_bool y
> > +
>
> Please use the documented HAVE_ approach and not this ugly "one
> variable per arch" idiom. This was also commented last time the
> patchset was posted.

hm, i wasn't Cc:-ed to that so i didn't read it yet. I have just followed
the logical ARCH_SUPPORTS_* idiom which reads more naturally than
HAVE_ARCH_*. But ... no strong feelings, changing it is easy enough, i
renamed them and pushed out the new iteration to:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git

diffstat and shortlog did not change (just a rename of these config
variables) - but i'm including them below for completeness of the
submission.

any other observations?

	Ingo

-->
Ingo Molnar (3):
      pids: add pid_max prototype
      uaccess: add probe_kernel_write()
      x86: kgdb support

Jan Kiszka (1):
      consoles: polling support, kgdboc

Jason Wessel (2):
      kgdb: core
      kgdb: document parameters

 Documentation/kernel-parameters.txt |    5 +
 arch/x86/Kconfig                    |    4 +
 arch/x86/kernel/Makefile            |    1 +
 arch/x86/kernel/kgdb.c              |  550 ++
 drivers/char/tty_io.c               |   47 +
 drivers/serial/8250.c               |   62 ++
 drivers/serial/Kconfig              |    3 +
 drivers/serial/Makefile             |    1 +
 drivers/serial/kgdboc.c             |  164 +++
 drivers/serial/serial_core.c        |   67 ++-
 include/asm-generic/kgdb.h          |   93 ++
 include/asm-x86/kgdb.h              |   87 ++
 include/linux/kgdb.h                |  264 +
 include/linux/pid.h                 |    2 +
 include/linux/serial_core.h         |    4 +
 include/linux/tty_driver.h          |   12 +
 include/linux/uaccess.h             |   22 +
 kernel/Makefile                     |    1 +
 kernel/kgdb.c                       | 2020 +++
 kernel/sysctl.c                     |    2 +-
 lib/Kconfig.debug                   |    2 +
 lib/Kconfig.kgdb                    |   37 +
 22 files changed, 3448 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/kgdb.c
 create mode 100644 drivers/serial/kgdboc.c
 create mode 100644 include/asm-generic/kgdb.h
 create mode 100644 include/asm-x86/kgdb.h
 create mode 100644 include/linux/kgdb.h
 create mode 100644 kernel/kgdb.c
 create mode 100644 lib/Kconfig.kgdb
Re: [0/6] kgdb light
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Sun, 10 Feb 2008 08:13:04 +0100

> this is the "kgdb light" tree that has been also posted at:
>
>    http://lkml.org/lkml/2008/2/9/236
>
> it is available at:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git
>
> See the shortlog below.

Thanks for keeping this work alive.

>  - removed the GTOD/clocksource hacks. If a user uses kgdb for extended
>    periods of time then GTOD clocksources can get out of sync and we
>    might fall back to other clocksources. That is the _right_ thing to
>    do for the kernel, hacking it around to avoid kernel messages was
>    wrong.

I suspect something will however need to be done with watchdogs and
things of that nature which will get very confused if the kernel sits
in a breakpoint for a period of time whilst the user looks at things
from the kgdb prompt.

Just a heads up...
Re: [1/6] pids: add pid_max prototype
On Sun, Feb 10, 2008 at 08:13:21AM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> add pid_max prototype - used by sysctl and will be used by kgdb as well.

Looks good, and this should go in ASAP independent of kgdb.

And while you're at it, I think all of the below want to find a
suitable place in a header somewhere:

> @@ -71,7 +72,6 @@ extern int max_threads;
>  extern int core_uses_pid;
>  extern int suid_dumpable;
>  extern char core_pattern[];
> -extern int pid_max;
>  extern int min_free_kbytes;
>  extern int pid_max_min, pid_max_max;
>  extern int sysctl_drop_caches;
Re: [3/6] kgdb: core
This still doesn't address a lot of the review comments from Jason's
last posting.
Re: [5/6] x86: kgdb support
On Sun, Feb 10, 2008 at 08:13:45AM +0100, Ingo Molnar wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> simplified and streamlined kgdb support on x86, both 32-bit and 64-bit,
> based on patch from:
>
>   Subject: kgdb: core-lite
>   From: Jason Wessel <[EMAIL PROTECTED]>
>
> [ and countless other authors - see the patch for details. ]
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
> ---
>  arch/x86/Kconfig         |    4
>  arch/x86/kernel/Makefile |    1
>  arch/x86/kernel/kgdb.c   |  550 +++
>  include/asm-x86/kgdb.h   |   87 +++
>  4 files changed, 642 insertions(+)
>
> Index: linux-kgdb.q/arch/x86/Kconfig
> ===
> --- linux-kgdb.q.orig/arch/x86/Kconfig
> +++ linux-kgdb.q/arch/x86/Kconfig
> @@ -14,6 +14,7 @@ config X86_32
>
>  config X86_64
>  	def_bool 64BIT
> +	select KGDB_ARCH_HAS_SHADOW_INFO
>
>  ### Arch settings
>  config X86
> @@ -139,6 +140,9 @@ config AUDIT_ARCH
>  config ARCH_SUPPORTS_AOUT
>  	def_bool y
>
> +config ARCH_SUPPORTS_KGDB
> +	def_bool y
> +

Please use the documented HAVE_ approach and not this ugly "one
variable per arch" idiom.
This was also commented last time the patchset was posted.

	Sam
Re: [3/6] kgdb: core
On Sun, Feb 10, 2008 at 08:13:31AM +0100, Ingo Molnar wrote:
> From: Jason Wessel <[EMAIL PROTECTED]>
>
> kgdb core code. Handles the protocol and the arch details.
>
> [ [EMAIL PROTECTED]: heavily modified, simplified and cleaned up. ]

Hi Ingo.

I see that only a very few of my comments posted yesterday got addressed.
On purpose or did you miss them?

	Sam
[18/19] ftrace: add ftrace_enabled sysctl to disable mcount function
From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds back the sysctl ftrace_enabled. This time it is
defaulted to on, if DYNAMIC_FTRACE is configured. When ftrace_enabled
is disabled, the ftrace function is set to the stub return.

If DYNAMIC_FTRACE is also configured, on ftrace_enabled = 0,
the registered ftrace functions will all be set to jmps, but no more
new calls to ftrace recording (used to find the ftrace calling sites)
will be called.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h |    6 ++
 kernel/sysctl.c        |   11
 kernel/trace/ftrace.c  |  125 +
 3 files changed, 124 insertions(+), 18 deletions(-)

Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -5,6 +5,12 @@

 #include

+extern int ftrace_enabled;
+extern int
+ftrace_enable_sysctl(struct ctl_table *table, int write,
+		     struct file *filp, void __user *buffer, size_t *lenp,
+		     loff_t *ppos);
+
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);

 struct ftrace_ops {
Index: linux/kernel/sysctl.c
===
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -488,6 +489,16 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec,
 	},
+#ifdef CONFIG_FTRACE
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "ftrace_enabled",
+		.data		= &ftrace_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &ftrace_enable_sysctl,
+	},
+#endif
 #ifdef CONFIG_KMOD
 	{
 		.ctl_name	= KERN_MODPROBE,
Index: linux/kernel/trace/ftrace.c
===
--- linux.orig/kernel/trace/ftrace.c
+++ linux/kernel/trace/ftrace.c
@@ -20,12 +20,24 @@
 #include
 #include
 #include
+#include
 #include
 #include

 #include "trace.h"

+#ifdef CONFIG_DYNAMIC_FTRACE
+# define FTRACE_ENABLED_INIT 1
+#else
+# define FTRACE_ENABLED_INIT 0
+#endif
+
+int ftrace_enabled = FTRACE_ENABLED_INIT;
+static int last_ftrace_enabled = FTRACE_ENABLED_INIT;
+
 static DEFINE_SPINLOCK(ftrace_lock);
+static DEFINE_MUTEX(ftrace_sysctl_lock);
+
 static struct ftrace_ops ftrace_list_end __read_mostly =
 {
 	.func = ftrace_stub,
@@ -78,14 +90,16 @@ static int notrace __register_ftrace_fun
 	smp_wmb();
 	ftrace_list = ops;

-	/*
-	 * For one func, simply call it directly.
-	 * For more than one func, call the chain.
-	 */
-	if (ops->next == &ftrace_list_end)
-		ftrace_trace_function = ops->func;
-	else
-		ftrace_trace_function = ftrace_list_func;
+	if (ftrace_enabled) {
+		/*
+		 * For one func, simply call it directly.
+		 * For more than one func, call the chain.
+		 */
+		if (ops->next == &ftrace_list_end)
+			ftrace_trace_function = ops->func;
+		else
+			ftrace_trace_function = ftrace_list_func;
+	}

 	spin_unlock(&ftrace_lock);

@@ -120,10 +134,12 @@ static int notrace __unregister_ftrace_f

 	*p = (*p)->next;

-	/* If we only have one func left, then call that directly */
-	if (ftrace_list == &ftrace_list_end ||
-	    ftrace_list->next == &ftrace_list_end)
-		ftrace_trace_function = ftrace_list->func;
+	if (ftrace_enabled) {
+		/* If we only have one func left, then call that directly */
+		if (ftrace_list == &ftrace_list_end ||
+		    ftrace_list->next == &ftrace_list_end)
+			ftrace_trace_function = ftrace_list->func;
+	}

  out:
 	spin_unlock(&ftrace_lock);
@@ -263,7 +279,8 @@ static void notrace ftrace_startup(void)
 		goto out;
 	__unregister_ftrace_function(_shutdown_ops);

-	ftrace_run_startup_code();
+	if (ftrace_enabled)
+		ftrace_run_startup_code();
  out:
 	mutex_unlock(_lock);
 }
@@ -275,13 +292,32 @@ static void notrace ftrace_shutdown(void
 	if (ftraced_suspend)
 		goto out;

-	ftrace_run_shutdown_code();
+	if (ftrace_enabled)
+		ftrace_run_shutdown_code();

 	__register_ftrace_function(_shutdown_ops);
  out:
 	mutex_unlock(_lock);
 }

+static void notrace ftrace_startup_sysctl(void)
+{
+	mutex_lock(_lock);
+	/* ftraced_suspend is true if we want ftrace running */
+	if (ftraced_suspend)
+
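The selection logic this patch describes (stub when disabled, direct call for a single registered function, chain call for several) can be modelled in plain user-space C. This is only a sketch under simplifying assumptions: the names below are hypothetical, there is no locking or memory barrier, and unlike the kernel code it recomputes the active function in one place.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*trace_fn)(unsigned long ip, unsigned long parent_ip);

struct trace_ops {
	trace_fn func;
	struct trace_ops *next;
};

static int hits_a, hits_b;		/* counters for the example tracers */

static void trace_stub(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip;	/* "stub return": do nothing */
}
static void count_a(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip; hits_a++;
}
static void count_b(unsigned long ip, unsigned long parent_ip)
{
	(void)ip; (void)parent_ip; hits_b++;
}

static struct trace_ops list_end = { trace_stub, NULL };
static struct trace_ops *ops_list = &list_end;
static int tracing_enabled = 1;		/* models the ftrace_enabled sysctl */
static trace_fn active_fn = trace_stub;	/* models ftrace_trace_function */

/* Call every registered function in turn. */
static void call_chain(unsigned long ip, unsigned long parent_ip)
{
	struct trace_ops *op;

	for (op = ops_list; op != &list_end; op = op->next)
		op->func(ip, parent_ip);
}

/* Pick stub, direct call, or chain depending on state. */
static void update_active_fn(void)
{
	if (!tracing_enabled || ops_list == &list_end)
		active_fn = trace_stub;
	else if (ops_list->next == &list_end)
		active_fn = ops_list->func;
	else
		active_fn = call_chain;
}

static void register_ops(struct trace_ops *ops)
{
	ops->next = ops_list;
	ops_list = ops;
	update_active_fn();
}

static void set_enabled(int on)
{
	tracing_enabled = on;
	update_active_fn();
}
```

Registered functions stay on the list while tracing is disabled, so re-enabling restores the previous behaviour without re-registration.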
[19/19] ftrace
[ uhm, i cannot count apparently :-) There's no 19th patch. ]
[17/19] ftrace: dynamic enabling/disabling of function calls
From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.

The way this works, is on bootup, a ftrace function is registered
to record the instruction pointer of all places that call the
function.

Later, a kthread is awoken once a second that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time.

e.g.

  call ftrace  /* 5 bytes */

is replaced with

  jmp 3f  /* jmp is 2 bytes and we jump 3 forward */
3:

When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.

Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.

Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMIs so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/Makefile |    1
 arch/x86/kernel/ftrace.c |  237 +++
 include/linux/ftrace.h   |   18 ++
 kernel/trace/Kconfig     |   17 ++
 kernel/trace/ftrace.c    |  356 ++-
 5 files changed, 597 insertions(+), 32 deletions(-)

Index: linux/arch/x86/kernel/Makefile
===
--- linux.orig/arch/x86/kernel/Makefile
+++ linux/arch/x86/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_X86_NUMAQ)		+= numaq_32.o
 obj-$(CONFIG_X86_SUMMIT_NUMA)	+= summit_32.o
 obj-$(CONFIG_X86_VSMP)		+= vsmp_64.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
+obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_MODULES)		+= module_$(BITS).o
 obj-$(CONFIG_ACPI_SRAT)		+= srat_32.o
 obj-$(CONFIG_EFI)		+= efi.o efi_$(BITS).o efi_stub_$(BITS).o
Index: linux/arch/x86/kernel/ftrace.c
===
--- /dev/null
+++ linux/arch/x86/kernel/ftrace.c
@@ -0,0 +1,237 @@
+/*
+ * Code for replacing ftrace calls with jumps.
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt <[EMAIL PROTECTED]>
+ *
+ * Thanks goes to Ingo Molnar, for suggesting the idea.
+ * Mathieu Desnoyers, for suggesting postponing the modifications.
+ * Arjan van de Ven, for keeping me straight, and explaining to me
+ * the dangers of modifying code on the run.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define CALL_BACK		5
+
+#define JMPFWD			0x03eb
+
+static unsigned short ftrace_jmp = JMPFWD;
+
+struct ftrace_record {
+	struct dyn_ftrace	rec;
+	int			failed;
+} __attribute__((packed));
+
+struct ftrace_page {
+	struct ftrace_page	*next;
+	int			index;
+	struct ftrace_record	records[];
+} __attribute__((packed));
+
+#define ENTRIES_PER_PAGE \
+  ((PAGE_SIZE - sizeof(struct ftrace_page)) / sizeof(struct ftrace_record))
+
+/* estimate from running different kernels */
+#define NR_TO_INIT		10000
+
+#define MCOUNT_ADDR ((long)(&mcount))
+
+union ftrace_code_union {
+	char code[5];
+	struct {
+		char e8;
+		int offset;
+	} __attribute__((packed));
+};
+
+static struct ftrace_page	*ftrace_pages_start;
+static struct ftrace_page	*ftrace_pages;
+
+notrace struct dyn_ftrace *ftrace_alloc_shutdown_node(unsigned long ip)
+{
+	struct ftrace_record *rec;
+	unsigned short save;
+
+	ip -= CALL_BACK;
+	save = *(short *)ip;
+
+	/* If this was already converted, skip it */
+	if (save == JMPFWD)
+		return NULL;
+
+	if (ftrace_pages->index == ENTRIES_PER_PAGE) {
+		if (!ftrace_pages->next)
+			return NULL;
+		ftrace_pages = ftrace_pages->next;
+	}
+
+	rec = &ftrace_pages->records[ftrace_pages->index++];
+
+	return &rec->rec;
+}
+
+static int notrace
+ftrace_modify_code(unsigned long ip, unsigned char *old_code,
+		   unsigned char *new_code)
+{
+	unsigned short old = *(unsigned short *)old_code;
+	unsigned short new = *(unsigned short *)new_code;
+	unsigned short replaced;
+	int faulted = 0;
+
+	/*
+	 *
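The call-to-jmp rewrite described in the changelog can be illustrated on an ordinary byte buffer. The sketch below is not the patch's ftrace_modify_code(); the function names are hypothetical and it only models the byte layout: a 5-byte `call rel32` (opcode 0xe8) has its first two bytes overwritten with `eb 03` (the JMPFWD value 0x03eb stored little-endian), a short jmp that skips the three remaining offset bytes.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_OPCODE 0xe8u	/* call rel32 */
#define JMP_SHORT   0xebu	/* jmp rel8 */
#define JMP_SKIP    0x03u	/* skip the 3 leftover bytes of the call */

/* Write a 5-byte "call rel32" into site (little-endian offset). */
static void encode_call(uint8_t site[5], int32_t rel32)
{
	uint32_t u = (uint32_t)rel32;

	site[0] = CALL_OPCODE;
	site[1] = (uint8_t)(u & 0xff);
	site[2] = (uint8_t)((u >> 8) & 0xff);
	site[3] = (uint8_t)((u >> 16) & 0xff);
	site[4] = (uint8_t)((u >> 24) & 0xff);
}

/* Disable the call: only the first two bytes change.
 * Returns 0 on success, -1 if the site is not a call. */
static int call_to_jmp(uint8_t site[5])
{
	if (site[0] != CALL_OPCODE)
		return -1;
	site[0] = JMP_SHORT;
	site[1] = JMP_SKIP;
	return 0;
}

/* Re-enable the call: restore the opcode and the low offset byte.
 * The upper three offset bytes were never touched. */
static int jmp_to_call(uint8_t site[5], int32_t rel32)
{
	if (site[0] != JMP_SHORT || site[1] != JMP_SKIP)
		return -1;
	site[0] = CALL_OPCODE;
	site[1] = (uint8_t)((uint32_t)rel32 & 0xff);
	return 0;
}
```

Because only two bytes differ between the two states, a 16-bit compare and store is enough to switch a site, which matches the unsigned short values used in the patch.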
[16/19] ftrace: trace preempt off critical timings
From: Steven Rostedt <[EMAIL PROTECTED]>

Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.

This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers

Now instead of just tracing irqs off, preemption off can be selected
to be recorded.

When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.

By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simply echoing in the appropriate trace name into
/debugfs/tracing/current_tracer.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |    3
 include/linux/ftrace.h       |    8 +
 include/linux/irqflags.h     |    3
 include/linux/preempt.h      |    2
 kernel/sched.c               |   24 +
 kernel/trace/Kconfig         |   25 +
 kernel/trace/Makefile        |    1
 kernel/trace/trace_irqsoff.c |  184 +++
 8 files changed, 197 insertions(+), 53 deletions(-)

Index: linux/arch/x86/kernel/process_32.c
===
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -207,7 +207,10 @@ void cpu_idle(void)
 				play_dead();

 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
+			/* Don't trace irqs off for idle */
+			stop_critical_timings();
 			idle();
+			start_critical_timings();
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -58,4 +58,12 @@ extern void mcount(void);
 # define time_hardirqs_off(a0, a1)		do { } while (0)
 #endif

+#ifdef CONFIG_PREEMPT_TRACER
+  extern void notrace trace_preempt_on(unsigned long a0, unsigned long a1);
+  extern void notrace trace_preempt_off(unsigned long a0, unsigned long a1);
+#else
+# define trace_preempt_on(a0, a1)		do { } while (0)
+# define trace_preempt_off(a0, a1)		do { } while (0)
+#endif
+
 #endif /* _LINUX_FTRACE_H */
Index: linux/include/linux/irqflags.h
===
--- linux.orig/include/linux/irqflags.h
+++ linux/include/linux/irqflags.h
@@ -41,7 +41,8 @@
 # define INIT_TRACE_IRQFLAGS
 #endif

-#ifdef CONFIG_IRQSOFF_TRACER
+#if defined(CONFIG_IRQSOFF_TRACER) || \
+	defined(CONFIG_PREEMPT_TRACER)
  extern void stop_critical_timings(void);
  extern void start_critical_timings(void);
 #else
Index: linux/include/linux/preempt.h
===
--- linux.orig/include/linux/preempt.h
+++ linux/include/linux/preempt.h
@@ -10,7 +10,7 @@
 #include
 #include

-#ifdef CONFIG_DEBUG_PREEMPT
+#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
   extern void add_preempt_count(int val);
   extern void sub_preempt_count(int val);
 #else
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -66,6 +66,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -3772,26 +3773,44 @@ void scheduler_tick(void)
 #endif
 }

-#if defined(CONFIG_PREEMPT) && defined(CONFIG_DEBUG_PREEMPT)
+#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
+				defined(CONFIG_PREEMPT_TRACER))
+
+static inline unsigned long get_parent_ip(unsigned long addr)
+{
+	if (in_lock_functions(addr)) {
+		addr = CALLER_ADDR2;
+		if (in_lock_functions(addr))
+			addr = CALLER_ADDR3;
+	}
+	return addr;
+}

 void add_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
 	 */
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
+#endif
 	preempt_count() += val;
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
 	 */
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
+#endif
+	if (preempt_count() == val)
+		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
 EXPORT_SYMBOL(add_preempt_count);

 void sub_preempt_count(int val)
 {
+#ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 *
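The `preempt_count() == val` test in add_preempt_count() fires only on the transition from zero, so nested disables produce a single "preempt off" event. A user-space model of that edge detection might look as follows. The names are hypothetical, and the matching check on the decrement side is an assumption, since the sub_preempt_count() hunk is cut off in this posting.

```c
#include <assert.h>

static int preempt_depth;	/* stand-in for preempt_count() */
static int off_events;		/* times preemption went from on to off */
static int on_events;		/* times preemption went from off to on */

static void model_add_preempt_count(int val)
{
	preempt_depth += val;
	/* Equal to val after the add means the count was 0 before. */
	if (preempt_depth == val)
		off_events++;
}

static void model_sub_preempt_count(int val)
{
	/* Assumed mirror image: equal to val before the subtract
	 * means the count is about to drop back to 0. */
	if (preempt_depth == val)
		on_events++;
	preempt_depth -= val;
}
```

Two nested disable/enable pairs therefore yield one off event and one on event, which is what bounds a single critical section for the latency tracer.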
[15/19] ftrace: trace irq disabled critical timings
From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds latency tracing for critical timings
(how long interrupts are disabled for).

 "irqsoff" is added to /debugfs/tracing/available_tracers

Note:
  tracing_max_latency
    also holds the max latency for irqsoff (in usecs).
    (default to large number so one must start latency tracing)

  tracing_thresh
    threshold (in usecs) to always print out if irqs off
    is detected to be longer than stated here.
    If irq_thresh is non-zero, then max_irq_latency
    is ignored.

Here's an example of a trace with ftrace_enabled = 0

===
preemption latency trace v1.1.5 on 2.6.24-rc7
 latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1d.s3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1d.s3  100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1d.s3  100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
===

And this is a trace with ftrace_enabled == 1

===
preemption latency trace v1.1.5 on 2.6.24-rc7
 latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
    -----------------
 => started at: _spin_lock_irqsave+0x2a/0xb7
 => ended at:   _spin_unlock_irqrestore+0x32/0x5f

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
 swapper-0     1dNs3    0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000])
 swapper-0     1dNs3   46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000])
 swapper-0     1dNs3   46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000])
 swapper-0     1dNs3   47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000])
 swapper-0     1dNs3   47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47)
 swapper-0     1dNs3   97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50)
 swapper-0     1dNs3   98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000])
 swapper-0     1dNs3   99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000])
 swapper-0     1dNs3  101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000])
 swapper-0     1dNs3  102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f)

vim:ft=help
===

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_64.c |    3
 arch/x86/lib/Makefile        |    1
 arch/x86/lib/thunk_32.S      |   47 +
 arch/x86/lib/thunk_64.S      |   19 +-
 include/asm-x86/irqflags.h   |   24 --
 include/linux/ftrace.h       |    8
 include/linux/irqflags.h     |   12 +
 kernel/fork.c                |    2
 kernel/lockdep.c             |   23 ++
 kernel/printk.c              |    2
 kernel/trace/Kconfig         |   18 +
 kernel/trace/Makefile        |    1
 kernel/trace/trace_irqsoff.c |  402 +++
 13 files changed, 531 insertions(+), 31 deletions(-)

Index: linux/arch/x86/kernel/process_64.c
===
--- linux.orig/arch/x86/kernel/process_64.c
+++ linux/arch/x86/kernel/process_64.c
@@ -189,7 +189,10 @@ void cpu_idle(void)
 			 */
 			local_irq_disable();
 			enter_idle();
+			/* Don't trace irqs off for idle */
+			stop_critical_timings();
 			idle();
+			start_critical_timings();
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle.
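The interplay between the running maximum and the threshold described in the changelog can be sketched in user-space C. This is an interpretation, not the code in trace_irqsoff.c: the struct and function names are hypothetical, timestamps are passed in by the caller, and "threshold set" is taken to mean "report every section longer than the threshold and leave the maximum alone".

```c
#include <assert.h>

struct latency_model {
	unsigned long max_latency_us;	/* plays the role of tracing_max_latency */
	unsigned long thresh_us;	/* plays the role of tracing_thresh; 0 = unused */
	unsigned long start_us;
	int active;
};

/* Mark the start of a critical section (e.g. irqs disabled). */
static void critical_start(struct latency_model *m, unsigned long now_us)
{
	if (m->active)
		return;		/* nested: keep the outermost start time */
	m->start_us = now_us;
	m->active = 1;
}

/* Mark the end of a critical section.
 * Returns 1 if this section should be recorded as a trace. */
static int critical_stop(struct latency_model *m, unsigned long now_us)
{
	unsigned long delta;

	if (!m->active)
		return 0;
	m->active = 0;
	delta = now_us - m->start_us;

	if (m->thresh_us)		/* threshold mode: max is ignored */
		return delta > m->thresh_us;

	if (delta > m->max_latency_us) {	/* new worst case */
		m->max_latency_us = delta;
		return 1;
	}
	return 0;
}
```

In max mode the first section always sets a record and later ones are kept only if they are longer, which mirrors why the changelog says the maximum defaults to a large number until the user resets it to start tracing.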
[14/19] ftrace: tracer for scheduler wakeup latency
From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds the tracer that tracks the wakeup latency of the
highest priority waking task.

  "wakeup" is added to /debugfs/tracing/available_tracers

Also added to /debugfs/tracing

  tracing_max_latency
     holds the current max latency for the wakeup

  wakeup_thresh
     if set to other than zero, a log will be recorded
     for every wakeup that takes longer than the number
     entered in here (usecs for all counters)
     (deletes previous trace)

Examples:

  (with ftrace_enabled = 0)

preemption latency trace v1.1.5 on 2.6.24-rc8
 latency: 26 us, #2/2, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: migration/0-3 (uid:0 nice:-5 policy:1 rt_prio:99)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
   quilt-8551  0d..3    0us+: wake_up_process+0x15/0x17 (sched_exec+0xc9/0x100 )
   quilt-8551  0d..4   26us : sched_switch_callback+0x73/0x81 (schedule+0x483/0x6d5 )

vim:ft=help

  (with ftrace_enabled = 1)

preemption latency trace v1.1.5 on 2.6.24-rc8
 latency: 36 us, #45/45, CPU#0 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2)
    -----------------
    | task: migration/1-5 (uid:0 nice:-5 policy:1 rt_prio:99)
    -----------------

                 _------=> CPU#
                / _-----=> irqs-off
               | / _----=> need-resched
               || / _---=> hardirq/softirq
               ||| / _--=> preempt-depth
               |||| /
               |||||     delay
   cmd     pid ||||| time  |   caller
      \   /    |||||   \   |   /
    bash-10653 1d..3    0us : wake_up_process+0x15/0x17 (sched_exec+0xc9/0x100 )
    bash-10653 1d..3    1us : try_to_wake_up+0x271/0x2e7 (sub_preempt_count+0xc/0x7a )
    bash-10653 1d..2    2us : try_to_wake_up+0x296/0x2e7 (update_rq_clock+0x9/0x20 )
    bash-10653 1d..2    2us : update_rq_clock+0x1e/0x20 (__update_rq_clock+0xc/0x90 )
    bash-10653 1d..2    3us : __update_rq_clock+0x1b/0x90 (sched_clock+0x9/0x29 )
    bash-10653 1d..2    4us : try_to_wake_up+0x2a6/0x2e7 (activate_task+0xc/0x3f )
    bash-10653 1d..2    4us : activate_task+0x2d/0x3f (enqueue_task+0xe/0x66 )
    bash-10653 1d..2    5us : enqueue_task+0x5b/0x66 (enqueue_task_rt+0x9/0x3c )
    bash-10653 1d..2    6us : try_to_wake_up+0x2ba/0x2e7 (check_preempt_wakeup+0x12/0x99 )
[...]
    bash-10653 1d..5   33us : tracing_record_cmdline+0xcf/0xd4 (_spin_unlock+0x9/0x33 )
    bash-10653 1d..5   34us : _spin_unlock+0x19/0x33 (sub_preempt_count+0xc/0x7a )
    bash-10653 1d..4   35us : wakeup_sched_switch+0x65/0x2ff (_spin_lock_irqsave+0xc/0xa9 )
    bash-10653 1d..4   35us : _spin_lock_irqsave+0x19/0xa9 (add_preempt_count+0xe/0x77 )
    bash-10653 1d..4   36us : sched_switch_callback+0x73/0x81 (schedule+0x483/0x6d5 )

vim:ft=help

The [...] was added here to not waste your email box space.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h            |   23 ++
 kernel/trace/Kconfig              |   13 +
 kernel/trace/Makefile             |    1
 kernel/trace/trace_sched_wakeup.c |  310 ++
 4 files changed, 343 insertions(+), 4 deletions(-)

Index: linux/include/linux/ftrace.h
===
--- linux.orig/include/linux/ftrace.h
+++ linux/include/linux/ftrace.h
@@ -5,10 +5,6 @@

 #include

-#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
-#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
-#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
-
 typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);

 struct ftrace_ops {
@@ -35,4 +31,23 @@ extern void mcount(void);
 # define unregister_ftrace_function(ops)		do { } while (0)
 # define clear_ftrace_function(ops)			do { } while (0)
 #endif /* CONFIG_FTRACE */
+
+
+#ifdef CONFIG_FRAME_POINTER
+/* TODO: need to fix this for ARM */
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+# define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+# define CALLER_ADDR3 ((unsigned long)__builtin_return_address(3))
+# define CALLER_ADDR4 ((unsigned long)__builtin_return_address(4))
+# define CALLER_ADDR5 ((unsigned long)__builtin_return_address(5))
+#else
+# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+# define CALLER_ADDR1 0UL
+#
define CALLER_ADDR2 0UL +#
[12/19] ftrace: function tracer
From: Steven Rostedt <[EMAIL PROTECTED]> This is a simple trace that uses the ftrace infrastructure. It is designed to be fast and small, and easy to use. It is useful to record things that happen over a very short period of time, and not to analyze the system in general. Updates: available_tracers "function" is added to this file. current_tracer To enable the function tracer: echo function > /debugfs/tracing/current_tracer To disable the tracer: echo disable > /debugfs/tracing/current_tracer The output of the function_trace file is as follows "echo noverbose > /debugfs/tracing/iter_ctrl" preemption latency trace v1.1.5 on 2.6.24-rc7-tst latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) - | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) - _--=> CPU# / _-=> irqs-off | / _=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth / | delay cmd pid | time | caller \ /| \ | / swapper-0 0d.h. 1595128us+: set_normalized_timespec+0x8/0x2d (ktime_get_ts+0x4a/0x4e ) swapper-0 0d.h. 1595131us+: _spin_lock+0x8/0x18 (hrtimer_interrupt+0x6e/0x1b0 ) Or with verbose turned on: "echo verbose > /debugfs/tracing/iter_ctrl" preemption latency trace v1.1.5 on 2.6.24-rc7-tst latency: 0 us, #419428/4361791, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) - | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) - swapper 0 0 9 [f3675f41] 1595.128ms (+0.003ms): set_normalized_timespec+0x8/0x2d (ktime_get_ts+0x4a/0x4e ) swapper 0 0 9 0001 [f3675f45] 1595.131ms (+0.003ms): _spin_lock+0x8/0x18 (hrtimer_interrupt+0x6e/0x1b0 ) swapper 0 0 9 0002 [f3675f48] 1595.135ms (+0.003ms): _spin_lock+0x8/0x18 (hrtimer_interrupt+0x6e/0x1b0 ) The "trace" file is not affected by the verbose mode, but is by the symonly. 
echo "nosymonly" > /debugfs/tracing/iter_ctrl tracer: [ 81.479967] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a [ 81.479967] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a [ 81.479968] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24 [ 81.479968] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78 [ 81.479968] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70 [ 81.479969] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77 [ 81.479969] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24 echo "symonly" > /debugfs/tracing/iter_ctrl tracer: [ 81.479913] CPU 0: bash:3154 register_ftrace_function+0x5f/0x66 <-- _spin_unlock_irqrestore+0xe/0x5a [ 81.479913] CPU 0: bash:3154 _spin_unlock_irqrestore+0x3e/0x5a <-- sub_preempt_count+0xc/0x7a [ 81.479913] CPU 0: bash:3154 sub_preempt_count+0x30/0x7a <-- in_lock_functions+0x9/0x24 [ 81.479914] CPU 0: bash:3154 vfs_write+0x11d/0x155 <-- dnotify_parent+0x12/0x78 [ 81.479914] CPU 0: bash:3154 dnotify_parent+0x2d/0x78 <-- _spin_lock+0xe/0x70 [ 81.479914] CPU 0: bash:3154 _spin_lock+0x1b/0x70 <-- add_preempt_count+0xe/0x77 [ 81.479914] CPU 0: bash:3154 add_preempt_count+0x3e/0x77 <-- in_lock_functions+0x9/0x24 Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/trace/Kconfig | 13 +++ kernel/trace/Makefile |1 kernel/trace/trace_functions.c | 73 + 3 files changed, 87 insertions(+) Index: linux/kernel/trace/Kconfig === --- linux.orig/kernel/trace/Kconfig +++ linux/kernel/trace/Kconfig @@ -8,3 +8,16 @@ config TRACING bool select DEBUG_FS +config FTRACE + bool "Kernel Function Tracer" + depends on DEBUG_KERNEL && HAVE_FTRACE + select FRAME_POINTER + select TRACING + help + Enable the kernel to trace every kernel function. 
This is done + by using a compiler feature to insert a small, 5-byte No-Operation + instruction to the beginning of every kernel function, which NOP + sequence is then dynamically patched into a tracer call when + tracing is enabled by the administrator. If it's runtime disabled + (the bootup default), then the overhead of the instructions is very + small and not measurable even in micro-benchmarks. Index:
[13/19] ftrace: add tracing of context switches
From: Steven Rostedt <[EMAIL PROTECTED]> This patch adds context switch tracing, of the format of: _--=> CPU# / _-=> irqs-off | / _=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth / | delay cmd pid | time | pid:prio:state \ /| \ | / swapper-0 1d..3137us+: 0:140:R --> 2912:120 sshd-2912 1d..3216us+: 2912:120:S --> 0:140 swapper-0 1d..3261us+: 0:140:R --> 2912:120 bash-2920 0d..3267us+: 2920:120:S --> 0:140 sshd-2912 1d..3330us!: 2912:120:S --> 0:140 swapper-0 1d..3 2389us+: 0:140:R --> 2847:120 yum-upda-2847 1d..3 2411us!: 2847:120:S --> 0:140 swapper-0 0d..3 11089us+: 0:140:R --> 3139:120 gdm-bina-3139 0d..3 3us!: 3139:120:S --> 0:140 swapper-0 1d..3 102328us+: 0:140:R --> 2847:120 yum-upda-2847 1d..3 102348us!: 2847:120:S --> 0:140 "sched_switch" is added to /debugfs/tracing/available_tracers Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Cc: Mathieu Desnoyers <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- kernel/trace/Kconfig | 11 +++ kernel/trace/Makefile |1 kernel/trace/trace_sched_switch.c | 125 ++ 3 files changed, 137 insertions(+) Index: linux/kernel/trace/Kconfig === --- linux.orig/kernel/trace/Kconfig +++ linux/kernel/trace/Kconfig @@ -13,6 +13,7 @@ config FTRACE depends on DEBUG_KERNEL && HAVE_FTRACE select FRAME_POINTER select TRACING + select CONTEXT_SWITCH_TRACER help Enable the kernel to trace every kernel function. This is done by using a compiler feature to insert a small, 5-byte No-Operation @@ -21,3 +22,13 @@ config FTRACE tracing is enabled by the administrator. If it's runtime disabled (the bootup default), then the overhead of the instructions is very small and not measurable even in micro-benchmarks. + +config CONTEXT_SWITCH_TRACER + bool "Trace process context switches" + depends on DEBUG_KERNEL + select TRACING + select MARKERS + help + This tracer gets called from the context switch and records + all switching of tasks. 
+
Index: linux/kernel/trace/Makefile
===
--- linux.orig/kernel/trace/Makefile
+++ linux/kernel/trace/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_FTRACE) += libftrace.o
 
 obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
 obj-$(CONFIG_FTRACE) += trace_functions.o
 
 libftrace-y := ftrace.o
Index: linux/kernel/trace/trace_sched_switch.c
===
--- /dev/null
+++ linux/kernel/trace/trace_sched_switch.c
@@ -0,0 +1,125 @@
+/*
+ * trace context switch
+ *
+ * Copyright (C) 2007 Steven Rostedt <[EMAIL PROTECTED]>
+ *
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+
+static struct trace_array *ctx_trace;
+static int __read_mostly tracer_enabled;
+int __read_mostly tracing_sched_switch_enabled;
+
+static void notrace
+ctx_switch_func(struct task_struct *prev, struct task_struct *next)
+{
+	struct trace_array *tr = ctx_trace;
+	struct trace_array_cpu *data;
+	unsigned long flags;
+	long disabled;
+	int cpu;
+
+	if (!tracer_enabled)
+		return;
+
+	raw_local_irq_save(flags);
+	cpu = raw_smp_processor_id();
+	data = tr->data[cpu];
+	disabled = atomic_inc_return(&data->disabled);
+
+	if (likely(disabled == 1))
+		tracing_sched_switch_trace(tr, data, prev, next, flags);
+
+	atomic_dec(&data->disabled);
+	raw_local_irq_restore(flags);
+}
+
+void ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next)
+{
+	tracing_record_cmdline(prev);
+
+	/*
+	 * If tracer_switch_func only points to the local
+	 * switch func, it still needs the ptr passed to it.
+	 */
+	ctx_switch_func(prev, next);
+
+	/*
+	 * Chain to the wakeup tracer (this is a NOP if disabled):
+	 */
+	wakeup_sched_switch(prev, next);
+}
+
+static notrace void sched_switch_reset(struct trace_array *tr)
+{
+	int cpu;
+
+	tr->time_start = now(tr->cpu);
+
+	for_each_online_cpu(cpu)
+		tracing_reset(tr->data[cpu]);
+}
+
+static notrace void start_sched_trace(struct trace_array *tr)
+{
+	sched_switch_reset(tr);
+	tracer_enabled = 1;
+}
+
+static notrace void stop_sched_trace(struct trace_array *tr)
+{
+	tracer_enabled = 0;
+}
+
+static notrace void sched_switch_trace_init(struct
[11/19] ftrace: latency tracer infrastructure
From: Steven Rostedt <[EMAIL PROTECTED]>

This patch adds the latency tracer infrastructure. This patch does not add anything that will select and turn it on, but will be used by later patches.

If it were to be compiled, it would add the following files to the debugfs:

The root tracing directory:

  /debugfs/tracing/

This patch also adds the following files:

  available_tracers
     list of available tracers. Currently no tracers are available. Looking into this file only shows "none" which is used to unregister all tracers.

  current_tracer
     The trace that is currently active. Empty on start up. To switch to a tracer simply echo one of the tracers that are listed in available_tracers:

     example: (used with later patches)

       echo function > /debugfs/tracing/current_tracer

     To disable the tracer:

       echo disable > /debugfs/tracing/current_tracer

  tracing_enabled
     echoing "1" into this file starts the ftrace function tracing (if sysctl kernel.ftrace_enabled=1)
     echoing "0" turns it off.

  latency_trace
     This file is readonly and holds the result of the trace.

  trace
     This file outputs an easier to read version of the trace.

  iter_ctrl
     Controls the way the output of traces look. So far there are two controls:
       echoing in "symonly" will only show the kallsyms variables without the addresses (if kallsyms was configured)
       echoing in "verbose" will change the output to show a lot more data, but not very easy to understand by humans.
       echoing in "nosymonly" turns off symonly.
       echoing in "noverbose" turns off verbose.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/Makefile       |    1
 kernel/trace/Kconfig  |    5
 kernel/trace/Makefile |    2
 kernel/trace/trace.c  | 1547 ++
 kernel/trace/trace.h  |  185 +
 5 files changed, 1740 insertions(+)

Index: linux/kernel/Makefile
===
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -69,6 +69,7 @@ obj-$(CONFIG_TASKSTATS) += taskstats.o t
 obj-$(CONFIG_MARKERS) += marker.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 obj-$(CONFIG_FTRACE) += trace/
+obj-$(CONFIG_TRACING) += trace/
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <[EMAIL PROTECTED]>, the -fno-omit-frame-pointer is
Index: linux/kernel/trace/Kconfig
===
--- linux.orig/kernel/trace/Kconfig
+++ linux/kernel/trace/Kconfig
@@ -3,3 +3,8 @@
 #
 config HAVE_FTRACE
 	bool
+
+config TRACING
+	bool
+	select DEBUG_FS
+
Index: linux/kernel/trace/Makefile
===
--- linux.orig/kernel/trace/Makefile
+++ linux/kernel/trace/Makefile
@@ -1,3 +1,5 @@
 obj-$(CONFIG_FTRACE) += libftrace.o
 
+obj-$(CONFIG_TRACING) += trace.o
+
 libftrace-y := ftrace.o
Index: linux/kernel/trace/trace.c
===
--- /dev/null
+++ linux/kernel/trace/trace.c
@@ -0,0 +1,1547 @@
+/*
+ * ring buffer based function tracer
+ *
+ * Copyright (C) 2007-2008 Steven Rostedt <[EMAIL PROTECTED]>
+ * Copyright (C) 2008 Ingo Molnar <[EMAIL PROTECTED]>
+ *
+ * Originally taken from the RT patch by:
+ *    Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ *
+ * Based on code from the latency_tracer, that is:
+ *  Copyright (C) 2004-2006 Ingo Molnar
+ *  Copyright (C) 2004 William Lee Irwin III
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+
+unsigned long __read_mostly	tracing_max_latency = (cycle_t)ULONG_MAX;
+unsigned long __read_mostly	tracing_thresh;
+
+static long notrace
+ns2usecs(cycle_t nsec)
+{
+	nsec += 500;
+	do_div(nsec, 1000);
+	return nsec;
+}
+
+static atomic_t tracer_counter;
+static struct trace_array global_trace;
+
+static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
+
+static struct trace_array max_tr;
+
+static DEFINE_PER_CPU(struct trace_array_cpu, max_data);
+
+static int tracer_enabled;
+static unsigned long trace_nr_entries = 4096UL;
+
+static struct tracer *trace_types __read_mostly;
+static struct tracer *current_trace __read_mostly;
+static int max_tracer_type_len;
+
+static DEFINE_MUTEX(trace_types_lock);
+
+static int __init set_nr_entries(char *str)
+{
+	if (!str)
+		return 0;
+	trace_nr_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("trace_entries=", set_nr_entries);
+
+enum trace_type {
+
[10/19] ftrace: add basic support for gcc profiler instrumentation
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is set to a non-zero value, the ftrace routine will be called every time we enter a kernel function that is not marked with the "notrace" attribute.

The ftrace routine will then call a registered function if a function happens to be registered.

[ This code has been highly hacked by Steven Rostedt and Ingo Molnar, so don't blame Arnaldo for all of this ;-) ]

Update: It is now possible to register more than one ftrace function. If only one ftrace function is registered, that will be the function that ftrace calls directly. If more than one function is registered, then ftrace will call a function that will loop through the functions to call.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Makefile                   |    3
 arch/x86/Kconfig           |    1
 arch/x86/kernel/entry_32.S |   27
 arch/x86/kernel/entry_64.S |   37
 include/linux/ftrace.h     |   38
 kernel/Makefile            |    1
 kernel/trace/Kconfig       |    5 +
 kernel/trace/Makefile      |    3
 kernel/trace/ftrace.c      |  138 +
 lib/Kconfig.debug          |    2
 10 files changed, 255 insertions(+)

Index: linux/Makefile
===
--- linux.orig/Makefile
+++ linux/Makefile
@@ -509,6 +509,9 @@ endif
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
+ifdef CONFIG_FTRACE
+KBUILD_CFLAGS += -pg
+endif
 ifdef CONFIG_FRAME_POINTER
 KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
 else
Index: linux/arch/x86/Kconfig
===
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -20,6 +20,7 @@ config X86
 	def_bool y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_FTRACE
 
 config GENERIC_LOCKBREAK
 	def_bool n
Index: linux/arch/x86/kernel/entry_32.S
===
--- linux.orig/arch/x86/kernel/entry_32.S
+++ linux/arch/x86/kernel/entry_32.S
@@ -75,6 +75,33 @@ DF_MASK = 0x0400
 NT_MASK= 0x4000
 VM_MASK= 0x0002
 
+#ifdef CONFIG_FTRACE
+ENTRY(mcount)
+	cmpl $ftrace_stub, ftrace_trace_function
+	jnz trace
+
+.globl ftrace_stub
+ftrace_stub:
+	ret
+
+	/* taken from glibc */
+trace:
+	pushl %eax
+	pushl %ecx
+	pushl %edx
+	movl 0xc(%esp), %eax
+	movl 0x4(%ebp), %edx
+
+	call *ftrace_trace_function
+
+	popl %edx
+	popl %ecx
+	popl %eax
+
+	jmp ftrace_stub
+END(mcount)
+#endif
+
 #ifdef CONFIG_PREEMPT
 #define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
Index: linux/arch/x86/kernel/entry_64.S
===
--- linux.orig/arch/x86/kernel/entry_64.S
+++ linux/arch/x86/kernel/entry_64.S
@@ -54,6 +54,43 @@
 
 	.code64
 
+#ifdef CONFIG_FTRACE
+ENTRY(mcount)
+	cmpq $ftrace_stub, ftrace_trace_function
+	jnz trace
+.globl ftrace_stub
+ftrace_stub:
+	retq
+
+trace:
+	/* taken from glibc */
+	subq $0x38, %rsp
+	movq %rax, (%rsp)
+	movq %rcx, 8(%rsp)
+	movq %rdx, 16(%rsp)
+	movq %rsi, 24(%rsp)
+	movq %rdi, 32(%rsp)
+	movq %r8, 40(%rsp)
+	movq %r9, 48(%rsp)
+
+	movq 0x38(%rsp), %rdi
+	movq 8(%rbp), %rsi
+
+	call *ftrace_trace_function
+
+	movq 48(%rsp), %r9
+	movq 40(%rsp), %r8
+	movq 32(%rsp), %rdi
+	movq 24(%rsp), %rsi
+	movq 16(%rsp), %rdx
+	movq 8(%rsp), %rcx
+	movq (%rsp), %rax
+	addq $0x38, %rsp
+
+	jmp ftrace_stub
+END(mcount)
+#endif
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif
Index: linux/include/linux/ftrace.h
===
--- /dev/null
+++ linux/include/linux/ftrace.h
@@ -0,0 +1,38 @@
+#ifndef _LINUX_FTRACE_H
+#define _LINUX_FTRACE_H
+
+#ifdef CONFIG_FTRACE
+
+#include
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
+#define CALLER_ADDR2 ((unsigned long)__builtin_return_address(2))
+
+typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
+
+struct ftrace_ops {
+	ftrace_func_t func;
+	struct ftrace_ops *next;
+};
+
+/*
+ * The ftrace_ops must be a static and should also
+ * be read_mostly. These functions do modify read_mostly variables
+ * so use them sparely. Never free an ftrace_op or modify the
+ * next pointer after it has been registered. Even after unregistering
+ * it, the next pointer may still be
[09/19] ftrace: add notrace annotations for NMI routines
From: Steven Rostedt <[EMAIL PROTECTED]> This annotates NMI functions with notrace. Some tracers may be able to live with this, but some cannot. The safest is to turn it off, it's not particularly interesting anyway. Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/x86/kernel/nmi_32.c |3 ++- arch/x86/kernel/nmi_64.c |6 -- arch/x86/kernel/traps_32.c | 12 ++-- arch/x86/kernel/traps_64.c | 11 ++- 4 files changed, 18 insertions(+), 14 deletions(-) Index: linux/arch/x86/kernel/nmi_32.c === --- linux.orig/arch/x86/kernel/nmi_32.c +++ linux/arch/x86/kernel/nmi_32.c @@ -320,7 +320,8 @@ EXPORT_SYMBOL(touch_nmi_watchdog); extern void die_nmi(struct pt_regs *, const char *msg); -__kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) +notrace __kprobes int +nmi_watchdog_tick(struct pt_regs *regs, unsigned reason) { /* Index: linux/arch/x86/kernel/nmi_64.c === --- linux.orig/arch/x86/kernel/nmi_64.c +++ linux/arch/x86/kernel/nmi_64.c @@ -314,7 +314,8 @@ void touch_nmi_watchdog(void) } EXPORT_SYMBOL(touch_nmi_watchdog); -int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) +notrace __kprobes int +nmi_watchdog_tick(struct pt_regs *regs, unsigned reason) { int sum; int touched = 0; @@ -385,7 +386,8 @@ int __kprobes nmi_watchdog_tick(struct p static unsigned ignore_nmis; -asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code) +asmlinkage notrace __kprobes void +do_nmi(struct pt_regs *regs, long error_code) { nmi_enter(); add_pda(__nmi_count,1); Index: linux/arch/x86/kernel/traps_32.c === --- linux.orig/arch/x86/kernel/traps_32.c +++ linux/arch/x86/kernel/traps_32.c @@ -665,7 +665,7 @@ gp_in_kernel: } } -static __kprobes void +static notrace __kprobes void mem_parity_error(unsigned char reason, struct pt_regs * regs) { printk(KERN_EMERG "Uhhuh. 
NMI received for unknown reason %02x on " @@ -688,7 +688,7 @@ mem_parity_error(unsigned char reason, s clear_mem_error(reason); } -static __kprobes void +static notrace __kprobes void io_check_error(unsigned char reason, struct pt_regs * regs) { unsigned long i; @@ -705,7 +705,7 @@ io_check_error(unsigned char reason, str outb(reason, 0x61); } -static __kprobes void +static notrace __kprobes void unknown_nmi_error(unsigned char reason, struct pt_regs * regs) { #ifdef CONFIG_MCA @@ -727,7 +727,7 @@ unknown_nmi_error(unsigned char reason, static DEFINE_SPINLOCK(nmi_print_lock); -void __kprobes die_nmi(struct pt_regs *regs, const char *msg) +void notrace __kprobes die_nmi(struct pt_regs *regs, const char *msg) { if (notify_die(DIE_NMIWATCHDOG, msg, regs, 0, 2, SIGINT) == NOTIFY_STOP) @@ -758,7 +758,7 @@ void __kprobes die_nmi(struct pt_regs *r do_exit(SIGSEGV); } -static __kprobes void default_do_nmi(struct pt_regs * regs) +static notrace __kprobes void default_do_nmi(struct pt_regs *regs) { unsigned char reason = 0; @@ -798,7 +798,7 @@ static __kprobes void default_do_nmi(str static int ignore_nmis; -__kprobes void do_nmi(struct pt_regs * regs, long error_code) +notrace __kprobes void do_nmi(struct pt_regs *regs, long error_code) { int cpu; Index: linux/arch/x86/kernel/traps_64.c === --- linux.orig/arch/x86/kernel/traps_64.c +++ linux/arch/x86/kernel/traps_64.c @@ -598,7 +598,8 @@ void die(const char * str, struct pt_reg oops_end(flags, regs, SIGSEGV); } -void __kprobes die_nmi(char *str, struct pt_regs *regs, int do_panic) +notrace __kprobes void +die_nmi(char *str, struct pt_regs *regs, int do_panic) { unsigned long flags = oops_begin(); @@ -765,7 +766,7 @@ asmlinkage void __kprobes do_general_pro die("general protection fault", regs, error_code); } -static __kprobes void +static notrace __kprobes void mem_parity_error(unsigned char reason, struct pt_regs * regs) { printk(KERN_EMERG "Uhhuh. 
NMI received for unknown reason %02x.\n", @@ -789,7 +790,7 @@ mem_parity_error(unsigned char reason, s outb(reason, 0x61); } -static __kprobes void +static notrace __kprobes void io_check_error(unsigned char reason, struct pt_regs * regs) { printk("NMI: IOCK error (debug interrupt?)\n"); @@ -803,7 +804,7 @@ io_check_error(unsigned char reason, str outb(reason, 0x61); } -static __kprobes void +static notrace __kprobes void unknown_nmi_error(unsigned char reason, struct pt_regs * regs) { printk(KERN_EMERG "Uhhuh. NMI received for unknown reason %02x.\n", @@ -818,7 +819,7 @@ unknown_nmi_error(unsigned char reason, /* Runs on IST
[07/19] x86: add notrace annotations to vsyscall.
From: Steven Rostedt <[EMAIL PROTECTED]> Add the notrace annotations to the vsyscall functions - there we are not in kernel context yet, so the tracer function cannot (and must not) be called. Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/x86/kernel/vsyscall_64.c |3 ++- arch/x86/vdso/vclock_gettime.c | 15 --- arch/x86/vdso/vgetcpu.c|3 ++- include/asm-x86/vsyscall.h |3 ++- 4 files changed, 14 insertions(+), 10 deletions(-) Index: linux/arch/x86/kernel/vsyscall_64.c === --- linux.orig/arch/x86/kernel/vsyscall_64.c +++ linux/arch/x86/kernel/vsyscall_64.c @@ -42,7 +42,8 @@ #include #include -#define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr))) +#define __vsyscall(nr) \ + __attribute__ ((unused, __section__(".vsyscall_" #nr))) notrace #define __syscall_clobber "r11","cx","memory" #define __pa_vsymbol(x)\ ({unsigned long v; \ Index: linux/arch/x86/vdso/vclock_gettime.c === --- linux.orig/arch/x86/vdso/vclock_gettime.c +++ linux/arch/x86/vdso/vclock_gettime.c @@ -23,7 +23,7 @@ #define gtod vdso_vsyscall_gtod_data -static long vdso_fallback_gettime(long clock, struct timespec *ts) +notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) { long ret; asm("syscall" : "=a" (ret) : @@ -31,7 +31,7 @@ static long vdso_fallback_gettime(long c return ret; } -static inline long vgetns(void) +notrace static inline long vgetns(void) { long v; cycles_t (*vread)(void); @@ -40,7 +40,7 @@ static inline long vgetns(void) return (v * gtod->clock.mult) >> gtod->clock.shift; } -static noinline int do_realtime(struct timespec *ts) +notrace static noinline int do_realtime(struct timespec *ts) { unsigned long seq, ns; do { @@ -54,7 +54,8 @@ static noinline int do_realtime(struct t } /* Copy of the version in kernel/time.c which we cannot directly access */ -static void vset_normalized_timespec(struct timespec *ts, long sec, long nsec) +notrace static void +vset_normalized_timespec(struct 
timespec *ts, long sec, long nsec) { while (nsec >= NSEC_PER_SEC) { nsec -= NSEC_PER_SEC; @@ -68,7 +69,7 @@ static void vset_normalized_timespec(str ts->tv_nsec = nsec; } -static noinline int do_monotonic(struct timespec *ts) +notrace static noinline int do_monotonic(struct timespec *ts) { unsigned long seq, ns, secs; do { @@ -82,7 +83,7 @@ static noinline int do_monotonic(struct return 0; } -int __vdso_clock_gettime(clockid_t clock, struct timespec *ts) +notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts) { if (likely(gtod->sysctl_enabled && gtod->clock.vread)) switch (clock) { @@ -96,7 +97,7 @@ int __vdso_clock_gettime(clockid_t clock int clock_gettime(clockid_t, struct timespec *) __attribute__((weak, alias("__vdso_clock_gettime"))); -int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz) +notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz) { long ret; if (likely(gtod->sysctl_enabled && gtod->clock.vread)) { Index: linux/arch/x86/vdso/vgetcpu.c === --- linux.orig/arch/x86/vdso/vgetcpu.c +++ linux/arch/x86/vdso/vgetcpu.c @@ -13,7 +13,8 @@ #include #include "vextern.h" -long __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused) +notrace long +__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused) { unsigned int p; Index: linux/include/asm-x86/vsyscall.h === --- linux.orig/include/asm-x86/vsyscall.h +++ linux/include/asm-x86/vsyscall.h @@ -24,7 +24,8 @@ enum vsyscall_num { ((unused, __section__ (".vsyscall_gtod_data"),aligned(16))) #define __section_vsyscall_clock __attribute__ \ ((unused, __section__ (".vsyscall_clock"),aligned(16))) -#define __vsyscall_fn __attribute__ ((unused,__section__(".vsyscall_fn"))) +#define __vsyscall_fn \ + __attribute__ ((unused, __section__(".vsyscall_fn"))) notrace #define VGETCPU_RDTSCP 1 #define VGETCPU_LSL2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo 
info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[08/19] ftrace: annotate core code that should not be traced
From: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Mark functions in core code that should not be traced with "notrace". The "notrace" attribute will prevent gcc from adding a call to ftrace on the annotated functions.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 lib/smp_processor_id.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/lib/smp_processor_id.c
===
--- linux.orig/lib/smp_processor_id.c
+++ linux/lib/smp_processor_id.c
@@ -7,7 +7,7 @@
 #include
 #include
 
-unsigned int debug_smp_processor_id(void)
+notrace unsigned int debug_smp_processor_id(void)
 {
 	unsigned long preempt_count = preempt_count();
 	int this_cpu = raw_smp_processor_id();
Re: [PATCH] Change pci_raw_ops to pci_raw_read/write
On Sat, Feb 09, 2008 at 10:25:23PM -0800, Yinghai Lu wrote: > On Feb 9, 2008 4:41 AM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: > > On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote: > > > Matthew, > > > > > > Perhaps I missed it, but did you address Yinghai's concerns? > > > > No, I was on holiday. > > > > > Yinghai Lu wrote: > > > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: > > > >> > > > >>-int pci_conf1_write(unsigned int seg, unsigned int bus, > > > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus, > > > >> unsigned int devfn, int reg, int len, u32 > > > >> value) > > > > > > > >any reason to change pci_conf1_read/write to static? > > > > Yes -- it no longer needs to be called from outside this file. > > > > > >>+config ATA_RAM > > > >>+ tristate "ATA RAM driver" > > > >>+ > > > > > > > >related? > > > > looks good. it should get into -mm or x86/mm for some testing Can I get a revised version of this, without the incorrect hunk? thanks, greg k-h
[06/19] tracing: add notrace to linkage.h
From: Ingo Molnar <[EMAIL PROTECTED]>

notrace signals that a function should not be traced. Most of the time this is used by tracers to annotate code that cannot be traced - it's in a volatile state (such as in user vdso context or NMI context) or it's in the tracer internals.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/linkage.h |    2 ++
 1 file changed, 2 insertions(+)

Index: linux/include/linux/linkage.h
===
--- linux.orig/include/linux/linkage.h
+++ linux/include/linux/linkage.h
@@ -3,6 +3,8 @@
 
 #include
 
+#define notrace __attribute__((no_instrument_function))
+
 #ifdef __cplusplus
 #define CPP_ASMLINKAGE extern "C"
 #else
[05/19] ftrace: make the task state char-string visible to all
From: Steven Rostedt <[EMAIL PROTECTED]>

The tracer wants to be able to convert the state number into a user visible character.

This patch pulls that conversion string out of the scheduler into the header. This way, if it were ever to change, other parts of the kernel will know.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h |    2 ++
 kernel/sched.c        |    2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -2117,6 +2117,8 @@ static inline void migration_init(void)
 #define TASK_SIZE_OF(tsk) TASK_SIZE
 #endif
 
+#define TASK_STATE_TO_CHAR_STR "RSDTtZX"
+
 #endif /* __KERNEL__ */
 
 #endif
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -5154,7 +5154,7 @@ out_unlock:
 	return retval;
 }
 
-static const char stat_nam[] = "RSDTtZX";
+static const char stat_nam[] = TASK_STATE_TO_CHAR_STR;
 
 void sched_show_task(struct task_struct *p)
 {
[04/19] ftrace: add preempt_enable/disable notrace macros
From: Steven Rostedt <[EMAIL PROTECTED]>

The tracer may need to call preempt_enable and disable functions
for time keeping and such. The trace gets ugly when we see these
functions show up for all traces. To make the output cleaner
this patch adds preempt_enable_notrace and preempt_disable_notrace
to be used by tracer (and debugging) functions.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/preempt.h | 32
 1 file changed, 32 insertions(+)

Index: linux/include/linux/preempt.h
===
--- linux.orig/include/linux/preempt.h
+++ linux/include/linux/preempt.h
@@ -52,6 +52,34 @@ do { \
 	preempt_check_resched(); \
 } while (0)
+/* For debugging and tracer internals only! */
+#define add_preempt_count_notrace(val) \
+	do { preempt_count() += (val); } while (0)
+#define sub_preempt_count_notrace(val) \
+	do { preempt_count() -= (val); } while (0)
+#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
+#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
+
+#define preempt_disable_notrace() \
+do { \
+	inc_preempt_count_notrace(); \
+	barrier(); \
+} while (0)
+
+#define preempt_enable_no_resched_notrace() \
+do { \
+	barrier(); \
+	dec_preempt_count_notrace(); \
+} while (0)
+
+/* preempt_check_resched is OK to trace */
+#define preempt_enable_notrace() \
+do { \
+	preempt_enable_no_resched_notrace(); \
+	barrier(); \
+	preempt_check_resched(); \
+} while (0)
+
 #else
 #define preempt_disable() do { } while (0)
@@ -59,6 +87,10 @@ do { \
 #define preempt_enable() do { } while (0)
 #define preempt_check_resched() do { } while (0)
+#define preempt_disable_notrace() do { } while (0)
+#define preempt_enable_no_resched_notrace() do { } while (0)
+#define preempt_enable_notrace() do { } while (0)
+
 #endif
 #ifdef CONFIG_PREEMPT_NOTIFIERS
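The nesting behaviour of these macros can be modelled in userspace roughly as follows. This is only a sketch: the real preempt_count() is per task, while model_preempt_count here is a plain global in a single-threaded program, and barrier() is written out as the usual GCC compiler barrier. The function nested_depth_demo() is invented for the example.

```c
#include <assert.h>

/* Compiler barrier, a GCC-style stand-in for the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the per-task preempt count (hypothetical, single-threaded). */
static int model_preempt_count;

/* Raise the count first, then stop the compiler moving code above it. */
#define preempt_disable_notrace() \
do { \
	model_preempt_count += 1; \
	barrier(); \
} while (0)

/* Mirror image: barrier first, then drop the count. */
#define preempt_enable_no_resched_notrace() \
do { \
	barrier(); \
	model_preempt_count -= 1; \
} while (0)

/* Returns the nesting depth observed inside a doubly nested section. */
int nested_depth_demo(void)
{
	int depth;

	preempt_disable_notrace();
	preempt_disable_notrace();
	depth = model_preempt_count;	/* two levels deep here */
	preempt_enable_no_resched_notrace();
	preempt_enable_no_resched_notrace();

	/* Balanced enable/disable pairs never drive the count negative. */
	assert(model_preempt_count >= 0);
	return depth;
}
```

The point of the _notrace variants is only that these operations do not themselves call into traced helpers; the counting and barrier ordering stay the same as in the regular macros.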
[03/19] printk: dont wake up klogd with the rq locked
From: Steven Rostedt <[EMAIL PROTECTED]>

It is not wise to place a printk where the runqueue lock is held.

I just spent two hours debugging why some of my code was locking up,
to find that the lockup was caused by some debugging printk's that
I had in the scheduler. The printk's were only in rare paths so
they shouldn't be too much of a problem, but after I hit the printk
the system locked up.

Thinking that it was locking up on my code I went looking down the
wrong path. I finally found (after examining an NMI dump) that
the lockup happened because printk was trying to wakeup the klogd
daemon, which caused a deadlock when the try_to_wakeup code tries
to grab the runqueue lock.

This patch adds a runqueue_is_locked interface in sched.c for other
files to see if the current runqueue lock is held. This is used
in printk to determine whether it is safe or not to wake up the klogd.

And with this patch, my code ran fine ;-)

[ [EMAIL PROTECTED]: we also want this to be able to printk something
  in case the scheduler crashes. ]

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h |  2 ++
 kernel/printk.c       | 14 ++
 kernel/sched.c        | 18 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -245,6 +245,8 @@ extern void sched_init_smp(void);
 extern void init_idle(struct task_struct *idle, int cpu);
 extern void init_idle_bootup_task(struct task_struct *idle);
+extern int runqueue_is_locked(void);
+
 extern cpumask_t nohz_cpu_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern int select_nohz_load_balancer(int cpu);
Index: linux/kernel/printk.c
===
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -583,9 +583,11 @@ static int have_callable_console(void)
  * @fmt: format string
  *
  * This is printk(). It can be called from any context. We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
+ *
+ * Note: if printk() is called with the runqueue lock held, it will not wake
+ * up the klogd. This is to avoid a deadlock from calling printk() in schedule
+ * with the runqueue lock held and having the wake_up grab the runqueue lock
+ * as well.
  *
  * We try to grab the console_sem. If we succeed, it's easy - we log the output and
  * call the console drivers. If we fail to get the semaphore we place the output
@@ -994,7 +996,11 @@ void release_console_sem(void)
 	console_locked = 0;
 	up(&console_sem);
 	spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (wake_klogd)
+	/*
+	 * If we try to wake up klogd while printing with the runqueue lock
+	 * held, this will deadlock.
+	 */
+	if (wake_klogd && !runqueue_is_locked())
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(release_console_sem);
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -621,6 +621,24 @@ unsigned long rt_needs_cpu(int cpu)
 # define const_debug static const
 #endif
+/**
+ * runqueue_is_locked
+ *
+ * Returns true if the current cpu runqueue is locked.
+ * This interface allows printk to be called with the runqueue lock
+ * held and know whether or not it is OK to wake up the klogd.
+ */
+int runqueue_is_locked(void)
+{
+	int cpu = get_cpu();
+	struct rq *rq = cpu_rq(cpu);
+	int ret;
+
+	ret = spin_is_locked(&rq->lock);
+	put_cpu();
+	return ret;
+}
+
 /*
  * Debugging: various feature bits
  */
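The guard added to release_console_sem() can be modelled in a few lines of userspace C. This is only a sketch of the control flow: the lock is reduced to a boolean flag, there is no real concurrency, and every model_* name is invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_rq_locked;	/* stand-in for the runqueue lock state */
static int model_klogd_wakeups;	/* how often klogd was woken */

static bool model_runqueue_is_locked(void)
{
	return model_rq_locked;
}

/* Waking klogd needs the runqueue lock, so it must not already be held. */
static void model_wake_up_klogd(void)
{
	assert(!model_rq_locked);
	model_klogd_wakeups++;
}

/* Same guard as the patched release_console_sem(). */
static void model_release_console_sem(bool wake_klogd)
{
	if (wake_klogd && !model_runqueue_is_locked())
		model_wake_up_klogd();
}

/* Returns how many klogd wakeups one printk-style release caused. */
int model_wakeups_for(bool rq_locked)
{
	int before = model_klogd_wakeups;

	model_rq_locked = rq_locked;
	model_release_console_sem(true);
	model_rq_locked = false;
	return model_klogd_wakeups - before;
}
```

With the lock held the wakeup is simply skipped, so the message is still logged but klogd is not woken by that particular call.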
Re: acpi dsts loading and populate_rootfs
On Sun, Feb 10, 2008 at 08:12:26AM +0100, Christoph Hellwig wrote:
> Folks, moving this call around hidden inside completely unreviewed
> acpi junk is not acceptable.
>
> Either populate_rootfs _is_ safe to be called earlier and then we should
> do it always or it's not. Either way such a change should be posted
> separately and reviewed on lkml.
>
> Len, can you please revert "ACPI: basic initramfs DSDT override support"
> aka commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 until we've sorted
> this out properly? Thanks.

And while we're at it the file reading thing in there is utter crap
as well. You really should be using the firmware loader which works
perfectly fine if your initramfs is set up for it.

So please folks, back to the drawing board, do it properly and send it
out to lkml for review please.
[01/19] rcu: add support for dynamic ticks and preempt rcu
From: Steven Rostedt <[EMAIL PROTECTED]>

The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The
idle CPU will not progress the RCU through its grace period and a
synchronize_rcu may get stuck. Without this patch I have a box that will
not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine
with this patch.

This patch comes from the -rt kernel where it has been tested for
several months.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/hardirq.h    |  10 ++
 include/linux/rcuclassic.h |   3
 include/linux/rcupreempt.h |  22
 kernel/rcupreempt.c        | 224 -
 kernel/softirq.c           |   1
 kernel/time/tick-sched.c   |   3
 6 files changed, 259 insertions(+), 4 deletions(-)

Index: linux/include/linux/hardirq.h
===
--- linux.orig/include/linux/hardirq.h
+++ linux/include/linux/hardirq.h
@@ -109,6 +109,14 @@ static inline void account_system_vtime(
 }
 #endif
+#if defined(CONFIG_PREEMPT_RCU) && defined(CONFIG_NO_HZ)
+extern void rcu_irq_enter(void);
+extern void rcu_irq_exit(void);
+#else
+# define rcu_irq_enter() do { } while (0)
+# define rcu_irq_exit() do { } while (0)
+#endif /* CONFIG_PREEMPT_RCU */
+
 /*
  * It is safe to do non-atomic ops on ->hardirq_context,
  * because NMI handlers may not preempt and the ops are
@@ -117,6 +125,7 @@ static inline void account_system_vtime(
  */
 #define __irq_enter() \
 	do { \
+		rcu_irq_enter(); \
 		account_system_vtime(current); \
 		add_preempt_count(HARDIRQ_OFFSET); \
 		trace_hardirq_enter(); \
@@ -135,6 +144,7 @@ extern void irq_enter(void);
 		trace_hardirq_exit(); \
 		account_system_vtime(current); \
 		sub_preempt_count(HARDIRQ_OFFSET); \
+		rcu_irq_exit(); \
 	} while (0)
 /*
Index: linux/include/linux/rcuclassic.h
===
--- linux.orig/include/linux/rcuclassic.h
+++ linux/include/linux/rcuclassic.h
@@ -160,5 +160,8 @@ extern void rcu_restart_cpu(int cpu);
 extern long rcu_batches_completed(void);
 extern long rcu_batches_completed_bh(void);
+#define rcu_enter_nohz() do { } while (0)
+#define rcu_exit_nohz() do { } while (0)
+
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUCLASSIC_H */
Index: linux/include/linux/rcupreempt.h
===
--- linux.orig/include/linux/rcupreempt.h
+++ linux/include/linux/rcupreempt.h
@@ -82,5 +82,27 @@ extern struct rcupreempt_trace *rcupreem
 struct softirq_action;
+#ifdef CONFIG_NO_HZ
+DECLARE_PER_CPU(long, dynticks_progress_counter);
+
+static inline void rcu_enter_nohz(void)
+{
+	__get_cpu_var(dynticks_progress_counter)++;
+	WARN_ON(__get_cpu_var(dynticks_progress_counter) & 0x1);
+	mb();
+}
+
+static inline void rcu_exit_nohz(void)
+{
+	mb();
+	__get_cpu_var(dynticks_progress_counter)++;
+	WARN_ON(!(__get_cpu_var(dynticks_progress_counter) & 0x1));
+}
+
+#else /* CONFIG_NO_HZ */
+#define rcu_enter_nohz() do { } while (0)
+#define rcu_exit_nohz() do { } while (0)
+#endif /* CONFIG_NO_HZ */
+
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUPREEMPT_H */
Index: linux/kernel/rcupreempt.c
===
--- linux.orig/kernel/rcupreempt.c
+++ linux/kernel/rcupreempt.c
@@ -23,6 +23,10 @@
  * to Suparna Bhattacharya for pushing me completely away
  * from atomic instructions on the read side.
  *
+ * - Added handling of Dynamic Ticks
+ *   Copyright 2007 - Paul E. Mckenney <[EMAIL PROTECTED]>
+ *   - Steven Rostedt <[EMAIL PROTECTED]>
+ *
  * Papers: http://www.rdrop.com/users/paulmck/RCU
  *
  * Design Document: http://lwn.net/Articles/253651/
@@ -409,6 +413,212 @@ static void __rcu_advance_callbacks(stru
 	}
 }
+#ifdef CONFIG_NO_HZ
+
+DEFINE_PER_CPU(long, dynticks_progress_counter) = 1;
+static DEFINE_PER_CPU(long, rcu_dyntick_snapshot);
+static DEFINE_PER_CPU(int, rcu_update_flag);
+
+/**
+ * rcu_irq_enter - Called from Hard irq handlers and NMI/SMI.
+ *
+ * If the CPU was idle with dynamic ticks active, this updates the
+ * dynticks_progress_counter to let the RCU handling know that the
+
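The parity convention behind dynticks_progress_counter, as far as it can be read from rcu_enter_nohz() and rcu_exit_nohz() in the patch, is that the counter starts at 1 (odd), becomes even when the CPU enters dynticks idle, and becomes odd again when it leaves. A small single-CPU model, with the memory barriers left out and WARN_ON() replaced by assert(); all model_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Starts odd, as in the patch: the CPU is initially not in nohz idle. */
static long model_dynticks_progress_counter = 1;

/* CPU enters dynticks idle: the counter becomes even. */
void model_rcu_enter_nohz(void)
{
	model_dynticks_progress_counter++;
	assert(!(model_dynticks_progress_counter & 0x1));
}

/* CPU leaves dynticks idle: the counter becomes odd again. */
void model_rcu_exit_nohz(void)
{
	model_dynticks_progress_counter++;
	assert(model_dynticks_progress_counter & 0x1);
}

/* An even counter value means the CPU is sitting in dynticks idle. */
bool model_cpu_in_nohz_idle(void)
{
	return !(model_dynticks_progress_counter & 0x1);
}
```

Presumably the grace-period code can then inspect the counter of another CPU to tell whether that CPU is idle; how exactly it does so is in the part of the patch that is cut off above, so this sketch does not try to reproduce it.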
[02/19] sched: add latency tracer callbacks to the scheduler
From: Ingo Molnar <[EMAIL PROTECTED]>

add 3 lightweight callbacks to the tracer backend.

zero impact if tracing is turned off.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/sched.h | 26 ++
 kernel/sched.c        |  3 +++
 2 files changed, 29 insertions(+)

Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -2027,6 +2027,32 @@ extern int sched_mc_power_savings, sched
 extern void normalize_rt_tasks(void);
+#ifdef CONFIG_CONTEXT_SWITCH_TRACER
+extern void
+ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next);
+#else
+static inline void
+ftrace_ctx_switch(struct task_struct *prev, struct task_struct *next)
+{
+}
+#endif
+
+#ifdef CONFIG_SCHED_TRACER
+extern void
+ftrace_wake_up_task(struct task_struct *wakee, struct task_struct *curr);
+extern void
+ftrace_wake_up_new_task(struct task_struct *wakee, struct task_struct *curr);
+#else
+static inline void
+ftrace_wake_up_task(struct task_struct *wakee, struct task_struct *curr)
+{
+}
+static inline void
+ftrace_wake_up_new_task(struct task_struct *wakee, struct task_struct *curr)
+{
+}
+#endif
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 extern struct task_group init_task_group;
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1867,6 +1867,7 @@ static int try_to_wake_up(struct task_st
 out_activate:
 #endif /* CONFIG_SMP */
+	ftrace_wake_up_task(p, rq->curr);
 	schedstat_inc(p, se.nr_wakeups);
 	if (sync)
 		schedstat_inc(p, se.nr_wakeups_sync);
@@ -2007,6 +2008,7 @@ void wake_up_new_task(struct task_struct
 		p->sched_class->task_new(rq, p);
 		inc_nr_running(rq);
 	}
+	ftrace_wake_up_new_task(p, rq->curr);
 	check_preempt_curr(rq, p);
 #ifdef CONFIG_SMP
 	if (p->sched_class->task_wake_up)
@@ -2179,6 +2181,7 @@ context_switch(struct rq *rq, struct tas
 	struct mm_struct *mm, *oldmm;
 	prepare_task_switch(rq, prev, next);
+	ftrace_ctx_switch(prev, next);
 	mm = next->mm;
 	oldmm = prev->active_mm;
 	/*
[00/19] latency tracer
this is the latency tracer that has been also posted at:

  http://lkml.org/lkml/2008/2/8/435
  http://lkml.org/lkml/2008/2/9/127

the tree can be pulled from:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git

about 9 iterations of this have been posted to lkml in the past month,
this is the most recent iteration.

	Ingo
KVM is not seen under X86 config with latest git (32 bit compile)
The KVM configuration is no longer visible in the latest git tree. It looks
like it is selected by HAVE_SETUP_PER_CPU_AREA. I've moved HAVE_KVM to under
CONFIG_X86. Hopefully, this is the right fix.

Comments?

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
 arch/x86/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86/Kconfig~fix-kvm-build arch/x86/Kconfig
--- linux-2.6-git/arch/x86/Kconfig~fix-kvm-build 2008-02-10 12:41:18.0 +0530
+++ linux-2.6-git-balbir/arch/x86/Kconfig 2008-02-10 12:41:37.0 +0530
@@ -20,6 +20,8 @@ config X86
 	def_bool y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_KVM
+
 config GENERIC_LOCKBREAK
 	def_bool n
@@ -108,8 +110,6 @@ config GENERIC_TIME_VSYSCALL
 config HAVE_SETUP_PER_CPU_AREA
 	def_bool X86_64
-select HAVE_KVM
-
 config ARCH_HIBERNATION_POSSIBLE
 	def_bool y
 	depends on !SMP || !X86_VOYAGER
_

--
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
[5/6] x86: kgdb support
From: Ingo Molnar <[EMAIL PROTECTED]> simplified and streamlined kgdb support on x86, both 32-bit and 64-bit, based on patch from: Subject: kgdb: core-lite From: Jason Wessel <[EMAIL PROTECTED]> [ and countless other authors - see the patch for details. ] Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]> --- arch/x86/Kconfig |4 arch/x86/kernel/Makefile |1 arch/x86/kernel/kgdb.c | 550 +++ include/asm-x86/kgdb.h | 87 +++ 4 files changed, 642 insertions(+) Index: linux-kgdb.q/arch/x86/Kconfig === --- linux-kgdb.q.orig/arch/x86/Kconfig +++ linux-kgdb.q/arch/x86/Kconfig @@ -14,6 +14,7 @@ config X86_32 config X86_64 def_bool 64BIT + select KGDB_ARCH_HAS_SHADOW_INFO ### Arch settings config X86 @@ -139,6 +140,9 @@ config AUDIT_ARCH config ARCH_SUPPORTS_AOUT def_bool y +config ARCH_SUPPORTS_KGDB + def_bool y + # Use the generic interrupt handling code in kernel/irq/: config GENERIC_HARDIRQS bool Index: linux-kgdb.q/arch/x86/kernel/Makefile === --- linux-kgdb.q.orig/arch/x86/kernel/Makefile +++ linux-kgdb.q/arch/x86/kernel/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_MODULES) += module_$(BITS) obj-$(CONFIG_ACPI_SRAT)+= srat_32.o obj-$(CONFIG_EFI) += efi.o efi_$(BITS).o efi_stub_$(BITS).o obj-$(CONFIG_DOUBLEFAULT) += doublefault_32.o +obj-$(CONFIG_KGDB) += kgdb.o obj-$(CONFIG_VM86) += vm86_32.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o Index: linux-kgdb.q/arch/x86/kernel/kgdb.c === --- /dev/null +++ linux-kgdb.q/arch/x86/kernel/kgdb.c @@ -0,0 +1,550 @@ +/* + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2, or (at your option) any + * later version. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * General Public License for more details. + * + */ + +/* + * Copyright (C) 2004 Amit S. Kale <[EMAIL PROTECTED]> + * Copyright (C) 2000-2001 VERITAS Software Corporation. + * Copyright (C) 2002 Andi Kleen, SuSE Labs + * Copyright (C) 2004 LinSysSoft Technologies Pvt. Ltd. + * Copyright (C) 2007 MontaVista Software, Inc. + * Copyright (C) 2007-2008 Jason Wessel, Wind River Systems, Inc. + */ +/ + * Contributor: Lake Stevens Instrument Division$ + * Written by: Glenn Engel $ + * Updated by: Amit Kale<[EMAIL PROTECTED]> + * Updated by: Tom Rini <[EMAIL PROTECTED]> + * Updated by: Jason Wessel <[EMAIL PROTECTED]> + * Modified for 386 by Jim Kingdon, Cygnus Support. + * Origianl kgdb, compatibility with 2.1.xx kernel by + * David Grothe <[EMAIL PROTECTED]> + * Integrated into 2.2.5 kernel by Tigran Aivazian <[EMAIL PROTECTED]> + * X86_64 changes from Andi Kleen's patch merged by Jim Houston + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#ifdef CONFIG_X86_32 +# include +#else +# include +#endif + +/* + * Put the error code here just in case the user cares: + */ +static int gdb_x86errcode; + +/* + * Likewise, the vector number here (since GDB only gets the signal + * number through the usual means, and that's not very specific): + */ +static int gdb_x86vector = -1; + +void pt_regs_to_gdb_regs(unsigned long *gdb_regs, struct pt_regs *regs) +{ + gdb_regs[GDB_AX]= regs->ax; + gdb_regs[GDB_BX]= regs->bx; + gdb_regs[GDB_CX]= regs->cx; + gdb_regs[GDB_DX]= regs->dx; + gdb_regs[GDB_SI]= regs->si; + gdb_regs[GDB_DI]= regs->di; + gdb_regs[GDB_BP]= regs->bp; + gdb_regs[GDB_PS]= regs->flags; + gdb_regs[GDB_PC]= regs->ip; +#ifdef CONFIG_X86_32 + gdb_regs[GDB_DS]= regs->ds; + gdb_regs[GDB_ES]= regs->es; + gdb_regs[GDB_CS]= regs->cs; + gdb_regs[GDB_SS]= __KERNEL_DS; + gdb_regs[GDB_FS]= 0x; + gdb_regs[GDB_GS]= 0x; +#else + gdb_regs[GDB_R8]= regs->r8; + gdb_regs[GDB_R9]= regs->r9; + 
gdb_regs[GDB_R10] = regs->r10; + gdb_regs[GDB_R11] = regs->r11; + gdb_regs[GDB_R12] = regs->r12; + gdb_regs[GDB_R13] = regs->r13; + gdb_regs[GDB_R14] = regs->r14; + gdb_regs[GDB_R15] = regs->r15;
[6/6] kgdb: document parameters
From: Jason Wessel <[EMAIL PROTECTED]>

document the kgdboc module/boot parameter.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt | 5 +
 1 file changed, 5 insertions(+)

Index: linux-kgdb.q/Documentation/kernel-parameters.txt
===
--- linux-kgdb.q.orig/Documentation/kernel-parameters.txt
+++ linux-kgdb.q/Documentation/kernel-parameters.txt
@@ -930,6 +930,11 @@ and is between 256 and 4096 characters.
 	kstack=N	[X86-32,X86-64] Print N words from the kernel stack
 			in oops dumps.
+	kgdboc=		[HW] kgdb over consoles.
+			Requires a tty driver that supports console polling.
+			(only serial suported for now)
+			Format: [,baud]
+
 	l2cr=		[PPC]
 	lapic		[X86-32,APIC] Enable the local APIC even if BIOS
[4/6] consoles: polling support, kgdboc
From: Jan Kiszka <[EMAIL PROTECTED]> polled console handling support, to access a console in an irq-less way while in debug or irq context. absolutely zero impact as long as CONFIG_CONSOLE_POLL is disabled. (which is the default) kgdb over consoles support from: Jason Wessel <[EMAIL PROTECTED]> [ [EMAIL PROTECTED]: redesign, splitups, cleanups. ] Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]> --- drivers/char/tty_io.c| 47 drivers/serial/8250.c| 62 drivers/serial/Kconfig |3 drivers/serial/Makefile |1 drivers/serial/kgdboc.c | 164 +++ drivers/serial/serial_core.c | 67 + include/linux/serial_core.h |4 + include/linux/tty_driver.h | 12 +++ 8 files changed, 359 insertions(+), 1 deletion(-) Index: linux-kgdb.q/drivers/char/tty_io.c === --- linux-kgdb.q.orig/drivers/char/tty_io.c +++ linux-kgdb.q/drivers/char/tty_io.c @@ -1155,6 +1155,48 @@ static struct tty_driver *get_tty_driver return NULL; } +#ifdef CONFIG_CONSOLE_POLL + +/** + * tty_find_polling_driver - find device of a polled tty + * @name: name string to match + * @line: pointer to resulting tty line nr + * + * This routine returns a tty driver structure, given a name + * and the condition that the tty driver is capable of polled + * operation. 
+ */
+struct tty_driver *tty_find_polling_driver(char *name, int *line)
+{
+	struct tty_driver *p, *res = NULL;
+	int tty_line = 0;
+	char *str;
+
+	mutex_lock(&tty_mutex);
+	/* Search through the tty devices to look for a match */
+	list_for_each_entry(p, &tty_drivers, tty_drivers) {
+		str = name + strlen(p->name);
+		tty_line = simple_strtoul(str, &str, 10);
+		if (*str == ',')
+			str++;
+		if (*str == '\0')
+			str = 0;
+
+		if (tty_line >= 0 && tty_line <= p->num && p->poll_init &&
+				!p->poll_init(p, tty_line, str)) {
+
+			res = p;
+			*line = tty_line;
+			break;
+		}
+	}
+	mutex_unlock(&tty_mutex);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(tty_find_polling_driver);
+#endif
+
 /**
  * tty_check_change- check for POSIX terminal changes
  * @tty: tty to check
@@ -3850,6 +3892,11 @@ void tty_set_operations(struct tty_drive
 	driver->write_proc = op->write_proc;
 	driver->tiocmget = op->tiocmget;
 	driver->tiocmset = op->tiocmset;
+#ifdef CONFIG_CONSOLE_POLL
+	driver->poll_init = op->poll_init;
+	driver->poll_get_char = op->poll_get_char;
+	driver->poll_put_char = op->poll_put_char;
+#endif
 }
Index: linux-kgdb.q/drivers/serial/8250.c
===
--- linux-kgdb.q.orig/drivers/serial/8250.c
+++ linux-kgdb.q/drivers/serial/8250.c
@@ -1740,6 +1740,64 @@ static inline void wait_for_xmitr(struct
 	}
 }
+#ifdef CONFIG_CONSOLE_POLL
+/*
+ * Console polling routines for writing and reading from the uart while
+ * in an interrupt or debug context.
+ */ + +static int serial8250_get_poll_char(struct uart_port *port) +{ + struct uart_8250_port *up = (struct uart_8250_port *)port; + unsigned char lsr = serial_inp(up, UART_LSR); + + while (!(lsr & UART_LSR_DR)) + lsr = serial_inp(up, UART_LSR); + + return serial_inp(up, UART_RX); +} + + +static void serial8250_put_poll_char(struct uart_port *port, +unsigned char c) +{ + unsigned int ier; + struct uart_8250_port *up = (struct uart_8250_port *)port; + + /* +* First save the IER then disable the interrupts +*/ + ier = serial_in(up, UART_IER); +#ifdef UART_CAP_UUE + if (up->capabilities & UART_CAP_UUE) +#else + if (up->port.type == PORT_XSCALE) +#endif + serial_out(up, UART_IER, UART_IER_UUE); + else + serial_out(up, UART_IER, 0); + + wait_for_xmitr(up, BOTH_EMPTY); + /* +* Send the character out. +* If a LF, also do CR... +*/ + serial_out(up, UART_TX, c); + if (c == 10) { + wait_for_xmitr(up, BOTH_EMPTY); + serial_out(up, UART_TX, 13); + } + + /* +* Finally, wait for transmitter to become empty +* and restore the IER +*/ + wait_for_xmitr(up, BOTH_EMPTY); + serial_out(up, UART_IER, ier); +} + +#endif /* CONFIG_CONSOLE_POLL */ + static int serial8250_startup(struct uart_port *port) { struct uart_8250_port *up = (struct uart_8250_port *)port; @@ -2386,6 +2444,10 @@ static struct uart_ops serial8250_pops =
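One detail of serial8250_put_poll_char() that is easy to miss: the character is sent first, and only if it was a line feed (10) is a carriage return (13) sent after it. A small userspace model of just that ordering, leaving out the IER save/restore and the wait_for_xmitr() polling; the byte log and the model_* names are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the UART transmit register: a byte log. */
static unsigned char model_tx_log[16];
static size_t model_tx_len;

/* Model of serial_out(up, UART_TX, c): record the byte that was sent. */
static void model_uart_tx(unsigned char c)
{
	assert(model_tx_len < sizeof(model_tx_log));
	model_tx_log[model_tx_len++] = c;
}

/*
 * Same ordering as serial8250_put_poll_char(): send the character first,
 * and if it was a line feed (10), follow it with a carriage return (13).
 */
void model_put_poll_char(unsigned char c)
{
	model_uart_tx(c);
	if (c == 10)
		model_uart_tx(13);
}
```

So a newline goes out on the wire as LF followed by CR, not the more familiar CR LF.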
[1/6] pids: add pid_max prototype
From: Ingo Molnar <[EMAIL PROTECTED]>

add pid_max prototype - used by sysctl and will be used by kgdb as well.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 include/linux/pid.h | 2 ++
 kernel/sysctl.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-kgdb.q/include/linux/pid.h
===
--- linux-kgdb.q.orig/include/linux/pid.h
+++ linux-kgdb.q/include/linux/pid.h
@@ -86,6 +86,8 @@ extern struct task_struct *FASTCALL(get_
 extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
+extern int pid_max;
+
 /*
  * attach_pid() and detach_pid() must be called with the tasklist_lock
  * write-held.
Index: linux-kgdb.q/kernel/sysctl.c
===
--- linux-kgdb.q.orig/kernel/sysctl.c
+++ linux-kgdb.q/kernel/sysctl.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -71,7 +72,6 @@ extern int max_threads;
 extern int core_uses_pid;
 extern int suid_dumpable;
 extern char core_pattern[];
-extern int pid_max;
 extern int min_free_kbytes;
 extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
[2/6] uaccess: add probe_kernel_write()
From: Ingo Molnar <[EMAIL PROTECTED]>

add probe_kernel_write() - copy & paste of the existing
probe_kernel_address(), extended to writes.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Reviewed-by: Thomas Gleixner <[EMAIL PROTECTED]>
---
 include/linux/uaccess.h | 22 ++
 1 file changed, 22 insertions(+)

Index: linux-kgdb.q/include/linux/uaccess.h
===
--- linux-kgdb.q.orig/include/linux/uaccess.h
+++ linux-kgdb.q/include/linux/uaccess.h
@@ -84,4 +84,26 @@ static inline unsigned long __copy_from_
 		ret; \
 	})
+/**
+ * probe_kernel_write(): safely attempt to write to a location
+ * @addr: address to write to - its type is type typeof(rdval)*
+ * @rdval: write to this variable
+ *
+ * Safely write to address @addr from variable @rdval. If a kernel fault
+ * happens, handle that and return -EFAULT.
+ */
+#define probe_kernel_write(addr, rdval) \
+	({ \
+		long ret; \
+		mm_segment_t old_fs = get_fs(); \
+		\
+		set_fs(KERNEL_DS); \
+		pagefault_disable(); \
+		ret = __put_user(rdval, \
+			(__force typeof(rdval) __user *)(addr)); \
+		pagefault_enable(); \
+		set_fs(old_fs); \
+		ret; \
+	})
+
 #endif /* __LINUX_UACCESS_H__ */
[0/6] kgdb light
this is the "kgdb light" tree that has been also posted at:

  http://lkml.org/lkml/2008/2/9/236

it is available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git

See the shortlog below. various iterations of this have also been
included in x86.git for the past 3 months.

This is a slimmed-down and cleaned up version of KGDB that i've created
out of the original patches that we submitted two weeks ago. I went over
the kgdb patches with Thomas and we cut out everything that we did not
like, and cleaned up the result.

KGDB is still just as functional as it was before (i tested it on 32-bit
and 64-bit x86) - and any desired extra capability or complexity should
be added as a delta improvement, not in this initial merge.

The difference between the original kgdb submission and this submission
is best visible in the diffstat:

  before: 41 files changed, 4007 insertions(+), 33 deletions(-)
  after:  22 files changed, 3448 insertions(+), 2 deletions(-)

what got removed:

 - removed _all_ critical path impact, even if KGDB is enabled and
   active. The only notifier list it is registered in is the die
   notifiers, but even there it has the minimum priority of -INT_MAX, to
   be called as the last one of the die notifiers. I removed the 'early
   trap hook', the trap handler tweaks, everything. KGDB's only impact
   now are the arch details it implements in arch/x86/kernel/kgdb.c,
   nothing else.

 - removed all the lowlevel serial drivers. KGDB should not be in the
   business of writing special-purpose Linux drivers. In fact i found a
   testsystem where the KGDB 8250 driver would not work - it's simply
   reimplementing the wheel that drivers/serial already implements, and
   poorly so. Any "early debugging" functionality should be done via
   extending the early-console concept, not via special-purpose KGDB
   drivers.

 - I have added a redesigned and cleaned up version of the "KGDB over
   polled consoles" approach (KGDBOC) - i believe this should be the
   only IO transport for KGDB: it is an extension of the "console"
   concept - nothing more, nothing less. Netconsole fits this concept
   quite nicely as well. The moment a console driver is extended with
   polling functionality, KGDB will be usable through that IO transport,
   without having to know about hardware details.

 - I have removed the longjump code. That code was ugly beyond belief,
   it tried to fix up KGDB's own faults and needed to hook into all the
   fault handlers. It is totally, utterly wrong to do it like that. The
   code now uses pure probe_kernel_address() accesses.

 - removed the module symbol hacks - those need a clean solution.

 - removed the GTOD/clocksource hacks. If a user uses kgdb for extended
   periods of time then GTOD clocksources can get out of sync and we
   might fall back to other clocksources. That is the _right_ thing to
   do for the kernel, hacking it around to avoid kernel messages was
   wrong.

 - i have removed the softlockup hacks as well.

 - removed the toplevel Makefile changes - if any change is needed in
   that area (i'm not convinced there is), then those changes need to go
   through Sam & the kbuild folks.

 - removed the might_sleep scheduler hack as well, and the thread_return
   hack.

 - [ and did lots of other cleanups and rewrites as well. ]

as a result, this kgdb series has _obviously_ zero impact on the kernel,
because it just does not touch any dangerous codepath. From this point
on KGDB can evolve in small, well-controlled baby steps, as all other
kernel code as well.

and the resulting kgdb is still very functional: it can still break into
a kernel (via SysRq-G), can catch crashes, can single-step, etc. It's
already a quite usable first step.

I have tested this tree on x86 32-bit and 64-bit. Other architectures
are not expected to be impacted.

	Ingo

-->

Ingo Molnar (3):
      pids: add pid_max prototype
      uaccess: add probe_kernel_write()
      x86: kgdb support

Jan Kiszka (1):
      consoles: polling support, kgdboc

Jason Wessel (2):
      kgdb: core
      kgdb: document parameters

 Documentation/kernel-parameters.txt |   5 +
 arch/x86/Kconfig                    |   4 +
 arch/x86/kernel/Makefile            |   1 +
 arch/x86/kernel/kgdb.c              | 550 ++
 drivers/char/tty_io.c               |  47 +
 drivers/serial/8250.c               |  62 ++
 drivers/serial/Kconfig              |   3 +
 drivers/serial/Makefile             |   1 +
 drivers/serial/kgdboc.c             | 164 +++
 drivers/serial/serial_core.c        |  67 ++-
 include/asm-generic/kgdb.h          |  93 ++
 include/asm-x86/kgdb.h              |  87 ++
 include/linux/kgdb.h                | 264 +
 include/linux/pid.h                 |   2 +
 include/linux/serial_core.h         |   4 +
 include/linux/tty_driver.h          |  12 +
 include/linux/uaccess.h             |  22 +
acpi dsts loading and populate_rootfs
Folks, moving this call around hidden inside completely unreviewed
acpi junk is not acceptable.

Either populate_rootfs _is_ safe to be called earlier and then we should
do it always or it's not. Either way such a change should be posted
separately and reviewed on lkml.

Len, can you please revert "ACPI: basic initramfs DSDT override support"
aka commit 71fc47a9adf8ee89e5c96a47222915c5485ac437 until we've sorted
this out properly? Thanks.
Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads
On Sun, Feb 10, 2008 at 07:15:58AM +0100, Willy Tarreau wrote: > On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote: > > 40M: > > 2.6.22 time 94315 ms > > 2.6.23 time 107930 ms > > 2.6.24 time 113291 ms > > 2.6.24-git19time 110360 ms > > > > So with more work per thread, the differences become less but they're > > still there. At the 40M loop, with 500 threads it's quite a bit of > > runtime per thread. > > No, it's really nothing. I had to push the loop to 1 billion to make the load > noticeable. You don't have 500 threads, you have 2 threads and that load is > repeated 500 times. And if we look at the numbers, let's take the worst one : > > 40M: > > 2.6.24time 113291 ms > 113291/500 = 227 microseconds/loop. This is still very low compared to the > smallest timeslice you would have (1 ms at HZ=1000). > > So your threads are still completing *before* the scheduler has to preempt > them. Hmm? I get that to be 227ms per loop, which is way more than a full timeslice. Running the program took in the range of 2 minutes, so it's 11 milliseconds, not microseconds. > > It seems generally unfortunate that it takes longer for a new thread to > > move over to the second cpu even when the first is busy with the original > > thread. I can certainly see cases where this causes suboptimal overall > > system behaviour. > > In fact, I don't think it takes longer, I think it does not do it at their > creation, but will do it immediately after the first slice is consumed. This > would explain the important differences here. I don't know how we could ensure > that the new thread is created on the second CPU from the start, though. The math doesn't add up for me. Even if it rebalanced at the end of the first slice (i.e. after 1ms), that would be a 1ms penalty per iteration. With 500 threads that'd be a total penalty of 500ms. > I tried inserting a sched_yield() at the top of the busy loop (1M loops). > By default, it did not change a thing. 
Then I simply set sched_compat_yield > to 1, and the two threads then ran simultaneously with a stable low time > (2700 ms instead of 10-12 seconds). > > Doing so with 10k loops (initial test) shows times in the range 240-300 ms > only instead of 2200-6500 ms. Right, likely because the long-running cases got stuck at the busy loop at the end, which would end up aborting quicker if the other thread got scheduled for just a bit. It was a mistake to post that variant of the testcase, it's not as relevant and doesn't mimic the original workload I was trying to mimic as well as if the first loop was made larger. > Ingo, would it be possible (and wise) to ensure that a new thread being > created gets immediately rebalanced in order to emulate what is done here > with sched_compat_yield=1 and sched_yield() in both threads just after the > thread creation ? I don't expect any performance difference doing this, > but maybe some shell scripts reliying on short-lived pipes would get faster > on SMP. There's always the tradeoff of losing cache warmth whenever a thread is moved, so I'm not sure if it's a good idea to always migrate it at creation time. It's not a simple problem, really. > > I agree that the testcase is highly artificial. Unfortunately, it's > > not uncommon to see these kind of weird testcases from customers tring > > to evaluate new hardware. :( They tend to be pared-down versions of > > whatever their real workload is (the real workload is doing things more > > appropriately, but the smaller version is used for testing). I was lucky > > enough to get source snippets to base a standalone reproduction case on > > for this, normally we wouldn't even get copies of their binaries. > > I'm well aware of that. What's important is to be able to explain what is > causing the difference and why the test case does not represent anything > related to performance. Maybe the code author wanted to get 500 parallel > threads and got his code wrong ? 
I believe it started out as a simple attempt to parallelize a workload that sliced the problem too finely, instead of slicing it into larger chunks and having each thread do more work at a time. It did well on 2.6.22 with almost a 2x speedup, but did worse than the single-threaded testcase on a 2.6.24 kernel. So yes, it can clearly be handled through explanations and education and fixing the broken testcase, but I'm still not sure the new behaviour is desired. -Olof
Re: One minute delay when booting 2.6.24.1
On Saturday 09 February 2008 22:01:44 Jan Engelhardt wrote: > On Feb 9 2008 13:29, Tvrtko A. Ursulin wrote: > >Hi all, > > > >As the subject says I get ~1 minute delay when booting 2.6.24.1 > >pretty reliably. It is possible it is not new to 2.6.24.1 but I > >can't tell due recent hardware changes. > > > >dmesg excerpt where it happens looks like this (full one attached): > > Do you really experience a 1 minute wait, or is this perhaps > just the clock skipping? It is a real delay. Tvrtko
Re: kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit
On Feb 10, 2008 8:32 AM, Willy Tarreau <[EMAIL PROTECTED]> wrote: > On Sun, Feb 10, 2008 at 08:04:35AM +0200, Niki Denev wrote: > > Hi, > > > > As the subject says the 2.6.24.1 is still vulnerable to the vmsplice > > local root exploit. > > Yes indeed, that's quite bad. 2.6.24-git is still vulnerable too, and > also contains the fix :-( > > CC'd Jens as he worked on the fix. > > Willy > > I was unable to gain root on 2.6.24-git20, but after several segfaults when executing the exploit continuously the machine crashes. --Niki
[GIT PULL] ext4 update
Hi Linus, Please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus These are mostly bug fixes that we've found since the last pull request. The one non-bugfix change is that I've added a sanity check to assure that production ext3 filesystems don't get mounted with ext4dev accidentally. The need for this was discovered when Eric Sandeen started putting ext4 into Fedora's Rawhide release for initial testing. Thanks, - Ted Aneesh Kumar K.V (5): jbd2: Fix reference counting on the journal commit block's buffer head JBD2: Use the incompat macro for testing the incompat feature. ext4: Fix null bh pointer dereference in mballoc ext4: Fix circular locking dependency with migrate and rm. ext4: Don't panic in case of corrupt bitmap Dave Kleikamp (1): JBD2: Clear buffer_ordered flag for barried IO request on success Eric Sandeen (2): allow in-inode EAs on ext4 root inode ext4: allocate struct ext4_allocation_context from a kmem cache Jan Kara (2): jbd: Remove useless loop when writing commit record ext4: Fix Direct I/O locking Mingming Cao (1): jbd2: Add error check to journal_wait_on_commit_record to avoid oops Theodore Tso (1): ext4: Add new "development flag" to the ext4 filesystem Valerie Clement (1): ext4: Don't set EXTENTS_FL flag for fast symlinks fs/ext4/inode.c | 115 +++- fs/ext4/mballoc.c | 164 ++- fs/ext4/migrate.c | 123 +++ fs/ext4/namei.c |1 + fs/ext4/super.c | 11 +++ fs/jbd/commit.c | 14 ++-- fs/jbd2/commit.c| 10 ++- fs/jbd2/recovery.c |2 +- include/linux/ext4_fs.h |7 ++ 9 files changed, 270 insertions(+), 177 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit
On Sun, Feb 10, 2008 at 08:04:35AM +0200, Niki Denev wrote: > Hi, > > As the subject says the 2.6.24.1 is still vulnerable to the vmsplice > local root exploit. Yes indeed, that's quite bad. 2.6.24-git is still vulnerable too, and also contains the fix :-( CC'd Jens as he worked on the fix. Willy
Re: [PATCH] Change pci_raw_ops to pci_raw_read/write
On Feb 9, 2008 4:41 AM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: > On Thu, Feb 07, 2008 at 10:54:05AM -0500, Tony Camuso wrote: > > Matthew, > > > > Perhaps I missed it, but did you address Yinghai's concerns? > > No, I was on holiday. > > > Yinghai Lu wrote: > > >On Jan 28, 2008 7:03 PM, Matthew Wilcox <[EMAIL PROTECTED]> wrote: > > >> > > >>-int pci_conf1_write(unsigned int seg, unsigned int bus, > > >>+static int pci_conf1_write(unsigned int seg, unsigned int bus, > > >> unsigned int devfn, int reg, int len, u32 > > >> value) > > > > > >any reason to change pci_conf1_read/write to static? > > Yes -- it no longer needs to be called from outside this file. > > > >>+config ATA_RAM > > >>+ tristate "ATA RAM driver" > > >>+ > > > > > >related? > looks good. it should get into -mm or x86/mm for some testing YH
Re: [PATCH] USB: mark USB drivers as being GPL only
On Sunday 10 February 2008 00:43:49 Marcel Holtmann wrote: > Hi Daniel, > > > > > > It makes no difference if you > > > > > distribute the GPL library with it or not. > > > > > > > > If you do not distribute the GPL library, the library is simply being > > > > used in the intended, ordinary way. You do not need to agree to, nor > > > > can you violate, the GPL simply by using a work in its ordinary > > > > intended way. > > > > > > > > If the application contains insufficient copyrightable expression > > > > from the library to be considered a derivative work (and purely > > > > functional things do not count), then it cannot be a derivative work. > > > > The library is not being copied or distributed. So how can its > > > > copyright be infringed? > > > > > > go ahead and create an application that uses a GPL only library. Then > > > ask a lawyer if it is okay to distribute your application in binary > > > only form without making the source code available (according to the > > > GPL). > > > > > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGP > > >L > > > > > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGP > > >L > > > > In the US, at least, the belief that "Linking", in *ANY* form, with a GPL > > library creates a derivative work, is fallacious. > > that is how FSF states it and it seems that most legal departments of > big companies (US and EU based) are not taking any risk on this. So it > seems that someone actually has to prove in court that these assumptions > for the GPL case are wrong. The FSF is making a claim that can be traced back to the beliefs of one person - RMS - and that propagate their views. As I stated in the original, this is not just my opinion, but that of two different lawyers I've spoken to and also the stated belief of numerous people on LKML. The fact is that the GPL only affects a "derivative work" in a viral manner. 
Merely using a GPL'd libraries API is not enough to make a program a "derivative work". > > Were I to create an > > application that uses, say, GTK for the interface the protected > > expression is my "unique and creative" use of the GTK API for creating > > the specific interface and any other code I have written using the API. I > > hold sole license to the copyright on that code and am able to license > > said code under the specific license of my choice. > > Not even getting into this one since GTK+ is a LGPL based library. Get > your examples straight. And the LGPL was created because of the FSF propagated belief that using a GPL'd library means your application is automatically a "derivative work" and hence must be released under the GPL. So the LGPL was created with the "automatic" 'linking' exemption. It is not necessary and never has been. This is why, even if the FSF claims what I've said above (that linking code with the GPL doesn't propagate the GPL into the non-GPL code) most companies won't risk it... Because the FSF has taken actions that are the exact opposite of their words. > > Why? Because the pre-processor is what is including any GPL'd code in my > > application and expanding any macros. That is a purely mechanical process > > and hence the output is not able to be separately copyrighted - if it > > could be, then the copyright would be held by the *COMPILER*, and I am > > *NOT* bound by the license on that code. The same applies if GPL'd code > > is included in my application during the linking process. QED: The > > "Linking" argument used by most people is wholly fallacious in at least > > one major country - and if I'm not mistaken, the output from an automated > > process is similarly not considered as carrying a separate copyright in > > all nations that are signatories of or follow the Bern Convention. > > The GPL is a license. Nobody is talking about the copyright of your code > here. You always have the copyright on your code. 
The point is that you > have to license your code under GPL (when using a GPL library) and you > are distributing your code. Yes, It is "my" code and "my" copyright. However, by the absolutely *common* belief that "linking to GPL libraries makes a program a derivative work" it would mean that I no longer have the freedom to license my code under the license of my choosing, because the *mechanical* process of linking has caused the GPL's "viral" clause to spread to cover my code. And you're absolutely wrong. It doesn't matter that the library is GPL'd at all. My code *cannot*, under any circumstances, be affected by the GPL license on the library. Because the libraries API *cannot* be copyrighted and any GPL'd code which winds up in the final binary got there via a "mechanical process" and doesn't affect my right to release the code under a license of my choosing. Any other belief is fallacious. Claiming otherwise would mean that any program that uses any library on a windows system makes an application a derivative work of that
/bin/sh: -c: line 0: syntax error near unexpected token `;'
Hello All , In a recent pull of linus's tree (*) today @ 2008-02-10 02:49 UTC , Using a previously well behaving .config I now get ... Tia , JimL make -f scripts/Makefile.clean obj=sound/usb/usx2y make -f scripts/Makefile.clean obj=usr rm -rf .tmp_versions rm -f arch/x86/boot/fdimage arch/x86/boot/image.iso arch/x86/boot/mtools.conf vmlinux System.map .tmp_kallsyms* .tmp_version .tmp_vmlinux* .tmp_System.map rm -f include/config/kernel.release echo 2.6.24 > include/config/kernel.release set -e; ; mkdir -p include/linux/; (echo \#define LINUX_VERSION_CODE 132632; echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';) < /usr/src/linux-2.6.25-git-20080209/Makefile > include/linux/version.h.tmp; if [ -r include/linux/version.h ] && cmp -s include/linux/version.h include/linux/version.h.tmp; then rm -f include/linux/version.h.tmp; else ; mv -f include/linux/version.h.tmp include/linux/version.h; fi /bin/sh: -c: line 0: syntax error near unexpected token `;' /bin/sh: -c: line 0: `set -e; ; mkdir -p include/linux/;(echo \#define LINUX_VERSION_CODE 132632; echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';) < /usr/src/linux-2.6.25-git-20080209/Makefile > include/linux/version.h.tmp; if [ -r include/linux/version.h ] && cmp -s include/linux/version.h include/linux/version.h.tmp; then rm -f include/linux/version.h.tmp; else ; mv -f include/linux/version.h.tmp include/linux/version.h; fi' make: *** [include/linux/version.h] Error 2 (*)git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git -- +--+ | James W. Laferriere | SystemTechniques | Give me VMS | | Network Engineer | 2133McCullam Ave | Give me Linux | | [EMAIL PROTECTED] | Fairbanks, AK. 99701 | only on AXP | +--+ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads
On Sat, Feb 09, 2008 at 11:29:41PM -0600, Olof Johansson wrote: > On Sat, Feb 09, 2008 at 05:19:57PM +0100, Willy Tarreau wrote: > > On Sat, Feb 09, 2008 at 02:37:39PM +0100, Mike Galbraith wrote: > > > > > > On Sat, 2008-02-09 at 12:40 +0100, Willy Tarreau wrote: > > > > On Sat, Feb 09, 2008 at 11:58:25AM +0100, Mike Galbraith wrote: > > > > > > > > > > On Sat, 2008-02-09 at 09:03 +0100, Willy Tarreau wrote: > > > > > > > > > > > How many CPUs do you have ? > > > > > > > > > > It's a P4/HT, so 1 plus $CHUMP_CHANGE_MAYBE > > > > > > > > > > > > 2.6.25-smp (git today) > > > > > > > time 29 ms > > > > > > > time 61 ms > > > > > > > time 72 ms > > > > > > > > > > > > These ones look rather strange. What type of workload is it ? Can > > > > > > you > > > > > > publish the program for others to test it ? > > > > > > > > > > It's the proglet posted in this thread. > > > > > > > > OK sorry, I did not notice it when I first read the report. > > > > > > Hm. The 2.6.25-smp kernel is the only one that looks like it's doing > > > what proggy wants to do, massive context switching. Bump threads to > > > larger number so you can watch: the supposedly good kernel (22) is doing > > > everything on one CPU. Everybody else sucks differently (idleness), and > > > the clear throughput winner, via mad over-schedule (!?!), is git today. > > > > For me, 2.6.25-smp gives pretty irregular results : > > > > time 6548 ms > > time 7272 ms > > time 1188 ms > > time 3772 ms > > > > The CPU usage is quite irregular too and never goes beyond 50% (this is a > > dual-athlon). If I start two of these processes, 100% of the CPU is used, > > the context switch rate is more regular (about 700/s) and the total time > > is more regular too (between 14.8 and 18.5 seconds). > > > > Increasing the parallel run time of the two threads by changing the upper > > limit of the for(j) loop correctly saturates both processors. 
I think that > > this program simply does not have enough work to do for each thread to run > > for a full timeslice, thus showing a random behaviour. > > Right. I should have tinkered a bit more with it before I posted it, the > version posted had too little going on in the first loop and thus got > hung up on the second busywait loop instead. > > I did a bunch of runs with various loop sizes. Basically, what seems to > happen is that the older kernels are quicker at rebalancing a new thread > over to the other cpu, while newer kernels let them share the same cpu > longer (and thus increases wall clock runtime). > > All of these are built with gcc without optimization, larger loop size > and an added sched_yield() in the busy-wait loop at the end to take that > out as a factor. As you've seen yourself, runtimes can be quite noisy > but the trends are quite clear anyway. All of these numbers were > collected with default scheduler runtime options, same kernels and > configs as previously posted. > > Loop to 1M: > 2.6.22time 4015 ms > 2.6.23time 4581 ms > 2.6.24time 10765 ms > 2.6.24-git19 time 8286 ms > > 2M: > 2.6.22time 7574 ms > 2.6.23time 9031 ms > 2.6.24time 12844 ms > 2.6.24-git19 time 10959 ms > > 3M: > 2.6.22time 8015 ms > 2.6.23time 13053 ms > 2.6.24time 16204 ms > 2.6.24-git19 time 14984 ms > > 4M: > 2.6.22time 10045 ms > 2.6.23time 16642 ms > 2.6.24time 16910 ms > 2.6.24-git19 time 16468 ms > > 5M: > 2.6.22time 12055 ms > 2.6.23time 21024 ms > > 2.6.24-git19 time 16040 ms > > 10M: > 2.6.22time 24030 ms > 2.6.23time 33082 ms > 2.6.24time 34139 ms > 2.6.24-git19 time 33724 ms > > 20M: > 2.6.22time 50015 ms > 2.6.23time 63963 ms > 2.6.24time 65100 ms > 2.6.24-git19 time 63092 ms > > 40M: > 2.6.22time 94315 ms > 2.6.23time 107930 ms > 2.6.24time 113291 ms > 2.6.24-git19 time 110360 ms > > So with more work per thread, the differences become less but they're > still there. At the 40M loop, with 500 threads it's quite a bit of > runtime per thread. 
No, it's really nothing. I had to push the loop to 1 billion to make the load noticeable. You don't have 500 threads, you have 2 threads and that load is repeated 500 times. And if we look at the numbers, let's take the worst one : > 40M: > 2.6.24time 113291 ms 113291/500 = 227 microseconds/loop. This is still very low compared to the smallest timeslice you would have (1 ms at HZ=1000). So your threads are still completing *before* the scheduler has to preempt them. > > However, I fail to understand the goal of the reproducer. Granted it shows > > irregularities in the scheduler under such conditions, but what *real* > > workload would spend its time sequentially
kernel 2.6.24.1 still vulnerable to the vmsplice local root exploit
Hi, As the subject says the 2.6.24.1 is still vulnerable to the vmsplice local root exploit. [EMAIL PROTECTED] tmp]$ uname -a Linux tester 2.6.24.1 #1 Sun Feb 10 00:06:49 EST 2008 i686 unknown [EMAIL PROTECTED] tmp]$ ./vms --- Linux vmsplice Local Root Exploit By qaaz --- [+] mmap: 0x0 .. 0x1000 [+] page: 0x0 [+] page: 0x20 [+] mmap: 0x4000 .. 0x5000 [+] page: 0x4000 [+] page: 0x4020 [+] mmap: 0x1000 .. 0x2000 [+] page: 0x1000 [+] mmap: 0xb7f56000 .. 0xb7f88000 [+] root [EMAIL PROTECTED] tmp]# [EMAIL PROTECTED] tmp]# id uid=0(root) gid=0(root) groups=2033(opa) [EMAIL PROTECTED] tmp]# uname -a Linux test 2.6.24.1 #1 Sun Feb 10 00:06:49 EST 2008 i686 unknown Is there any known fix/patch for this? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
On Sun, Feb 10, 2008 at 02:29:24PM +0900, KOSAKI Motohiro wrote: > CC'd Greg KH <[EMAIL PROTECTED]> > > I tested this patch on fujitsu memoryless node. > (2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 insted 2.6.24-mm1) > it seems works good. > > Tested-by: KOSAKI Motohiro <[EMAIL PROTECTED]> > > > Greg, I hope this patch merge to 2.6.24.x stable tree because > this patch is regression fixed patch. > Please tell me what do i doing for it. Once the patch goes into Linus's tree, feel free to send it to the [EMAIL PROTECTED] address so that we can include it in the 2.6.24.x tree. thanks, greg k-h
Re: [PATCH] USB: mark USB drivers as being GPL only
Hi Daniel, > > > > It makes no difference if you > > > > distribute the GPL library with it or not. > > > > > > If you do not distribute the GPL library, the library is simply being > > > used in the intended, ordinary way. You do not need to agree to, nor can > > > you violate, the GPL simply by using a work in its ordinary intended way. > > > > > > If the application contains insufficient copyrightable expression from > > > the library to be considered a derivative work (and purely functional > > > things do not count), then it cannot be a derivative work. The library is > > > not being copied or distributed. So how can its copyright be infringed? > > > > go ahead and create an application that uses a GPL only library. Then > > ask a lawyer if it is okay to distribute your application in binary only > > form without making the source code available (according to the GPL). > > > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL > > > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL > > In the US, at least, the belief that "Linking", in *ANY* form, with a GPL > library creates a derivative work, is fallacious. that is how FSF states it and it seems that most legal departments of big companies (US and EU based) are not taking any risk on this. So it seems that someone actually has to prove in court that these assumptions for the GPL case are wrong. > Were I to create an > application that uses, say, GTK for the interface the protected expression is > my "unique and creative" use of the GTK API for creating the specific > interface and any other code I have written using the API. I hold sole > license to the copyright on that code and am able to license said code under > the specific license of my choice. Not even getting into this one since GTK+ is a LGPL based library. Get your examples straight. > Why? Because the pre-processor is what is including any GPL'd code in my > application and expanding any macros. 
That is a purely mechanical process and > hence the output is not able to be separately copyrighted - if it could be, > then the copyright would be held by the *COMPILER*, and I am *NOT* bound by > the license on that code. The same applies if GPL'd code is included in my > application during the linking process. QED: The "Linking" argument used by > most people is wholly fallacious in at least one major country - and if I'm > not mistaken, the output from an automated process is similarly not > considered as carrying a separate copyright in all nations that are > signatories of or follow the Bern Convention. The GPL is a license. Nobody is talking about the copyright of your code here. You always have the copyright on your code. The point is that you have to license your code under GPL (when using a GPL library) and you are distributing your code. Regards Marcel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
CC'd Greg KH <[EMAIL PROTECTED]> I tested this patch on fujitsu memoryless node. (2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 insted 2.6.24-mm1) it seems works good. Tested-by: KOSAKI Motohiro <[EMAIL PROTECTED]> Greg, I hope this patch merge to 2.6.24.x stable tree because this patch is regression fixed patch. Please tell me what do i doing for it. [intentional full quote] > Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't > works on memoryless node." > > [Aside: I noticed there were two slightly different distributions for > this topic. I've unified the distribution lists w/o dropping anyone, I > think. Apologies if you'd rather have been dropped...] > > Here's V3 of the patch, accomodating Kosaki Motohiro's suggestion for > folding contextualize_policy() into mpol_check_policy() [because my > "was_empty" argument "was ugly" ;-)]. It does seem to clean up the > code. > > I'm still deferring David Rientjes' suggestion to fold > mpol_check_policy() into mpol_new(). We need to sort out whether > mempolicies specified for tmpfs and hugetlbfs mounts always need the > same "contextualization" as user/application installed policies. I > don't want to hold up this bug fix for that discussion. This is > something Paul J will need to address with his cpuset/mempolicy rework, > so we can sort it out in that context. > > Again, tested with "numactl --interleave=all" and memtoy on ia64 using > mem= command line argument to simulate memoryless node. > > > Lee > > > [PATCH] 2.6.24-mm1 - mempolicy: silently restrict nodemask to allowed nodes > > V2 -> V3: > + As suggested by Kosaki Motohito, fold the "contextualization" > of policy nodemask into mpol_check_policy(). Looks a little > cleaner. > > V1 -> V2: > + Communicate whether or not incoming node mask was empty to > mpol_check_policy() for better error checking. 
> + As suggested by David Rientjes, remove the now unused >cpuset_nodes_subset_current_mems_allowed() from cpuset.h > > Kosaki Motohito noted that "numactl --interleave=all ..." failed in the > presence of memoryless nodes. This patch attempts to fix that problem. > > Some background: > > numactl --interleave=all calls set_mempolicy(2) with a fully > populated [out to MAXNUMNODES] nodemask. set_mempolicy() > [in do_set_mempolicy()] calls contextualize_policy() which > requires that the nodemask be a subset of the current task's > mems_allowed; else EINVAL will be returned. A task's > mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]-- > i.e., nodes with memory. So, a fully populated nodemask will > be declared invalid if it includes memoryless nodes. > > NOTE: the same thing will occur when running in a cpuset > with restricted mem_allowed--for the same reason: > node mask contains dis-allowed nodes. > > mbind(2), on the other hand, just masks off any nodes in the > nodemask that are not included in the caller's mems_allowed. > > In each case [mbind() and set_mempolicy()], mpol_check_policy() > will complain [again, resulting in EINVAL] if the nodemask contains > any memoryless nodes. This is somewhat redundant as mpol_new() > will remove memoryless nodes for interleave policy, as will > bind_zonelist()--called by mpol_new() for BIND policy. > > Proposed fix: > > 1) modify contextualize_policy logic to: >a) remember whether the incoming node mask is empty. >b) if not, restrict the nodemask to allowed nodes, as is > currently done in-line for mbind(). This guarantees > that the resulting mask includes only nodes with memory. > > NOTE: this is a [benign, IMO] change in behavior for > set_mempolicy(). Dis-allowed nodes will be > silently ignored, rather than returning an error. > >c) fold this code into mpol_check_policy(), replace 2 calls to > contextualize_policy() to call mpol_check_policy() directly > and remove contextualize_policy(). 
> > 2) In existing mpol_check_policy() logic, after "contextualization": >a) MPOL_DEFAULT: require that in coming mask "was_empty" >b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask > contains at least one node. >c) add a case for MPOL_PREFERRED: if in coming was not empty > and resulting mask IS empty, user specified invalid nodes. > Return EINVAL. >c) remove the now redundant check for memoryless nodes > > 3) remove the now redundant masking of policy nodes for interleave >policy from mpol_new(). > > 4) Now that mpol_check_policy() contextualizes the nodemask, remove >the in-line nodes_and() from sys_mbind(). I believe that this >restores mbind() to the behavior before the memoryless-nodes >patch series. E.g., we'll no longer treat an invalid nodemask >with MPOL_PREFERRED as local allocation. > > Signed-off-by: Lee Schermerhorn <[EMAIL PROTECTED]> > >
Re: [git pull] kgdb light
On Sat, Feb 09, 2008 at 09:42:59PM +0100, Ingo Molnar wrote: > Linus, > > while this is probably one of the last days of the merge window, please > still consider pulling the "kgdb light" git tree from: > >git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb.git Without posting patches for review first? You must be kidding.
Re: Scheduler(?) regression from 2.6.22 to 2.6.24 for short-lived threads
On Sat, Feb 09, 2008 at 05:19:57PM +0100, Willy Tarreau wrote: > On Sat, Feb 09, 2008 at 02:37:39PM +0100, Mike Galbraith wrote: > > > > On Sat, 2008-02-09 at 12:40 +0100, Willy Tarreau wrote: > > > On Sat, Feb 09, 2008 at 11:58:25AM +0100, Mike Galbraith wrote: > > > > > > > > On Sat, 2008-02-09 at 09:03 +0100, Willy Tarreau wrote: > > > > > > > > > How many CPUs do you have ? > > > > > > > > It's a P4/HT, so 1 plus $CHUMP_CHANGE_MAYBE > > > > > > > > > > 2.6.25-smp (git today) > > > > > > time 29 ms > > > > > > time 61 ms > > > > > > time 72 ms > > > > > > > > > > These ones look rather strange. What type of workload is it ? Can you > > > > > publish the program for others to test it ? > > > > > > > > It's the proglet posted in this thread. > > > > > > OK sorry, I did not notice it when I first read the report. > > > > Hm. The 2.6.25-smp kernel is the only one that looks like it's doing > > what proggy wants to do, massive context switching. Bump threads to > > larger number so you can watch: the supposedly good kernel (22) is doing > > everything on one CPU. Everybody else sucks differently (idleness), and > > the clear throughput winner, via mad over-schedule (!?!), is git today. > > For me, 2.6.25-smp gives pretty irregular results : > > time 6548 ms > time 7272 ms > time 1188 ms > time 3772 ms > > The CPU usage is quite irregular too and never goes beyond 50% (this is a > dual-athlon). If I start two of these processes, 100% of the CPU is used, > the context switch rate is more regular (about 700/s) and the total time > is more regular too (between 14.8 and 18.5 seconds). > > Increasing the parallel run time of the two threads by changing the upper > limit of the for(j) loop correctly saturates both processors. I think that > this program simply does not have enough work to do for each thread to run > for a full timeslice, thus showing a random behaviour. Right. 
I should have tinkered a bit more with it before I posted it, the version
posted had too little going on in the first loop and thus got hung up on
the second busywait loop instead.

I did a bunch of runs with various loop sizes. Basically, what seems to
happen is that the older kernels are quicker at rebalancing a new thread
over to the other cpu, while newer kernels let them share the same cpu
longer (and thus increases wall clock runtime).

All of these are built with gcc without optimization, larger loop size
and an added sched_yield() in the busy-wait loop at the end to take that
out as a factor. As you've seen yourself, runtimes can be quite noisy but
the trends are quite clear anyway. All of these numbers were collected
with default scheduler runtime options, same kernels and configs as
previously posted.

Loop to 1M:
2.6.22          time 4015 ms
2.6.23          time 4581 ms
2.6.24          time 10765 ms
2.6.24-git19    time 8286 ms

2M:
2.6.22          time 7574 ms
2.6.23          time 9031 ms
2.6.24          time 12844 ms
2.6.24-git19    time 10959 ms

3M:
2.6.22          time 8015 ms
2.6.23          time 13053 ms
2.6.24          time 16204 ms
2.6.24-git19    time 14984 ms

4M:
2.6.22          time 10045 ms
2.6.23          time 16642 ms
2.6.24          time 16910 ms
2.6.24-git19    time 16468 ms

5M:
2.6.22          time 12055 ms
2.6.23          time 21024 ms
2.6.24-git19    time 16040 ms

10M:
2.6.22          time 24030 ms
2.6.23          time 33082 ms
2.6.24          time 34139 ms
2.6.24-git19    time 33724 ms

20M:
2.6.22          time 50015 ms
2.6.23          time 63963 ms
2.6.24          time 65100 ms
2.6.24-git19    time 63092 ms

40M:
2.6.22          time 94315 ms
2.6.23          time 107930 ms
2.6.24          time 113291 ms
2.6.24-git19    time 110360 ms

So with more work per thread, the differences become less but they're
still there. At the 40M loop, with 500 threads it's quite a bit of
runtime per thread.

> However, I fail to understand the goal of the reproducer. Granted it shows
> irregularities in the scheduler under such conditions, but what *real*
> workload would spend its time sequentially creating then immediately killing
> threads, never using more than 2 at a time ?
>
> If this could be turned into a DoS, I could understand, but here it looks
> a bit pointless :-/

It seems generally unfortunate that it takes longer for a new thread to
move over to the second cpu even when the first is busy with the original
thread. I can certainly see cases where this causes suboptimal overall
system behaviour.

I agree that the testcase is highly artificial. Unfortunately, it's not
uncommon to see these kinds of weird testcases from customers trying to
evaluate new hardware. :( They tend to be pared-down versions of whatever
their real workload is (the real workload is doing things more
appropriately, but the smaller version is used for testing).

I was lucky enough to get source snippets to base a standalone
reproduction case on for this, normally we wouldn't even get copies of
their binaries.

-Olof
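For readers without the original proglet at hand, the shape of the workload described above can be sketched roughly as follows. This is only an approximation under stated assumptions, not the program that was posted: the names `worker_arg`, `worker` and `run_short_threads` are hypothetical, and the only details taken from the thread are that threads are created one after another, each spins in a counting loop, and the creator busy-waits with `sched_yield()` until the thread is done.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical argument block shared between the creator and one worker. */
struct worker_arg {
	unsigned long loops;	/* iterations of the busy loop */
	atomic_int done;	/* set to 1 once the worker has finished */
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;
	volatile unsigned long j;	/* volatile keeps the loop from being optimised away */

	for (j = 0; j < arg->loops; j++)
		;			/* burn CPU, nothing else */
	atomic_store(&arg->done, 1);
	return NULL;
}

/*
 * Create nthreads short-lived threads strictly one after another.  The
 * creator spins, yielding the CPU, until the current worker signals
 * completion, so at most two threads are runnable at any time.
 * Returns the number of workers that completed, or -1 on error.
 */
int run_short_threads(int nthreads, unsigned long loops)
{
	int i, completed = 0;

	for (i = 0; i < nthreads; i++) {
		struct worker_arg arg = { .loops = loops };
		pthread_t tid;

		atomic_init(&arg.done, 0);
		if (pthread_create(&tid, NULL, worker, &arg) != 0)
			return -1;
		while (!atomic_load(&arg.done))
			sched_yield();	/* busy-wait, but let the worker run */
		if (pthread_join(tid, NULL) != 0)
			return -1;
		completed++;
	}
	return completed;
}
```

Calling something like `run_short_threads(500, 40000000UL)` under `time` would approximate the "40M" rows in the table; how quickly each new thread migrates to the idle CPU is what the numbers above appear to measure.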
Re: [PATCH] [resend] 3c509: convert to isa_driver and pnp_driver v4
On Sun, Feb 10, 2008 at 01:10:07AM +0100, Ondrej Zary wrote:
> > > +typedef enum { EL3_ISA, EL3_PNP, EL3_MCA, EL3_EISA } el3_cardtype;
> > > +
> >
> > No typedef please (see checkpatch)
>
> Is there any standard way to solve this without a typedef? I added
> el3_dev_fill() function which fills that card type value according to a
> parameter passed to it. "int" could be used instead and "#define EL3_ISA
> 0", "#define EL3_PNP 1" - but I think that's ugly.

enum el3_cardtype {
	EL3_ISA,
	EL3_PNP,
	EL3_MCA,
	EL3_EISA,
};

> > > struct el3_private {
> > > 	struct net_device_stats stats;
> >
> > Use network device stats in net_device now
>
> OK, looks like the driver will need some more patches.

While I agree with Stephen's comment that this driver should be using
the stats in net_device, that's totally out of scope for this patch.
As you're the de facto maintainer of this driver now it would be nice if
you could submit another one for it.

> > > -	struct net_device *next_dev;
> > > 	spinlock_t lock;
> > > 	/* skb send-queue */
> > > 	int head, size;
> > > 	struct sk_buff *queue[SKB_QUEUE_SIZE];
> >
> > What about sk_buff_head (linked list instead)?
>
> I don't know anything about this, maybe in next patch.

Yes, separate patch please.
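To illustrate the suggestion, here is a small stand-alone sketch of how a plain tagged enum can be used as a struct member and as a function parameter without any typedef. The struct `el3_private_sketch` and the helpers `el3_sketch_fill()` and `el3_cardtype_name()` are made up for this example; they are not the driver's actual code, and the real el3_dev_fill() may look quite different.

```c
#include <assert.h>
#include <string.h>

/* The enum as proposed in the reply: a tag, no typedef. */
enum el3_cardtype {
	EL3_ISA,
	EL3_PNP,
	EL3_MCA,
	EL3_EISA,
};

/* Hypothetical, cut-down private struct holding only the card type. */
struct el3_private_sketch {
	enum el3_cardtype type;
};

/* Hypothetical helper: record which bus the card was found on. */
void el3_sketch_fill(struct el3_private_sketch *lp, enum el3_cardtype type)
{
	lp->type = type;
}

/* Hypothetical helper: map the card type to a printable name. */
const char *el3_cardtype_name(enum el3_cardtype type)
{
	switch (type) {
	case EL3_ISA:
		return "ISA";
	case EL3_PNP:
		return "PNP";
	case EL3_MCA:
		return "MCA";
	case EL3_EISA:
		return "EISA";
	}
	return "unknown";
}
```

Writing `enum el3_cardtype` out in full at each use costs a few characters but keeps it visible that the value is an enum, which is the usual argument against the typedef.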
Re: [PATCH] USB: mark USB drivers as being GPL only
On Saturday 09 February 2008 23:50:17 Marcel Holtmann wrote: > > > It makes no difference if you > > > distribute the GPL library with it or not. > > > > If you do not distribute the GPL library, the library is simply being > > used in the intended, ordinary way. You do not need to agree to, nor can > > you violate, the GPL simply by using a work in its ordinary intended way. > > > > If the application contains insufficient copyrightable expression from > > the library to be considered a derivative work (and purely functional > > things do not count), then it cannot be a derivative work. The library is > > not being copied or distributed. So how can its copyright be infringed? > > go ahead and create an application that uses a GPL only library. Then > ask a lawyer if it is okay to distribute your application in binary only > form without making the source code available (according to the GPL). > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL > > http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL > > Regards > > Marcel In the US, at least, the belief that "Linking", in *ANY* form, with a GPL library creates a derivative work, is fallacious. Were I to create an application that uses, say, GTK for the interface the protected expression is my "unique and creative" use of the GTK API for creating the specific interface and any other code I have written using the API. I hold sole license to the copyright on that code and am able to license said code under the specific license of my choice. Why? Because the pre-processor is what is including any GPL'd code in my application and expanding any macros. That is a purely mechanical process and hence the output is not able to be separately copyrighted - if it could be, then the copyright would be held by the *COMPILER*, and I am *NOT* bound by the license on that code. The same applies if GPL'd code is included in my application during the linking process. 
QED: The "Linking" argument used by most people is wholly fallacious in at
least one major country - and if I'm not mistaken, the output from an
automated process is similarly not considered as carrying a separate
copyright in all nations that are signatories of or follow the Berne
Convention.

(And yes, this also applies to some GPL'd tools that RMS extended "GPL
Exemptions" to - such as "Bison". There is, generally, no need for such an
exemption, because the process by which the GPL'd code is included in the
final binary is wholly mechanical.)

DRH

PS: The above information is a very condensed form of the result of several
past conversations on this list about copyright law and the GPL as well as
my own, private discussions with lawyers. I'm being lazy here and not
searching various archives of LKML to give pointers to the past discussions.

--
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
Re: [RFC] ipvs: Cleanup sync daemon code
On Sun, Feb 10, 2008 at 12:38:11AM +0100, Sven Wegener wrote:
>  struct ip_vs_sync_thread_data {
> -	struct completion *startup;
> +	struct completion *startup; /* set to NULL once completed */

This is not needed anymore.  kthread_run guarantees that the newly
created thread is run before returning to the caller.

> +/* wait queue for master sync daemon */
> +static DECLARE_WAIT_QUEUE_HEAD(sync_master_wait);

I don't think you need this one either.  You can use wake_up_process
on the task_struct pointer instead.

> 	spin_lock(_vs_sync_lock);
> 	list_add_tail(>list, _vs_sync_queue);
> +	if (++ip_vs_sync_count == 10)
> +		wake_up_interruptible(_master_wait);
> 	spin_unlock(_vs_sync_lock);
>  }
> -static int sync_thread(void *startup)
> +static int sync_thread(void *data)

Btw, it might make sense to remove sync_thread and just call the master
and backup threads directly.

> +void __init ip_vs_sync_init(void)
> +{
> +	/* set up multicast address */
> +	mcast_addr.sin_family = AF_INET;
> +	mcast_addr.sin_port = htons(IP_VS_SYNC_PORT);
> +	mcast_addr.sin_addr.s_addr = htonl(IP_VS_SYNC_GROUP);
>  }

Why can't this be initialized at compile time by:

static struct sockaddr_in mcast_addr = {
	.sin_family		= AF_INET,
	.sin_port		= htons(IP_VS_SYNC_PORT),
	.sin_addr.s_addr	= htonl(IP_VS_SYNC_GROUP),
};

(the hton* might need to be __constant_hton*; I'm not sure without trying)
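The compile-time initializer idea can be tried out in user space with a sketch like the one below. It is an illustration under assumptions, not the kernel code: the `SKETCH_*` macros are hypothetical stand-ins for the kernel's constant byte-swap helpers, and the port and group values are assumed for the example and should be checked against the ip_vs headers. The point it shows is that a static initializer needs a constant expression, so the byte swap has to be a pure macro rather than a function call.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Assumed values for illustration only; verify against the ip_vs headers. */
#define SKETCH_SYNC_PORT	8848
#define SKETCH_SYNC_GROUP	0xe0000051u	/* 224.0.0.81 */

/* Hypothetical constant-expression byte swaps, usable in static initializers. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_CONST_HTONS(x) \
	((uint16_t)((((x) & 0x00ffu) << 8) | (((x) & 0xff00u) >> 8)))
#define SKETCH_CONST_HTONL(x) \
	((uint32_t)((((x) & 0x000000ffu) << 24) | (((x) & 0x0000ff00u) << 8) | \
		    (((x) & 0x00ff0000u) >> 8) | (((x) & 0xff000000u) >> 24)))
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_CONST_HTONS(x)	((uint16_t)(x))
#define SKETCH_CONST_HTONL(x)	((uint32_t)(x))
#else
#error "unknown byte order"
#endif

/* Filled in by the compiler; no init function has to run at load time. */
static const struct sockaddr_in mcast_addr = {
	.sin_family		= AF_INET,
	.sin_port		= SKETCH_CONST_HTONS(SKETCH_SYNC_PORT),
	.sin_addr.s_addr	= SKETCH_CONST_HTONL(SKETCH_SYNC_GROUP),
};
```

Compared with filling the structure in from an init function, this moves the work to build time and lets the object be declared const.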
RE: [PATCH] USB: mark USB drivers as being GPL only
Hi David, > > Lets phrase this in better words as Valdis pointed out: You can't > > distribute an application (binary or source form) under anything else > > than GPL if it uses a GPL library. > > This simply cannot be correct. The only way it could be true is if the work > was a derivative work of a GPL'd work. There is no other way it could become > subject to the GPL. > > So this argument reduces to -- any work that uses a library is a derivative > work of that library. But this is clearly wrong. For work X to be a > derivative work of work Y, it must contain substantial protected expression > from work Y, but an application need not have any expression from the > libraries it uses. > > > It makes no difference if you > > distribute the GPL library with it or not. > > If you do not distribute the GPL library, the library is simply being used > in the intended, ordinary way. You do not need to agree to, nor can you > violate, the GPL simply by using a work in its ordinary intended way. > > If the application contains insufficient copyrightable expression from the > library to be considered a derivative work (and purely functional things do > not count), then it cannot be a derivative work. The library is not being > copied or distributed. So how can its copyright be infringed? go ahead and create an application that uses a GPL only library. Then ask a lawyer if it is okay to distribute your application in binary only form without making the source code available (according to the GPL). http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#IfLibraryIsGPL http://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html#LinkingWithGPL Regards Marcel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] CPU isolation extensions (updated)
Paul Jackson wrote:
> Max wrote:
>> Linus, please pull CPU isolation extensions from
>
> Did I miss something in this discussion?  I thought
> Ingo was quite clear, and Linus pretty clear too,
> that this patch should bake in *-mm or some such
> place for a bit first.
>

Andrew said:
> The feature as a whole seems useful, and I don't actually oppose the merge
> based on what I see here.  As long as you're really sure that cpusets are
> inappropriate (and bear in mind that Paul has a track record of being wrong
> on this :)).  But I see a few glitches

As far as I can understand Andrew is ok with the merge. And I addressed
all his comments.

Linus said:
> Have these been in -mm and widely discussed etc? I'd like to start more
> carefully, and (a) have that controversial last patch not merged initially
> and (b) make sure everybody is on the same page wrt this all..

As far as I can understand Linus _asked_ whether it was in -mm or not and
whether everybody's on the same page. He did not say "this must be in -mm
first". I explained that it has not been in -mm, and who it was discussed
with, and did a bunch more testing/investigation on the controversial patch
and explained why I think it's not that controversial any more.

Ingo said a few different things (a bit too large to quote).
- That it was not discussed. I explained that it was in fact discussed and
  provided a bunch of pointers to the mail threads.
- That he thinks that cpuset is the way to do it. Again I explained why
  it's not.

And at the end he said:
> Also, i'd not mind some test-coverage in sched.git as well.

As far as I know "do not mind" does not mean "must go to" ;-). Also I
replied that I did not mind either but I do not think that it has much (if
anything) to do with the scheduler.

Anyway. I think I mentioned that I did not mind -mm either. I think it's
ready for the mainline. But if people still strongly feel that it has to be
in -mm that's fine. Let's just do s/Linus/Andrew/ on the first line and
move on.
But if Linus pulls it now even better ;-)

Andrew, Linus, I'll let you guys decide which tree it needs to go.

Max
[PATCH] Silent compiler warning introduced by commit 801c135ce73d5df1caf3eca35b66a10824ae0707 (UBI: Unsorted Block Images)
Hi;

The following patch silences

drivers/mtd/ubi/vmt.c: In function `ubi_create_volume':
drivers/mtd/ubi/vmt.c:379: warning: statement with no effect

compiler warning introduced by commit 801c135ce73d5df1caf3eca35b66a10824ae0707 (UBI: Unsorted Block Images)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/mtd/ubi/vmt.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/ubi/vmt.c b/drivers/mtd/ubi/vmt.c
index a3ca225..eafeaf0 100644
--- a/drivers/mtd/ubi/vmt.c
+++ b/drivers/mtd/ubi/vmt.c
@@ -376,7 +376,7 @@ out_sysfs:
 	get_device(>dev);
 	volume_sysfs_close(vol);
 out_gluebi:
-	ubi_destroy_gluebi(vol);
+	err = ubi_destroy_gluebi(vol);
 out_cdev:
 	cdev_del(>cdev);
 out_mapping:

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
[PATCH] drivers/media/video/em28xx/: Fix undefined symbol error with CONFIG_SND=N
Hi;

The following patch fixes these undefined symbol errors with CONFIG_SND=N

ERROR: "snd_pcm_period_elapsed" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_hw_constraint_integer" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_set_ops" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_lib_ioctl" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_new" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_free" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_card_register" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!
ERROR: "snd_pcm_new" [drivers/media/video/em28xx/em28xx-alsa.ko] undefined!

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/media/video/em28xx/Kconfig |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/em28xx/Kconfig b/drivers/media/video/em28xx/Kconfig
index abbd38c..0f7a0bd 100644
--- a/drivers/media/video/em28xx/Kconfig
+++ b/drivers/media/video/em28xx/Kconfig
@@ -13,7 +13,8 @@ config VIDEO_EM28XX
 	  module will be called em28xx

 config VIDEO_EM28XX_ALSA
-	depends on VIDEO_EM28XX
+	depends on VIDEO_EM28XX && SND
+	select SND_PCM
 	tristate "Empia EM28xx ALSA audio module"
 	---help---
 	  This is an ALSA driver for some Empia 28xx based TV cards.

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
Re: [git pull] CPU isolation extensions (updated)
Max wrote: > Linus, please pull CPU isolation extensions from Did I miss something in this discussion? I thought Ingo was quite clear, and Linus pretty clear too, that this patch should bake in *-mm or some such place for a bit first. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.940.382.4214 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Update kernel/.gitignore with new generated files
Hi;

The following patch updates kernel/.gitignore with new auto-generated files

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 kernel/.gitignore |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index f2ab700..ab4f109 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -3,3 +3,4 @@
 #
 config_data.h
 config_data.gz
+timeconst.h

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
[PATCH] Update arch/x86/boot/.gitignore with new generated files
Hi;

The following patch updates arch/x86/boot/.gitignore with new auto-generated files

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 arch/x86/boot/.gitignore |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/boot/.gitignore b/arch/x86/boot/.gitignore
index 1846514..b1bdc4c 100644
--- a/arch/x86/boot/.gitignore
+++ b/arch/x86/boot/.gitignore
@@ -3,3 +3,5 @@ bzImage
 setup
 setup.bin
 setup.elf
+cpustr.h
+mkcpustr

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
[git pull] CPU isolation extensions (updated)
Linus, please pull CPU isolation extensions from

git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git for-linus

Diffstat:
 Documentation/ABI/testing/sysfs-devices-system-cpu |   41 +++
 Documentation/cpu-isolation.txt                    |  113 +
 arch/x86/Kconfig                                   |    1
 arch/x86/kernel/genapic_flat_64.c                  |    4
 drivers/base/cpu.c                                 |   48
 include/linux/cpumask.h                            |    3
 kernel/Kconfig.cpuisol                             |   42 +++
 kernel/Makefile                                    |    4
 kernel/cpu.c                                       |   54 ++
 kernel/sched.c                                     |   36 --
 kernel/stop_machine.c                              |    8 +
 kernel/workqueue.c                                 |   30 -
 12 files changed, 337 insertions(+), 47 deletions(-)

This addresses all Andrew's comments for the last submission. Details here:
   http://marc.info/?l=linux-kernel=120236394012766=2

There are no code changes since last time, besides minor fix for moving
on-stack array to __initdata as suggested by Andrew. Other stuff is just
documentation updates.

List of commits
   cpuisol: Make cpu isolation configrable and export isolated map
   cpuisol: Do not route IRQs to the CPUs isolated at boot
   cpuisol: Do not schedule workqueues on the isolated CPUs
   cpuisol: Move on-stack array used for boot cmd parsing into __initdata
   cpuisol: Documentation updates
   cpuisol: Minor updates to the Kconfig options
   cpuisol: Do not halt isolated CPUs with Stop Machine

As suggested by Ingo I'm CC'ing everyone who is even remotely
connected/affected ;-)

Ingo, Peter - Scheduler.
   There are _no_ changes in this area besides moving cpu_*_map maps from
   kernel/sched.c to kernel/cpu.c.

Paul - Cpuset
   Again there are _no_ changes in this area.
   For reasons why cpuset is not the right mechanism for cpu isolation see
   this thread
      http://marc.info/?l=linux-kernel=120180692331461=2

Rusty - Stop machine.
   After doing a bunch of testing last three days I actually downgraded
   stop machine changes from [highly experimental] to simply [experimental].
   Please see this thread for more info:
      http://marc.info/?l=linux-kernel=120243837206248=2
   Short story is that I ran several insmod/rmmod workloads on live
   multi-core boxes with stop machine _completely_ disabled and did not
   see any issues. Rusty did not get a chance to reply yet, I'm hoping that
   we'll be able to make "stop machine" completely optional for some
   configurations.

Greg - ABI documentation.
   Nothing interesting here. I simply added
   Documentation/ABI/testing/sysfs-devices-system-cpu
   and documented some of the attributes exposed in there.
   Suggested by Andrew.

I believe this is ready for the inclusion and my impression is that Andrew
is ok with that. Most changes are very simple and do not affect existing
behavior. As I mentioned before I've been using Workqueue and StopMachine
changes in production for a couple of years now and have high confidence in
them. Yet they are marked as experimental for now, just to be safe.

My original explanation is included below.

btw I'll be out skiing/snow boarding for the next 4 days and will have
sporadic email access. Will do my best to address questions/concerns (if
any) during that time.

Thanx
Max

--
This patch series extends CPU isolation support. Yes, most people want to
virtualize CPUs these days and I want to isolate them :) .

The primary idea here is to be able to use some CPU cores as the dedicated
engines for running user-space code with minimal kernel
overhead/intervention, think of it as an SPE in the Cell processor. I'd
like to be able to run a CPU intensive (100%) RT task on one of the
processors without adversely affecting or being affected by the other
system activities. System activities here include _kernel_ activities as
well.

I'm personally using this for hard realtime purposes. With CPU isolation
it's very easy to achieve single digit usec worst case and around 200 nsec
average response times on off-the-shelf multi-processor/core systems
(vanilla kernel plus these patches) even under extreme system load.
I'm working with legal folks on releasing hard RT user-space framework for
that. I believe with the current multi-core CPU trend we will see more and
more applications that explore this capability: RT gaming engines,
simulators, hard RT apps, etc.

Hence the proposal is to extend current CPU isolation feature.
The new definition of the CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing
   Users must explicitly bind threads in order to run on those CPU(s).

2. By default interrupts
[PATCH] Silent compiler warning introduced by commit 75b6102257874a4ea796af686de2f72cfa0452f9 (rtc: add support for Epson RTC-9701JE V4)
Hi;

The following patch silences

drivers/rtc/rtc-r9701.c: In function `r9701_get_datetime':
drivers/rtc/rtc-r9701.c:74: warning: unused variable `time'

compiler warning introduced by commit 75b6102257874a4ea796af686de2f72cfa0452f9 (rtc: add support for Epson RTC-9701JE V4)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/rtc/rtc-r9701.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/rtc/rtc-r9701.c b/drivers/rtc/rtc-r9701.c
index a64626a..b35f9bf 100644
--- a/drivers/rtc/rtc-r9701.c
+++ b/drivers/rtc/rtc-r9701.c
@@ -71,7 +71,6 @@ static int read_regs(struct device *dev, unsigned char *regs, int no_regs)

 static int r9701_get_datetime(struct device *dev, struct rtc_time *dt)
 {
-	unsigned long time;
 	int ret;
 	unsigned char buf[] = { RSECCNT, RMINCNT, RHRCNT,
 				RDAYCNT, RMONCNT, RYRCNT };

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
[PATCH] Silent compiler warning introduced by 11b0cc3a4af65413ca3bb5698769e091486e0b22 (x25_asy: Fix ref count rule violation)
Hi;

The following patch silences

drivers/net/wan/x25_asy.c: In function `x25_asy_open_tty':
drivers/net/wan/x25_asy.c:557: warning: unused variable `ld'

compiler warning introduced by commit 11b0cc3a4af65413ca3bb5698769e091486e0b22 (x25_asy: Fix ref count rule violation)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 drivers/net/wan/x25_asy.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c
index 5e2d763..0f8aca8 100644
--- a/drivers/net/wan/x25_asy.c
+++ b/drivers/net/wan/x25_asy.c
@@ -554,7 +554,6 @@ static void x25_asy_receive_buf(struct tty_struct *tty, const unsigned char *cp,
 static int x25_asy_open_tty(struct tty_struct *tty)
 {
 	struct x25_asy *sl = (struct x25_asy *) tty->disc_data;
-	struct tty_ldisc *ld;
 	int err;

 	/* First make sure we're not already connected. */

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
[PATCH] Silent compiler warning introduced by acea6852f32b8805e166d885ed7e9f0c7cd10d41 ([BLUETOOTH]: Move children of connection device to NULL before connection down.)
Hi;

The following patch silences

net/bluetooth/hci_sysfs.c: In function `del_conn':
net/bluetooth/hci_sysfs.c:339: warning: suggest parentheses around assignment used as truth value

compiler warning introduced by commit acea6852f32b8805e166d885ed7e9f0c7cd10d41 ([BLUETOOTH]: Move children of connection device to NULL before connection down.)

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

 net/bluetooth/hci_sysfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
index e13cf5e..d2d1e4f 100644
--- a/net/bluetooth/hci_sysfs.c
+++ b/net/bluetooth/hci_sysfs.c
@@ -336,7 +336,7 @@ static void del_conn(struct work_struct *work)
 	struct device *dev;
 	struct hci_conn *conn = container_of(work, struct hci_conn, work);

-	while (dev = device_find_child(>dev, NULL, __match_tty)) {
+	while ((dev = device_find_child(>dev, NULL, __match_tty)) != NULL) {
 		device_move(dev, NULL);
 		put_device(dev);
 	}

Cheers
--
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
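The idiom the patch switches to can be shown in isolation. The sketch below is not the Bluetooth code: `struct node`, `pop_child()` and `drain_children()` are hypothetical stand-ins for the device lookup loop. It only demonstrates the loop shape that gcc's "suggest parentheses around assignment used as truth value" warning is asking for, with the assignment wrapped in its own parentheses and compared explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list standing in for the child devices. */
struct node {
	struct node *next;
	int moved;
};

/* Hypothetical "find next child" helper: unlink and return the head, or NULL. */
struct node *pop_child(struct node **head)
{
	struct node *n = *head;

	if (n != NULL)
		*head = n->next;
	return n;
}

/* Process every child; returns how many were handled. */
int drain_children(struct node **head)
{
	struct node *n;
	int count = 0;

	/*
	 * The inner parentheses and the explicit "!= NULL" make it obvious
	 * that the assignment is intentional and not a mistyped "==".
	 */
	while ((n = pop_child(head)) != NULL) {
		n->moved = 1;
		count++;
	}
	return count;
}
```

The extra pair of parentheses alone is normally enough to quiet the warning; the explicit comparison, as in the patch, is a matter of taste.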
Re: Kernel Panic in MPT SAS on 2.6.24 (and 2.6.23.14, 2.6.23.9)
On Friday, 8 February, Maximilian Wilhelm typed the following: > Just noticed that Eric's address was wrong, so resend with corrected Cc. > Eric, my initial report was http://lkml.org/lkml/2008/2/6/300 > > > On Thursday, 7 February, Krzysztof Oledzki typed the following: > > > > Hi! > > > > > >While installing my new firewall I got the following kernel panic in > > > >the MPT SAS driver which I need for the disks. > > > > > >The first kernel I booted was 2.6.23.14 which did panic so I tried a > > > >2.6.24 which panics, too. Our usual FAI kernel (2.6.23.9) is also > > > >affected. > > > > > Could you please try 2.6.22-stable? > > > > Yes it works :-/ > > > > I've put some things on the web which might be helpful: > > > > dmesg http://files.rfc2324.org/mptsas_panic/2.6.22-dmesg > > lspci -v http://files.rfc2324.org/mptsas_panic/2.6.22-lspci-v > > .config http://files.rfc2324.org/mptsas_panic/2.6.22-config > > > > I'll search for the last working kernel and try to break it down to a > > commit tomorrow when I can get a serial console or direct access. 
> > The Java driven console redirection is everything else than fulfilling :-( > > > > > It looks *very* similar to my problem: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=9909 > > > > It seems to be the same controller: > > > > 01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E > > PCI-Express Fusion-MPT SAS (rev 08) > > Subsystem: Dell Unknown device 1f10 > > Flags: bus master, fast devsel, latency 0, IRQ 16 > > I/O ports at ec00 [size=256] > > Memory at fc8fc000 (64-bit, non-prefetchable) [size=16K] > > Memory at fc8e (64-bit, non-prefetchable) [size=64K] > > Expansion ROM at fc90 [disabled] [size=1M] > > Capabilities: [50] Power Management version 2 > > Capabilities: [68] Express Endpoint IRQ 0 > > Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 > > Enable- > > Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1 I did a git bisect between v2.6.22 v2.6.23 and it seems that 6cb8f91320d3e720351c21741da795fed580b21b introduced some badness. 
---snip---
Fusion MPT base driver 3.04.05
Copyright (c) 1999-2007 LSI Logic Corporation
Fusion MPT SAS Host driver 3.04.05
mptbase: Initiating ioc0 bringup
ioc0: SAS1068E: Capabilities={Initiator}
scsi0 : ioc0: LSISAS1068E, FwRev=00142e00h, Ports=1, MaxQ=511, IRQ=16
scsi 0:0:0:0: Direct-Access SEAGATE ST973402SS S207 PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access SEAGATE ST973402SS S207 PQ: 0 ANSI: 5
BUG: unable to handle kernel NULL pointer dereference at virtual address 0028
 printing eip:
c014b8ca
*pde =
Oops: [#1]
SMP
Modules linked in:
CPU:6
EIP:0060:[]Not tainted VLI
EFLAGS: 00010046 (2.6.22-g6cb8f913 #13)
EIP is at __kmalloc+0x35/0x5f
eax: 0006 ebx: 0246 ecx: c03fa820 edx: 00d0
esi: 0010 edi: ebp: c23a4000 esp: c2143dbc
ds: 007b es: 007b fs: 00d8 gs: ss: 0068
Process swapper (pid: 1, ti=c2142000 task=c2141670 task.ti=c2142000)
Stack: c22a3e80 c013cba9 c22a3e80 c22a3e80 c2399800 c02bcb67 0020 c2399800
       00100100 00200200 00200200 fffefe48 c02ba15d c2399800 c219 c2143e1c
       023a4000 0001 000a0001 c02b
Call Trace:
 [] __kzalloc+0xd/0x34
 [] mptsas_sas_expander_pg0+0x110/0x181
 [] mpt_timer_expired+0x0/0x28
 [] megasas_lookup_instance+0x9/0x2e
 [] mptsas_probe_expander_phys+0x42/0x395
 [] mpt_timer_expired+0x0/0x28
 [] mpt_timer_expired+0x0/0x28
 [] mptsas_probe+0x309/0x387
 [] pci_device_probe+0x36/0x57
 [] driver_probe_device+0xe1/0x15f
 [] klist_next+0x4b/0x6b
 [] __driver_attach+0x0/0x79
 [] __driver_attach+0x46/0x79
 [] bus_for_each_dev+0x33/0x55
 [] driver_attach+0x16/0x18
 [] __driver_attach+0x0/0x79
 [] bus_add_driver+0x6d/0x16d
 [] __pci_register_driver+0x48/0x74
 [] kernel_init+0x14a/0x2ac
 [] ret_from_fork+0x6/0x1c
 [] kernel_init+0x0/0x2ac
 [] kernel_init+0x0/0x2ac
 [] kernel_thread_helper+0x7/0x10
 ===
Code: 3f c0 85 c0 75 05 eb 1a 83 c1 0c 3b 01 77 f9 f6 c2 01 74 05 8b 71 08 eb 03 8b 71 04 31 c0 85 f6 74 30 9c 5b fa 64 a1 08 b0 46 c0 <8b> 0c 86 83 39 00 74 12 c7 41 0c 01 00 00 00 8b 01 48 89 01 8b
EIP: [] __kmalloc+0x35/0x5f SS:ESP 0068:c2143dbc
---snip---

A simple git revert did
not work on the current git and I don't want to fiddle around in this area, so I couldn't test further. Ciao Max -- Follow the white penguin. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] USB: mark USB drivers as being GPL only
Marcel Holtmann wrote:
> Lets phrase this in better words as Valdis pointed out: You can't
> distribute an application (binary or source form) under anything else
> than GPL if it uses a GPL library.

This simply cannot be correct. The only way it could be true is if the work was a derivative work of a GPL'd work. There is no other way it could become subject to the GPL.

So this argument reduces to -- any work that uses a library is a derivative work of that library. But this is clearly wrong. For work X to be a derivative work of work Y, it must contain substantial protected expression from work Y, but an application need not have any expression from the libraries it uses.

> It makes no difference if you
> distribute the GPL library with it or not.

If you do not distribute the GPL library, the library is simply being used in the intended, ordinary way. You do not need to agree to, nor can you violate, the GPL simply by using a work in its ordinary intended way.

If the application contains insufficient copyrightable expression from the library to be considered a derivative work (and purely functional things do not count), then it cannot be a derivative work. The library is not being copied or distributed. So how can its copyright be infringed?

> But hey (again), feel free to disagree with me here.

This argument has no basis in law or common sense. It's completely off-the-wall.

And to Pekka Enberg:

>It doesn't matter how "hard" it was to write that code. What matters
>is whether your code requires enough copyrighted aspects of the
>original work to constitute as derived work. There's a huge difference
>between using kmalloc and spin_lock and writing a driver that is built
>on top of the full USB stack of Linux kernel, for example.

The legal standard is not whether it "requires" copyrighted aspects but whether it *contains* them. The driver does not contain the USB stack. The aspects of the USB stack that the driver must contain are purely functional -- its API.

You simply can't have it both ways. If the driver must contain X in order to do its job, then X is functional and cannot make the driver a derivative work. You cannot protect, by copyright, every way to accomplish a particular function. Copyright only protects creative choices among millions of (at least arguably) equally good choices.

DS
Re: DMAR EHCI failures
On Monday 04 February 2008, Jiri Slaby wrote:
> Hi,
>
> I have this in dmesg:
> DMAR:[DMA Write] Request device [00:02.0] fault addr ee1512000
> DMAR:[fault reason 05] PTE Write access is not set
> PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> DMAR:[DMA Read] Request device [00:1d.7] fault addr 7d5f
> DMAR:[fault reason 06] PTE Read access is not set
> DMAR:[DMA Read] Request device [00:1a.7] fault addr 7d5f1000
> DMAR:[fault reason 06] PTE Read access is not set
> PCI-GART: No AMD northbridge found.
> DMAR:[DMA Read] Request device [00:1a.2] fault addr 7d5f7000
> DMAR:[fault reason 06] PTE Read access is not set
>
> CONFIG_DMAR=y
> CONFIG_DMAR_GFX_WA=y
> CONFIG_DMAR_FLOPPY_WA=y
>
> Without the gfx workaround, there is much more output regarding 00:02.0. Is
> there problem with broken hw, bios or drivers?

No idea. Someone who knows the DMA Remapping stuff should have an answer.

Presumably it works with DMAR disabled, yes? If so, then just don't use DMAR. :)

> /sys/firmware/acpi/tables/DMAR:
> http://www.fi.muni.cz/~xslaby/sklad/DMAR.bin
> dmesg:
> http://www.fi.muni.cz/~xslaby/sklad/DMAR.dmesg
>
> # for a in 00:02.0 00:1d.7 00:1a.7 00:1a.2; do lspci -vxxx -s $a; done
> 00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express
> Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
> ... deletia ...
Re: [sample] mem_notify v6: usage example
On Sat 2008-02-09 11:07:09, Jon Masters wrote:
> This really needs to be triggered via a generic kernel
> event in the final version - I picture glibc having a
> reservation API and having generic support for freeing
> such reservations.

Not sure what you are talking about. This seems very right to me. We want memory-low notification, not yet another generic communication mechanism.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] Define a NO_GPIO macro to compare against and to use as an invalid GPIO
On Saturday 09 February 2008, Guennadi Liakhovetski wrote:
> On Fri, 8 Feb 2008, David Brownell wrote:
>
> > Actually I thought that what you needed was an is_valid_gpio();
> > your motivation was that you needed a predicate.
> >
> > The problem I have with a #define for a single such invalid GPIO
> > number is that people will inevitably start to assume it's the
> > only such number. In particular "if (gpio == NO_GPIO) ..."
> > is by definition incorrect.
> >
> > So I'd really rather see a predicate like is_valid_gpio().
> >
> > If you want to designate one value for use as an initializer,
> > then I'd rather see a simple
> >
> > #define NO_GPIO (-EINVAL)
> >
> > without any option for arch-specific overrides ... along with a
> > comment that this is only *one* of the numerous values which
> > will fail is_valid_gpio().
>
> I was thinking about irq numbers and trying to avoid as early as possible
> their problem: namely that each and every platform has its own idea of
> which irq numbers are valid and which are not, some use 0 as invalid irq,
> some -1, some 256, etc.

That problem came about mostly because the definition was not part of the original interface definition. Not unlike DMA addressing ... for the longest time it was impossible to report DMA mapping failures.

Whereas there's *never* been any question about whether negative numbers are valid GPIO numbers. (They aren't.)

> And when those platforms share drivers, problems
> arise. And the simple and efficient NO_IRQ notion, that would fix those
> problems nicely, cannot seem to establish itself.

Inertia is one of the problems there ... plus, the only obvious advantage of "#define NO_IRQ 0" is that it makes it easier to be lazy about initialization.

Plus, changing platforms to use that convention means they mostly need to adopt an *unnatural* step of mapping from the hardware IRQ numbers (which often start at zero, as they do on one system I just ssh'd into) to some "logical" ID. Even if you believe that's worthwhile, it's work; and it could easily break something.

> The disadvantages I see in your suggestions are:
>
> 1. two accessors (is_valid_gpio() and NO_GPIO) instead of one

Neither of those is an "accessor". One is a "predicate"; and the other is an "initializer". (A better initializer name might be more like INIT_GPIO_INVALID.)

The "accessor" scenario is actually a natural place to rely on errno values. Accessors are like

int gpio = foo_get_gpio(foo_ptr);

And the normal kernel convention there is to return negative errno values that characterize the different fault modes. (Ditto allocators: foo_alloc_gpio etc.)

> 2. have to include errno.h

Which most code already does. And you'd certainly want to do that if you were using an accessor to get GPIOs...

> 3. it doesn't seem very logical to me to define a gpio number in terms of
>    an error code

It's not a GPIO number though; it's specifically designated as NOT being a GPIO. So why not have it be a magic number which has meaning in multiple contexts?

Do you object to "ssize_t", or in general the "return negative errno on fault" conventions?

> 4. "confusing freedom" - NO_GPIO is the invalid gpio number, but, in fact,
>    you can use just any negative number

I don't see any reason to change the API to disallow using other negative values there. What good would come from that? (Remember, the *CURRENT* definition covers this situation by saying no negative number is a valid GPIO number.)

At the machine instruction level, comparing against "-1" or any other single currently-defined-as-invalid number is more expensive than checking "is it negative".

And at a higher level, you'd prevent normal accessor (or allocator, etc) idioms. I can't see any value to preventing such usage.

> Advantages of my proposal:
>
> 1. simplicity - only one macro, and "well-definedness" - use this and only
>    this as invalid gpio number. The rest are either valid, or undefined.

It's currently simple and well defined; negative numbers are not GPIOs. You want a *different* model, which is in fact more complex ... it adds that "undefined" notion.

> 2. overridable by platforms - though I don't have any examples at hand, I
>    can imagine, that some platforms would prefer some specific "natural"
>    for them numbers.

They can already pick any positive number. I don't know about you, but I *shudder* to think of anyone who's seriously trying to manage more than 2 Gbits of GPIOs one bit at a time!

> But, this is not something I would spend too much energy arguing about,
> and this is your code in the end:-) So, if you still disagree, I'll do it
> the way you suggest. I might well be wrong too:-)

Well, you've not convinced me there's any reason to change the current rules to prevent accessor/allocator idioms from returning fault codes that could be meaningful.

- Dave
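The predicate / initializer / accessor distinction argued above can be sketched in plain user-space C. This is only an illustration under stated assumptions, not kernel code: is_valid_gpio() and NO_GPIO are the names proposed in the thread, while struct foo and this foo_get_gpio() body are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One designated initializer value; NOT the only invalid number. */
#define NO_GPIO (-EINVAL)

/* Predicate: under the current rule, no negative number is a GPIO. */
static inline int is_valid_gpio(int gpio)
{
	return gpio >= 0;
}

/* Hypothetical device record, made up for this illustration. */
struct foo {
	int gpio;	/* holds NO_GPIO until a real line is assigned */
};

/*
 * Hypothetical accessor: returns the GPIO number, or a negative errno
 * value that tells the caller which fault occurred.
 */
static int foo_get_gpio(const struct foo *f)
{
	if (f == NULL)
		return -ENODEV;	/* a different fault than "not assigned" */
	return f->gpio;		/* a valid number, or NO_GPIO */
}
```

A caller that tests "gpio == NO_GPIO" would miss the -ENODEV case, whereas "!is_valid_gpio(gpio)" catches every fault code; that is the point made about not treating one macro as the only invalid value.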
Re: [git pull] x86 updates
On Sun, 10 Feb 2008 00:24:50 +0100 (CET) Thomas Gleixner wrote:
> Linus,
>
> please pull the pending x86 updates from:
>
> ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
> master

Hi Thomas,

can we please get diffstats with git pull requests? (in the future)

> The update contains:
>
> - a couple of bugfixes
> - CPA and DEBUG_PAGEALLOC improvements
> - x86 power management consolidation
> - GEODE updates
> - 32bit boot time page table construction rework
> - sparse and compile warning fixes
> - trivial cleanups
>
> There are two patches out of x86 scope as well:
>
> - lguest bugfix: x86 broke it, fix is obviously correct and Rusty
>   is away
>
> - randomization docs: resulted out of a x86 randomization
>   discussion
>
> Thanks,
>
> tglx
>
> Ahmed S. Darwish (1):
>       lguest: accept guest _PAGE_PWT page table entries
>
> Andres Salomon (5):
>       x86: GEODE: MFGPT: Minor cleanups
>       x86: GEODE: MFGPT: drop module owner usage from MFGPT API
>       x86: GEODE: MFGPT: replace 'flags' field with 'avail' bit
>       x86: GEODE: MFGPT: make mfgpt_timer_setup available outside of
>       mfgpt_32.c
>       x86: GEODE: MFGPT: fix a potential race when disabling a timer
>
> Arnd Hannemann (1):
>       x86: GEODE: MFGPT: fix typo in printk in mfgpt_timer_setup
>
> Denys Vlasenko (1):
>       x86: trivial printk optimizations
>
> Harvey Harrison (6):
>       x86: fix sparse warning in xen/time.c
>       x86: sparse warning in therm_throt.c
>       x86: sparse warnings in pageattr.c
>       x86: fix sparse warning in topology.c
>       x86: fix sparse warnings in acpi/bus.c
>       x86, core: remove CONFIG_FORCED_INLINING
>
> Ian Campbell (2):
>       x86: construct 32-bit boot time page tables in native format.
>       x86: fix early_ioremap pagetable ops
>
> Ingo Molnar (2):
>       x86: fixup more paravirt fallout
>       brk: help text typo fix
>
> Jiri Kosina (1):
>       brk: document randomize_va_space and CONFIG_COMPAT_BRK (was Re:
>
> Jordan Crouse (2):
>       x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers
>       x86: GEODE: make sure the right MFGPT timer fired the timer tick
>
> Rafael J. Wysocki (4):
>       x86 PM: move 64-bit hibernation files to arch/x86/power
>       x86 PM: rename 32-bit files in arch/x86/power
>       x86 PM: consolidate suspend and hibernation code
>       x86 PM: update stale comments
>
> Thomas Gleixner (6):
>       x86: avoid unused variable warning in mm/init_64.c
>       x86: DEBUG_PAGEALLOC: enable after mem_init()
>       x86: introduce page pool in cpa
>       x86: cpa, use page pool
>       x86: cpa, enable CONFIG_DEBUG_PAGEALLOC on 64-bit
>       x86: cpa, strict range check in try_preserve_large_page()
>
> Willy Tarreau (1):
>       x86: GEODE fix MFGPT input clock value

---
~Randy
Re: how to tell i386 from x86-64 kernel
On Sat 2008-02-09 14:34:30, Arjan van de Ven wrote:
> On Sat, 9 Feb 2008 21:13:43 +0100 (CET)
> Jan Engelhardt <[EMAIL PROTECTED]> wrote:
>
> >
> > On Feb 1 2008 12:53, Alejandro Riveira Fernández wrote:
> > >>
> > >> # uname -m
> > >> I won't tell you.
> > >> # linux32 uname -m
> > >> i686
> > >
> > > Ubuntu 7.10 64 bit userland 2.6.24
> > >
> > >$ uname -m
> > >x86_64
> > >$ linux32 uname -m
> > >i686
> >
> > What I am saying is that uname(2) does not reliably tell you whether
> > you have a 64-bit kernel underneath unless you have other sources of
> > information.
>
> that's sort of a rabbit-and-the-frog problem. The 32 bit emulator tries to
> look EXACTLY like the 32 bit kernel, and it really should.
> If someone wants a method to detect even that... we would really want
> to know the exact usecase.. because very likely it's the wrong answer
> to some other problem ;-)

dmesg should really really tell you 32 vs. 64 bit, at the first line where it prints versions... so you easily know what you are dealing with when someone sends a bugreport.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
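The uname behaviour discussed in this message can be looked at from a small user-space C program. This is a rough sketch under assumptions, not a reliable kernel-bitness detector: it assumes Linux with glibc, and the helper names read_machine() and machine_is_masked() are invented for the illustration. It reads the machine string from uname(2) and queries whether the process runs under the PER_LINUX32 personality, which is what the linux32 wrapper sets.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/personality.h>
#include <sys/utsname.h>

/* Copy the machine string reported by uname(2) into buf; 0 on success. */
static int read_machine(char *buf, size_t len)
{
	struct utsname u;

	if (buf == NULL || len == 0 || uname(&u) != 0)
		return -1;
	snprintf(buf, len, "%s", u.machine);
	return 0;
}

/*
 * Return 1 if the calling process runs under PER_LINUX32, 0 otherwise.
 * personality(0xffffffff) only queries the current value; it does not
 * change it.
 */
static int machine_is_masked(void)
{
	int persona = personality(0xffffffff);

	if (persona == -1)
		return 0;	/* query failed; assume no masking */
	return (persona & 0xff) == PER_LINUX32;
}
```

When machine_is_masked() returns 1, the string from read_machine() is the 32-bit view rather than the native one, so the personality check is one of the "other sources of information" a caller would need. It still says nothing when run from a genuinely 32-bit userland on a 32-bit kernel.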
Re: [RFC] ipvs: Cleanup sync daemon code
On Sun, Feb 10, 2008 at 12:38:11AM +0100, Sven Wegener wrote:
> Hi all,
>
> I'd like to get your feedback on this:
>
> - Use kthread_run instead of doing a double-fork via kernel_thread()
>
> - Return proper error codes to user-space on failures
>
> Currently ipvsadm --start-daemon with an invalid --mcast-interface will
> silently suceed. With these changes we get an appropriate "No such
> device" error.
>
> - Use wait queues for both master and backup thread
>
> Instead of doing an endless loop with sleeping for one second, we now use
> wait queues. The master sync daemon has its own wait queue and gets woken
> up when we have enough data to sent and also at a regular interval. The
> backup sync daemon sits on the wait queue of the mcast socket and gets
> woken up as soon as we have data to process.

Hi Sven,

This looks good to me, assuming that it's tested and works.

A few minor things:

In sb_queue_tail() the master loop is woken up if ip_vs_sync_count reaches 10, which seems a bit arbitrary.

Perhaps it's just my mail reader, but the patch seemed a bit screwy when I saved it to a file. I think this fixed the problem I was seeing: s/^ / /

Unfortunately/Fortunately I am about to leave for a few days skiing, so if I am quiet you will know why.

Acked-by: Simon Horman <[EMAIL PROTECTED]>

--
Horms
Re: 2.6.24-git20 -- BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:449
On Sat, 9 Feb 2008 16:26:43 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> Ugh, how did I let that one through?
>
> Guys, how often must it be said? PLEASE always test all code with all
> kernel debug options enabled.

maybe we should make a CONFIG_KERNEL_DEVELOPER option that SELECTs the various options that really should be on. If there is a chance of reasonable agreement on what those options are I'll cook up a patch... (but I don't want to get bogged down in a 500 mail flamewar about CONFIG_FOO_BAR being right for this or not...)

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: 2.6.24-git20 -- BUG: sleeping function called from invalid context at include/asm/uaccess_32.h:449
On Sat, 9 Feb 2008 14:03:28 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:
> Command run:
> find /proc | xargs tail
>
> [ 2710.028219] BUG: sleeping function called from invalid context at
> include/asm/uaccess_32.h:449
> [ 2710.028229] in_atomic():1, irqs_disabled():0
> [ 2710.028232] 1 lock held by head/9380:
> [ 2710.028234] #0: (hugetlb_lock){--..}, at: []
> hugetlb_overcommit_handler+0x16/0x3e
> [ 2710.028248] Pid: 9380, comm: head Not tainted 2.6.24-git20 #5
> [ 2710.028260] [] __might_sleep+0xc2/0xc9
> [ 2710.028267] [] copy_to_user+0x32/0x49
> [ 2710.028277] [] do_proc_doulongvec_minmax+0x1df/0x27f
> [ 2710.028289] [] proc_doulongvec_minmax+0x15/0x17
> [ 2710.028295] [] hugetlb_overcommit_handler+0x2a/0x3e
> [ 2710.028303] [] proc_sys_read+0x5e/0x7b
> [ 2710.028311] [] ? proc_sys_read+0x0/0x7b
> [ 2710.028317] [] vfs_read+0x8a/0x106
> [ 2710.028325] [] sys_read+0x3b/0x60
> [ 2710.028331] [] sysenter_past_esp+0x5f/0xa5

Ugh, how did I let that one through?

Guys, how often must it be said? PLEASE always test all code with all kernel debug options enabled.
Re: [RFC] Sectionized printk data
Em Sun, Feb 10, 2008 at 01:18:18AM +0100, Jan Engelhardt escreveu:
>
> On Feb 9 2008 21:54, Arnaldo Carvalho de Melo wrote:
> >> To drop strings that are only shown once anyway, such as:
> >>
> >> static int __init ebtables_init(void)
> >> {
> >>         int ret;
> >>
> >>         mutex_lock(&ebt_mutex);
> >>         list_add(&ebt_standard_target.list, &ebt_targets);
> >>         mutex_unlock(&ebt_mutex);
> >>         if ((ret = nf_register_sockopt(&ebt_sockopts)) < 0)
> >>                 return ret;
> >>
> >> ->      printk(KERN_INFO "Ebtables v2.0 registered\n");
> >>         return 0;
> >> }
> >>
> >> >If you say "saving memory" then please let us know with specific examples
> >> >in what area these savings will really pay off.
> >
> >[...]
> >With a tool like this the advantage is that no source code has to be
> >changed, strings in __init functions are automagically moved to
> >.init.data, the disadvantage is that not all strings can be moved to
> >.init.data as there were (are?) subsystems that keep pointers to the
> >string passed and another tool would be involved in the build process.
>
> There is one corner case to consider:
>
>
> static char abc[] = "foo";
>
> int __init init_module(void)
> {
>         printk(abc);
> }
>
> I am not sure if gcc/ld is smart enough to figure out that abc is
> only ever used from within an __init function and that it could hence
> be moved to __initdata.

The initstr tool mentioned doesn't touch this case, as it doesn't search for specific functions such as printk; it looks for strings inside __init marked functions. In the above example abc won't be marked as __initdata.

So if there are two places where the same string is used, with one being in an __init function, one copy goes to .init.data and another to .data.

- Arnaldo
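The corner case above comes down to the fact that section placement follows the marking on the definition, not where the data happens to be used. A user-space analogue can be sketched with the GCC section attribute. Everything here is an assumption made for illustration: it needs a GNU-style toolchain on ELF (which provides __start_/__stop_ symbols for sections whose names are valid C identifiers), and demo_initdata, demo_init_data and in_demo_init_section() are made-up names, not the kernel's __initdata machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's __initdata; the section name is made up. */
#define demo_initdata __attribute__((used, section("demo_init_data")))

/* Explicitly marked: placed in the demo_init_data section. */
static char init_banner[] demo_initdata = "Ebtables v2.0 registered\n";

/* Like 'abc' in the thread: not marked, so it stays in ordinary data. */
static char abc[] = "foo";

/* The linker defines these bounds for the section named above. */
extern char __start_demo_init_data[];
extern char __stop_demo_init_data[];

/* Return 1 if p points into the demo_init_data section, 0 otherwise. */
static int in_demo_init_section(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_init_data &&
	       a < (uintptr_t)__stop_demo_init_data;
}
```

Here in_demo_init_section(init_banner) holds while in_demo_init_section(abc) does not, even if abc were only ever read from init-time code; nothing in this sketch relocates it automatically.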
Re: mm/slub.c warnings
On Sat, 9 Feb 2008, Vegard Nossum wrote:
> Hi,
>
> I get these warnings when compiling mm/slub.c in linux-2.6.git:
>
> mm/slub.c: In function 'slab_alloc':
> mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
> mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
> mm/slub.c: In function 'slab_free':
> mm/slub.c:1796: warning: assignment makes pointer from integer without a cast
> mm/slub.c:1796: warning: assignment makes pointer from integer without a cast
>
> The actual lines are calls to cmpxchg_local(). This is probably
> because I'm compiling with M386. I'm guessing the source of the
> warnings is in include/asm-x86/cmpxchg_32.h, lines 283 and 286. Config
> attached.

Hmmm.. That cmpxchg local needs to be fixed? Mathieu?
Re: [RFC] Sectionized printk data
On Feb 9 2008 21:54, Arnaldo Carvalho de Melo wrote:
>> To drop strings that are only shown once anyway, such as:
>>
>> static int __init ebtables_init(void)
>> {
>>         int ret;
>>
>>         mutex_lock(&ebt_mutex);
>>         list_add(&ebt_standard_target.list, &ebt_targets);
>>         mutex_unlock(&ebt_mutex);
>>         if ((ret = nf_register_sockopt(&ebt_sockopts)) < 0)
>>                 return ret;
>>
>> ->      printk(KERN_INFO "Ebtables v2.0 registered\n");
>>         return 0;
>> }
>>
>> >If you say "saving memory" then please let us know with specific examples
>> >in what area these savings will really pay off.
>
>[...]
>With a tool like this the advantage is that no source code has to be
>changed, strings in __init functions are automagically moved to
>.init.data, the disadvantage is that not all strings can be moved to
>.init.data as there were (are?) subsystems that keep pointers to the
>string passed and another tool would be involved in the build process.

There is one corner case to consider:


static char abc[] = "foo";

int __init init_module(void)
{
        printk(abc);
}

I am not sure if gcc/ld is smart enough to figure out that abc is only ever used from within an __init function and that it could hence be moved to __initdata.
Re: [PATCH] Define a NO_GPIO macro to compare against and to use as an invalid GPIO
On Fri, 8 Feb 2008, David Brownell wrote:
> On Thursday 31 January 2008, Guennadi Liakhovetski wrote:
> > As discussed on i2c mailing list with David Brownell, any number
> > outside of the 0...MAX_INT range is invalid as a GPIO number.
> > Define a macro, similar to NO_IRQ, to be used as a deliberate
> > invalid GPIO, rather than defining a is_valid_gpio() macro.
>
> Actually I thought that what you needed was an is_valid_gpio();
> your motivation was that you needed a predicate.
>
> The problem I have with a #define for a single such invalid GPIO
> number is that people will inevitably start to assume it's the
> only such number. In particular "if (gpio == NO_GPIO) ..."
> is by definition incorrect.
>
> So I'd really rather see a predicate like is_valid_gpio().
>
> If you want to designate one value for use as an initializer,
> then I'd rather see a simple
>
> #define NO_GPIO (-EINVAL)
>
> without any option for arch-specific overrides ... along with a
> comment that this is only *one* of the numerous values which
> will fail is_valid_gpio().

I was thinking about irq numbers and trying to avoid as early as possible their problem: namely that each and every platform has its own idea of which irq numbers are valid and which are not, some use 0 as invalid irq, some -1, some 256, etc. And when those platforms share drivers, problems arise. And the simple and efficient NO_IRQ notion, that would fix those problems nicely, cannot seem to establish itself.

The disadvantages I see in your suggestions are:

1. two accessors (is_valid_gpio() and NO_GPIO) instead of one

2. have to include errno.h

3. it doesn't seem very logical to me to define a gpio number in terms of an error code

4. "confusing freedom" - NO_GPIO is the invalid gpio number, but, in fact, you can use just any negative number

Advantages of my proposal:

1. simplicity - only one macro, and "well-definedness" - use this and only this as invalid gpio number. The rest are either valid, or undefined.

2. overridable by platforms - though I don't have any examples at hand, I can imagine, that some platforms would prefer some specific "natural" for them numbers.

But, this is not something I would spend too much energy arguing about, and this is your code in the end:-) So, if you still disagree, I'll do it the way you suggest. I might well be wrong too:-)

Thanks
Guennadi
---
Guennadi Liakhovetski
Re: [PATCH] [resend] 3c509: convert to isa_driver and pnp_driver v4
On Saturday 09 February 2008 22:48:05 Stephen Hemminger wrote: > On Sat, 9 Feb 2008 22:33:07 +0100 > > Ondrej Zary <[EMAIL PROTECTED]> wrote: > > Hello, > > this patch converts 3c509 driver to isa_driver and pnp_driver. The result > > is that autoloading using udev and hibernation works with ISA PnP cards. > > It also adds hibernation support for non-PnP ISA cards. > > > > xcvr module parameter was removed as its value was not used. > > > > Tested using 3 ISA cards in various combinations of PnP and non-PnP > > modes. EISA and MCA only compile-tested. > > > > Signed-off-by: Ondrej Zary <[EMAIL PROTECTED]> > > > > --- linux-2.6.24-orig/drivers/net/3c509.c 2008-01-27 19:48:19.0 > > +0100 +++ linux-2.6.24-pentium/drivers/net/3c509.c 2008-02-07 > > 17:58:45.0 +0100 @@ -54,25 +54,24 @@ > > v1.19a 28Oct2002 Davud Ruggiero <[EMAIL PROTECTED]> > > - Increase *read_eeprom udelay to workaround oops with > > 2 cards. > > v1.19b 08Nov2002 Marc Zyngier <[EMAIL PROTECTED]> > > - - Introduce driver model for EISA cards. > > + - Introduce driver model for EISA cards. > > + v1.20 04Feb2008 Ondrej Zary <[EMAIL PROTECTED]> > > + - convert to isa_driver and pnp_driver and some cleanups > > */ > > Don't bother with comment, kernel uses git change log to figure out > who to blame. > > > #define DRV_NAME "3c509" > > -#define DRV_VERSION"1.19b" > > -#define DRV_RELDATE"08Nov2002" > > +#define DRV_VERSION"1.20" > > +#define DRV_RELDATE"04Feb2008" > > > > /* A few values that may be tweaked. */ > > > > /* Time in jiffies before concluding the transmitter is hung. */ > > #define TX_TIMEOUT (400*HZ/1000) > > -/* Maximum events (Rx packets, etc.) to handle at each interrupt. 
*/ > > -static int max_interrupt_work = 10; > > > > #include > > -#ifdef CONFIG_MCA > > #include > > -#endif > > -#include > > +#include > > +#include > > #include > > #include > > #include > > @@ -97,10 +96,6 @@ > > > > static char version[] __initdata = DRV_NAME ".c:" DRV_VERSION " " > > DRV_RELDATE " [EMAIL PROTECTED]"; > > > > -#if defined(CONFIG_PM) && (defined(CONFIG_MCA) || defined(CONFIG_EISA)) > > -#define EL3_SUSPEND > > -#endif > > - > > #ifdef EL3_DEBUG > > static int el3_debug = EL3_DEBUG; > > #else > > @@ -111,6 +106,7 @@ > > * a global variable so that the mca/eisa probe routines can increment > > * it */ > > static int el3_cards = 0; > > +#define EL3_MAX_CARDS 8 > > > > /* To minimize the size of the driver source I only define operating > > constants if they are used several times. You'll need the manual > > @@ -119,7 +115,7 @@ > > #define EL3_DATA 0x00 > > #define EL3_CMD 0x0e > > #define EL3_STATUS 0x0e > > -#define EEPROM_READ 0x80 > > +#defineEEPROM_READ 0x80 > > > > #define EL3_IO_EXTENT 16 > > > > @@ -168,23 +164,31 @@ > > */ > > #define SKB_QUEUE_SIZE 64 > > > > +typedef enum { EL3_ISA, EL3_PNP, EL3_MCA, EL3_EISA } el3_cardtype; > > + > > No typedef please (see checkpatch) Is there any standard way to solve this without a typedef? I added el3_dev_fill() function which fills that card type value according to a parameter passed to it. "int" could be used instead and "#define EL3_ISA 0", "#define EL3_PNP 1" - but I think that's ugly. > > > struct el3_private { > > struct net_device_stats stats; > > Use network device stats in net_device now OK, looks like the driver will need some more patches. > > - struct net_device *next_dev; > > spinlock_t lock; > > /* skb send-queue */ > > int head, size; > > struct sk_buff *queue[SKB_QUEUE_SIZE]; > > What about sk_buff_head (linked list instead)? I don't know anything about this, maybe in next patch. 
> > > - enum { > > - EL3_MCA, > > - EL3_PNP, > > - EL3_EISA, > > - } type; /* type of device */ > > - struct device *dev; > > + el3_cardtype type; > > }; > > -static int id_port __initdata = 0x110; /* Start with 0x110 to avoid new > > sound cards.*/ -static struct net_device *el3_root_dev; > > +static int id_port; > > +static int current_tag; > > +static struct net_device *el3_devs[EL3_MAX_CARDS]; > > I know is only ISA, but having a limit seems silly, can't the device just > use allocated space like other drivers. EL3_MAX_CARDS is also used as a parameter to isa_register_driver(). The irq[] array (see below) is limited to 8 devices too. And finally, the card itself can use one of 8 different IRQs (3,5,7,2/9,10,11,12,15). So I think that it's not worth adding more code to support more cards. The original driver will do bad things with more than 8 cards too - read beyond the end of irq[] array. > > + > > +/* Parameters that may be passed into the module. */ > > +static int debug = -1; > > +static int irq[] = {-1, -1, -1, -1, -1, -1, -1, -1}; > > +/* Maximum events (Rx packets,
Re: Query about set_pages_* API
On Sat, 09 Feb 2008 15:40:12 -0700 Larry Finger <[EMAIL PROTECTED]> wrote:
> Is the set_pages_* API that replaces change_page_attr described
> somewhere? I have been unable to find it with Google.
>
> I'm trying to modify the VirtualBox kernel module to work with
> 2.6.24-git (and 2.6.25) on x86_64 architecture. The current code has
> a value of the third argument of the call (prot) with 3 variants. All
> variations have the following bits set: _PAGE_PRESENT, _PAGE_RW,
> _PAGE_DIRTY, and _PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above,
> and number 3 adds _PAGE_GLOBAL to the bits in variation 1.
>
> From the code in arch/x86/mm/pageattr.c, I figured I need to call
> set_pages_wb() unconditionally, and set_pages_nx() if _PAGE_NX is
> set. Will these calls be sufficient? I thought about calling
> set_pages_rw(), but that entry is not exported.
>

ok looking at the actual code.. it seems to only care about making a piece of memory executable (and then clearing it before freeing), so all you need is set_memory_x() and set_memory_nx()

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: Query about set_pages_* API
On Sat, 09 Feb 2008 15:40:12 -0700 Larry Finger <[EMAIL PROTECTED]> wrote:

> I'm trying to modify the VirtualBox kernel module to work with
> 2.6.24-git (and 2.6.25) on x86_64 architecture. The current code has
> a value of the third argument of the call (prot) with 3 variants. All
> variations have the following bits set: _PAGE_PRESENT, _PAGE_RW,
> _PAGE_DIRTY, and _PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above,
> and number 3 adds _PAGE_GLOBAL to the bits in variation 1.
>
> From the code in arch/x86/mm/pageattr.c, I figured I need to call
> set_pages_wb() unconditionally, and set_pages_nx() if _PAGE_NX is
> set. Will these calls be sufficient? I thought about calling
> set_pages_rw(), but that entry is not exported.

it depends on what the code is trying to achieve.
(this makes it not a trivial 1:1 scripted replacement ;-)

Which attribute is the code trying to change? Is it trying to make a
piece of code (non) cachable? or executable? You need to figure out
what the intent is..
Re: [RFC] Sectionized printk data
On Sat, Feb 09, 2008 at 11:08:45PM +0100, Jan Engelhardt wrote:
>
> On Feb 4 2008 19:07, Sam Ravnborg wrote:
> >> The attached patch allows something along the lines:
> >>
> >> int __init some_function(void)
> >> {
> >>	[...]
> >>	pr_init(KERN_WARNING "failure %s in %s\n", ...);
> >>	[...]
> >> }
> >>
> >> Another idea I had was to make printk a macro that figures out the
> >> section of the surrounding function and then moves the data
> >> automatically when it is a literal, but I couldn't find mechanisms that
> >> allow this. Anyone of you got an idea?
> >>
> >> What do you think in general?
> >
> >What is the rationale behind this?
>
> To drop strings that are only shown once anyway, such as:
>
> static int __init ebtables_init(void)
> {
>	int ret;
>
>	mutex_lock(_mutex);
>	list_add(_standard_target.list, _targets);
>	mutex_unlock(_mutex);
>	if ((ret = nf_register_sockopt(_sockopts)) < 0)
>		return ret;
>
> ->	printk(KERN_INFO "Ebtables v2.0 registered\n");
>	return 0;
> }
>
> >If you say "saving memory" then please let us know with specific examples
> >in what area these savings will really pay off.

A long time ago I played with this, using a sparse based tool that was
inserted as the compiler and modified the code before passing it to gcc,
i.e. a pre-pre-processor:

http://www.kernel.org/pub/linux/kernel/people/acme/sparse/initstr.c

I couldn't find it in the archives, but IIRC some extra pages were freed
after boot, i.e. strings moved from .data to .init.data.

With a tool like this the advantage is that no source code has to be
changed: strings in __init functions are automagically moved to
.init.data. The disadvantages are that not all strings can be moved to
.init.data, as there were (are?) subsystems that keep pointers to the
string passed, and that another tool would be involved in the build
process.
- Arnaldo
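The idea of moving init-only strings out of .data can be sketched in user space with GCC's section attribute. This is only an illustration under stated assumptions: the section name demo_initstr and all demo_* helpers are hypothetical, the kernel would target .init.data instead, and the __start_/__stop_ bounds rely on the linker (GNU ld or lld) generating them for sections whose name is a valid C identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; the kernel would use .init.data instead. */
#define DEMO_INITSTR __attribute__((section("demo_initstr"), used))

/* Strings that are only needed during initialisation. */
static const char msg_registered[] DEMO_INITSTR = "Ebtables v2.0 registered\n";
static const char msg_failure[] DEMO_INITSTR = "failure %s in %s\n";

/* Section bounds generated by the linker (assumes GNU ld or lld). */
extern const char __start_demo_initstr[];
extern const char __stop_demo_initstr[];

/* Number of bytes that could be dropped once initialisation is done. */
size_t demo_initstr_bytes(void)
{
	return (size_t)(__stop_demo_initstr - __start_demo_initstr);
}

/* Return 1 if p points into the dedicated string section, 0 otherwise. */
int demo_is_initstr(const char *p)
{
	uintptr_t a = (uintptr_t)p;

	assert(p != NULL);
	return a >= (uintptr_t)__start_demo_initstr &&
	       a < (uintptr_t)__stop_demo_initstr;
}

/* Accessor so callers outside this file can reach the string. */
const char *demo_registered_msg(void)
{
	return msg_registered;
}
```

A string placed this way is exactly the kind that breaks if a subsystem keeps the pointer after the section has been freed, which is the limitation mentioned above.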
Re: Fwd: Re: e1000 1sec latency problem
Ray Lee wrote: > On Feb 9, 2008 1:51 PM, Kok, Auke <[EMAIL PROTECTED]> wrote: >> Martin Rogge wrote: >>> On Saturday 09 February 2008 11:07:26 Martin Rogge wrote: Hi, I am not so familiar with the various mailing lists and missed out on [EMAIL PROTECTED] the first time. Please cc me on any replies. I am looking for help with either making the e1000e driver work on my Thinkpad T60 or fixing the 1s latency issue with e1000. To be honest, I do not understand why the e1000e driver failed to recognize the NIC when I tried. At least, I noticed the correct device ID is defined in drivers/net/e1000e/hw.h: #define E1000_DEV_ID_82573L0x109A Any help is appreciated. Thanks, Martin -- Forwarded Message -- Subject: Re: e1000 1sec latency problem Date: Thursday 07 February 2008 From: Martin Rogge <[EMAIL PROTECTED]> To: linux-kernel@vger.kernel.org Pavel Machek wrote: > Hi! > > I have the famous e1000 latency problems: Hi, I have the same problem with my Thinkpad T60. [EMAIL PROTECTED]:~# ping arnold PING arnold (192.168.158.6) 56(84) bytes of data. 
64 bytes from arnold (192.168.158.6): icmp_seq=1 ttl=64 time=49.7 ms 64 bytes from arnold (192.168.158.6): icmp_seq=2 ttl=64 time=0.438 ms 64 bytes from arnold (192.168.158.6): icmp_seq=3 ttl=64 time=1000 ms 64 bytes from arnold (192.168.158.6): icmp_seq=4 ttl=64 time=0.970 ms 64 bytes from arnold (192.168.158.6): icmp_seq=5 ttl=64 time=885 ms 64 bytes from arnold (192.168.158.6): icmp_seq=6 ttl=64 time=0.484 ms 64 bytes from arnold (192.168.158.6): icmp_seq=7 ttl=64 time=529 ms 64 bytes from arnold (192.168.158.6): icmp_seq=8 ttl=64 time=1.02 ms 64 bytes from arnold (192.168.158.6): icmp_seq=9 ttl=64 time=149 ms 64 bytes from arnold (192.168.158.6): icmp_seq=10 ttl=64 time=0.549 ms 64 bytes from arnold (192.168.158.6): icmp_seq=11 ttl=64 time=0.829 ms --- arnold ping statistics --- 11 packets transmitted, 11 received, 0% packet loss, time ms rtt min/avg/max/mdev = 0.438/238.113/1000.967/365.279 ms, pipe 2 [EMAIL PROTECTED]:~# uname -a Linux zorro 2.6.24 #6 SMP PREEMPT Sun Feb 3 18:27:48 CET 2008 i686 Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux [EMAIL PROTECTED]:~# lspci -vvv >>> [stuff deleted] >>> Unfortunately the e1000e driver is not an option as it will not detect the NIC: from dmesg with e1000 compiled in: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI Copyright (c) 1999-2006 Intel Corporation. ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) -> IRQ 16 PCI: Setting latency timer of device :02:00.0 to 64 e1000: :02:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:15:58:c3:3a:71 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection from dmesg with e1000e compiled in: e1000e: Intel(R) PRO/1000 Network Driver - 0.2.0 e1000e: Copyright (c) 1999-2007 Intel Corporation. Any pointers? 
Thanks, Martin --- >>> Just for the records, I googled the following solution for the Lenovo T60: >>> >>> (a) use the e1000 driver >>> (b) if compiling as a module, add the following parameter to modprobe.conf: >>> options e1000 RxIntDelay=5 >>> (c) if compiling a static driver, use the following patch (based on 2.6.24): >>> >>> --- e1000_param.c.orig2008-01-24 23:58:37.0 +0100 >>> +++ e1000_param.c 2008-02-09 20:42:23.0 +0100 >>> @@ -158,7 +158,7 @@ >>> * Valid Range: 0-65535 >>> */ >>> E1000_PARAM(RxIntDelay, "Receive Interrupt Delay"); >>> -#define DEFAULT_RDTR 0 >>> +#define DEFAULT_RDTR 5 >>> #define MAX_RXDELAY 0x >>> #define MIN_RXDELAY0 >>> >>> After reboot, the average ping time is still factor 10 worse than it should >>> be, but it stays below 2 ms (which is a remarkable improvement compared to >>> 1000 ms). >> correct, this was a workaround which improved things for most people, but >> did not >> *fix* it. >> >> the real fix is to disable L1 ASPM alltogether at the cost of more power >> consumption, which is what is in e1000e in 2.6.25-git. > > e1000e doesn't recognize his NIC. Will you be adding this to the e1000 > driver as well? no, from 2.6.25 onwards e1000e will support 82573 nics, so you'll have to migrate drivers, and you will get the fix automatically that way. after 2.6.25 releases, support for all pci-e nics will be removed from the e1000 driver. Cheers Auke -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at
[RFC] ipvs: Cleanup sync daemon code
Hi all, I'd like to get your feedback on this: - Use kthread_run instead of doing a double-fork via kernel_thread() - Return proper error codes to user-space on failures Currently ipvsadm --start-daemon with an invalid --mcast-interface will silently suceed. With these changes we get an appropriate "No such device" error. - Use wait queues for both master and backup thread Instead of doing an endless loop with sleeping for one second, we now use wait queues. The master sync daemon has its own wait queue and gets woken up when we have enough data to sent and also at a regular interval. The backup sync daemon sits on the wait queue of the mcast socket and gets woken up as soon as we have data to process. diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 56f3c94..519bd96 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -890,6 +890,7 @@ extern char ip_vs_backup_mcast_ifn[IP_VS_IFNAME_MAXLEN]; extern int start_sync_thread(int state, char *mcast_ifn, __u8 syncid); extern int stop_sync_thread(int state); extern void ip_vs_sync_conn(struct ip_vs_conn *cp); +extern void ip_vs_sync_init(void); /* diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c index 963981a..0ccee4b 100644 --- a/net/ipv4/ipvs/ip_vs_core.c +++ b/net/ipv4/ipvs/ip_vs_core.c @@ -1071,6 +1071,8 @@ static int __init ip_vs_init(void) { int ret; + ip_vs_sync_init(); + ret = ip_vs_control_init(); if (ret < 0) { IP_VS_ERR("can't setup control.\n"); diff --git a/net/ipv4/ipvs/ip_vs_sync.c b/net/ipv4/ipvs/ip_vs_sync.c index 948378d..36063d3 100644 --- a/net/ipv4/ipvs/ip_vs_sync.c +++ b/net/ipv4/ipvs/ip_vs_sync.c @@ -29,6 +29,9 @@ #include #include /* for ip_mc_join_group */ #include +#include +#include +#include #include #include @@ -68,7 +71,8 @@ struct ip_vs_sync_conn_options { }; struct ip_vs_sync_thread_data { - struct completion *startup; + struct completion *startup; /* set to NULL once completed */ + int *retval; /* only valid until startup is completed */ int state; 
}; @@ -123,9 +127,10 @@ struct ip_vs_sync_buff { }; -/* the sync_buff list head and the lock */ +/* the sync_buff list head, the lock and the counter */ static LIST_HEAD(ip_vs_sync_queue); static DEFINE_SPINLOCK(ip_vs_sync_lock); +static unsigned int ip_vs_sync_count; /* current sync_buff for accepting new conn entries */ static struct ip_vs_sync_buff *curr_sb = NULL; @@ -140,6 +145,13 @@ volatile int ip_vs_backup_syncid = 0; char ip_vs_master_mcast_ifn[IP_VS_IFNAME_MAXLEN]; char ip_vs_backup_mcast_ifn[IP_VS_IFNAME_MAXLEN]; +/* sync daemon tasks */ +static struct task_struct *sync_master_thread; +static struct task_struct *sync_backup_thread; + +/* wait queue for master sync daemon */ +static DECLARE_WAIT_QUEUE_HEAD(sync_master_wait); + /* multicast addr */ static struct sockaddr_in mcast_addr; @@ -148,6 +160,8 @@ static inline void sb_queue_tail(struct ip_vs_sync_buff *sb) { spin_lock(_vs_sync_lock); list_add_tail(>list, _vs_sync_queue); + if (++ip_vs_sync_count == 10) + wake_up_interruptible(_master_wait); spin_unlock(_vs_sync_lock); } @@ -163,6 +177,7 @@ static inline struct ip_vs_sync_buff * sb_dequeue(void) struct ip_vs_sync_buff, list); list_del(>list); + ip_vs_sync_count--; } spin_unlock_bh(_vs_sync_lock); @@ -536,14 +551,17 @@ static int bind_mcastif_addr(struct socket *sock, char *ifname) static struct socket * make_send_sock(void) { struct socket *sock; + int result; /* First create a socket */ - if (sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, ) < 0) { + result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, ); + if (result < 0) { IP_VS_ERR("Error during creation of socket; terminating\n"); - return NULL; + return ERR_PTR(result); } - if (set_mcast_if(sock->sk, ip_vs_master_mcast_ifn) < 0) { + result = set_mcast_if(sock->sk, ip_vs_master_mcast_ifn); + if (result < 0) { IP_VS_ERR("Error setting outbound mcast interface\n"); goto error; } @@ -551,14 +569,16 @@ static struct socket * make_send_sock(void) set_mcast_loop(sock->sk, 0); 
set_mcast_ttl(sock->sk, 1); - if (bind_mcastif_addr(sock, ip_vs_master_mcast_ifn) < 0) { + result = bind_mcastif_addr(sock, ip_vs_master_mcast_ifn); + if (result < 0) { IP_VS_ERR("Error binding address of the mcast interface\n"); goto error; } - if (sock->ops->connect(sock, - (struct sockaddr*)_addr, - sizeof(struct sockaddr), 0) < 0) { + result = sock->ops->connect(sock, + (struct
[git pull] x86 updates
Linus,

please pull the pending x86 updates from:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git master

The update contains:
 - a couple of bugfixes
 - CPA and DEBUG_PAGEALLOC improvements
 - x86 power management consolidation
 - GEODE updates
 - 32bit boot time page table construction rework
 - sparse and compile warning fixes
 - trivial cleanups

There are two patches out of x86 scope as well:
 - lguest bugfix: x86 broke it, fix is obviously correct and Rusty is away
 - randomization docs: resulted out of a x86 randomization discussion

Thanks,
	tglx

Ahmed S. Darwish (1):
      lguest: accept guest _PAGE_PWT page table entries

Andres Salomon (5):
      x86: GEODE: MFGPT: Minor cleanups
      x86: GEODE: MFGPT: drop module owner usage from MFGPT API
      x86: GEODE: MFGPT: replace 'flags' field with 'avail' bit
      x86: GEODE: MFGPT: make mfgpt_timer_setup available outside of mfgpt_32.c
      x86: GEODE: MFGPT: fix a potential race when disabling a timer

Arnd Hannemann (1):
      x86: GEODE: MFGPT: fix typo in printk in mfgpt_timer_setup

Denys Vlasenko (1):
      x86: trivial printk optimizations

Harvey Harrison (6):
      x86: fix sparse warning in xen/time.c
      x86: sparse warning in therm_throt.c
      x86: sparse warnings in pageattr.c
      x86: fix sparse warning in topology.c
      x86: fix sparse warnings in acpi/bus.c
      x86, core: remove CONFIG_FORCED_INLINING

Ian Campbell (2):
      x86: construct 32-bit boot time page tables in native format.
      x86: fix early_ioremap pagetable ops

Ingo Molnar (2):
      x86: fixup more paravirt fallout
      brk: help text typo fix

Jiri Kosina (1):
      brk: document randomize_va_space and CONFIG_COMPAT_BRK (was Re:

Jordan Crouse (2):
      x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers
      x86: GEODE: make sure the right MFGPT timer fired the timer tick

Rafael J. Wysocki (4):
      x86 PM: move 64-bit hibernation files to arch/x86/power
      x86 PM: rename 32-bit files in arch/x86/power
      x86 PM: consolidate suspend and hibernation code
      x86 PM: update stale comments

Thomas Gleixner (6):
      x86: avoid unused variable warning in mm/init_64.c
      x86: DEBUG_PAGEALLOC: enable after mem_init()
      x86: introduce page pool in cpa
      x86: cpa, use page pool
      x86: cpa, enable CONFIG_DEBUG_PAGEALLOC on 64-bit
      x86: cpa, strict range check in try_preserve_large_page()

Willy Tarreau (1):
      x86: GEODE fix MFGPT input clock value
Re: HPET timer broken using 2.6.23.13 / nanosleep() hangs
Thomas,

I haven't found a good way to capture the SysRq output for this. I
found that when it locks up at boot time, even SysRq is unresponsive. I
don't have another way of getting console on the machine right now to
get the output off of it. I have since upgraded to 2.6.24 and the
problem still persists.

Another interesting twist though.. I just rebuilt my kernel with
ARCH=x86_64 and HPET works perfectly. So this only appears to break
when in 32-bit mode. For some reason it picks tsc at boot time, but if
I install hpet afterwards under x86_64 it no longer hangs when I run
'sleep 1'.

Does that shed any more light on the problem?

Thanks,
-Andrew

# uname -a
Linux am2 2.6.24 #7 Sat Feb 9 18:06:50 EST 2008 x86_64 GNU/Linux
# dmesg | egrep -i clock\|hpet
ACPI: HPET 3DFE7780, 0038 (r1 RS690 AWRDACPI 42302E31 AWRD 98)
ACPI: HPET id: 0x10b9a201 base: 0xfed0
hpet clockevent registered
TSC calibrated against HPET
hpet0: at MMIO 0xfed0, IRQs 2, 8, 0, 0
hpet0: 4 32-bit timers, 14318180 Hz
Time: tsc clocksource has been installed.
Real Time Clock Driver v1.12ac
hpet_resources: 0xfed0 is busy
# echo -n hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# dmesg | tail -1
Time: hpet clocksource has been installed.
# time sleep 1

real	0m1.001s
user	0m0.000s
sys	0m0.000s

On Jan 18, 2008 5:26 AM, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> On Wed, 16 Jan 2008, Andrew Paprocki wrote:
>
> > I applied the patch and I am still locking up after
> > Time: hpet clocksource has been installed.
>
> That was expected :)
>
> > I rebooted with "clocksource=tsc" to get the logs of the trace which
> > was added. I'm assuming the grep below gets all the interesting parts.
> > I enabled the HPET character device as mentioned before, which is why
> > the hpet0 lines appear now.
> >
> > # dmesg | egrep -i "(hpet|time|clock)"
> > ACPI: HPET 37FE7400, 0038 (r1 RS690 AWRDACPI 42302E31 AWRD 98)
> > ATI board detected. Disabling timer routing over 8254.
> > ACPI: PM-Timer IO Port: 0x4008
> > ACPI: HPET id: 0x10b9a201 base: 0xfed0
> > Kernel command line: vga=0x31a root=/dev/sda1 ro clocksource=tsc
> > HPET check: t1=5 t2=1139 s=56226339975 n=56226539985
>
> Ok, the counter works when we initialize the HPET.
>
> t2-t1 = 1134 ticks ~= 79us
> n-s = 200010 ~= 2525MHz --> That should be the frequency of your CPU.
>
> > Jan 16 14:44:43 am2 kernel: Call Trace:
> > Jan 16 14:44:48 am2 kernel: [] enqueue_hrtimer+0xd7/0xe2
> > Jan 16 14:44:48 am2 kernel: [] hrtimer_start+0xe8/0xf4
> > Jan 16 14:44:48 am2 kernel: [] do_nanosleep+0x48/0x73
> > Jan 16 14:44:48 am2 kernel: [] hrtimer_nanosleep_restart+0x34/0xa1
> > Jan 16 14:44:48 am2 kernel: [] hrtimer_wakeup+0x0/0x18
> > Jan 16 14:44:48 am2 kernel: [] sys_restart_syscall+0xe/0xf
> > Jan 16 14:44:48 am2 kernel: [] sysenter_past_esp+0x5f/0x85
>
> When the system is hung, can you please hit SysRq-Q wait a bit and hit
> SysRq-Q again. Please provide the output.
>
> Thanks,
> tglx
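The arithmetic behind the "HPET check" line can be reproduced with a small sketch. It assumes, as the reply implies, that t1/t2 are HPET counter readings, that s/n are TSC readings taken around the same interval, and that the HPET runs at the 14318180 Hz reported by the hpet0 line; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a number of HPET ticks to microseconds for a timer at hpet_hz. */
double hpet_ticks_to_us(uint64_t ticks, uint64_t hpet_hz)
{
	assert(hpet_hz != 0);
	return (double)ticks * 1e6 / (double)hpet_hz;
}

/*
 * Estimate the CPU frequency in MHz: TSC cycles elapsed divided by the
 * elapsed time in microseconds (cycles per microsecond equals MHz).
 */
double tsc_mhz(uint64_t tsc_delta, uint64_t hpet_ticks, uint64_t hpet_hz)
{
	assert(hpet_ticks != 0);
	return (double)tsc_delta / hpet_ticks_to_us(hpet_ticks, hpet_hz);
}
```

With the logged values, hpet_ticks_to_us(1139 - 5, 14318180) comes out at roughly 79 microseconds and tsc_mhz(56226539985 - 56226339975, 1134, 14318180) at roughly 2525 MHz, matching the figures quoted in the reply.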
[PATCH] scsi: ses fix mem leaking when fail to add intf
[PATCH] scsi: ses fix mem leaking when fail to add intf fix leaking with scomp leaking when failing. also remove one extra space. Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]> Index: linux-2.6/drivers/scsi/ses.c === --- linux-2.6.orig/drivers/scsi/ses.c +++ linux-2.6/drivers/scsi/ses.c @@ -416,11 +416,11 @@ static int ses_intf_add(struct class_dev int i, j, types, len, components = 0; int err = -ENOMEM; struct enclosure_device *edev; - struct ses_component *scomp; + struct ses_component *scomp = NULL; if (!scsi_device_enclosure(sdev)) { /* not an enclosure, but might be in one */ - edev = enclosure_find(>host->shost_gendev); + edev = enclosure_find(>host->shost_gendev); if (edev) { ses_match_to_enclosure(edev, sdev); class_device_put(>cdev); @@ -456,9 +456,6 @@ static int ses_intf_add(struct class_dev if (!buf) goto err_free; - ses_dev->page1 = buf; - ses_dev->page1_len = len; - result = ses_recv_diag(sdev, 1, buf, len); if (result) goto recv_failed; @@ -473,6 +470,9 @@ static int ses_intf_add(struct class_dev type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE) components += type_ptr[1]; } + ses_dev->page1 = buf; + ses_dev->page1_len = len; + buf = NULL; result = ses_recv_diag(sdev, 2, hdr_buf, INIT_ALLOC_SIZE); if (result) @@ -489,6 +489,7 @@ static int ses_intf_add(struct class_dev goto recv_failed; ses_dev->page2 = buf; ses_dev->page2_len = len; + buf = NULL; /* The additional information page --- allows us * to match up the devices */ @@ -506,11 +507,26 @@ static int ses_intf_add(struct class_dev goto recv_failed; ses_dev->page10 = buf; ses_dev->page10_len = len; + buf = NULL; no_page10: - scomp = kmalloc(sizeof(struct ses_component) * components, GFP_KERNEL); + + /* Page 7 for the descriptors is optional */ + result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE); + if (result) + goto simple_populate; + + len = (hdr_buf[2] << 8) + hdr_buf[3] + 4; + /* add 1 for trailing '\0' we'll use */ + buf = kzalloc(len + 1, GFP_KERNEL); + if (!buf) + goto err_free; + 
result = ses_recv_diag(sdev, 7, buf, len); + + simple_populate: + scomp = kzalloc(sizeof(struct ses_component) * components, GFP_KERNEL); if (!scomp) - goto err_free; + goto err_free; edev = enclosure_register(cdev->dev, sdev->sdev_gendev.bus_id, components, _enclosure_callbacks); @@ -521,20 +537,10 @@ static int ses_intf_add(struct class_dev edev->scratch = ses_dev; for (i = 0; i < components; i++) - edev->component[i].scratch = scomp++; + edev->component[i].scratch = scomp + i; - /* Page 7 for the descriptors is optional */ - buf = NULL; - result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE); - if (result) - goto simple_populate; - - len = (hdr_buf[2] << 8) + hdr_buf[3] + 4; - /* add 1 for trailing '\0' we'll use */ - buf = kzalloc(len + 1, GFP_KERNEL); - result = ses_recv_diag(sdev, 7, buf, len); + /* result and buf from page 7 check */ if (result) { - simple_populate: kfree(buf); buf = NULL; desc_ptr = NULL; @@ -598,6 +604,7 @@ static int ses_intf_add(struct class_dev err = -ENODEV; err_free: kfree(buf); + kfree(scomp); kfree(ses_dev->page10); kfree(ses_dev->page2); kfree(ses_dev->page1); @@ -630,6 +637,7 @@ static void ses_intf_remove(struct class ses_dev = edev->scratch; edev->scratch = NULL; + kfree(ses_dev->page10); kfree(ses_dev->page1); kfree(ses_dev->page2); kfree(ses_dev); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
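The change from scomp++ to scomp + i matters because the error path now calls kfree(scomp): once the loop has advanced the pointer, it no longer refers to the start of the allocation. A minimal user-space sketch of the same pattern, using calloc/free and hypothetical demo_* types in place of the kernel structures:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_component { int slot; };	/* stand-in for struct ses_component */
struct demo_slot { void *scratch; };	/* stand-in for edev->component[i] */

/*
 * Allocate n components and hand one to each slot. Indexing with
 * base + i leaves base untouched, so the caller can still free it.
 * Returns the base pointer, or NULL if the allocation failed.
 */
struct demo_component *demo_attach(struct demo_slot *slots, size_t n)
{
	struct demo_component *base;
	size_t i;

	assert(slots != NULL);
	base = calloc(n, sizeof(*base));
	if (!base)
		return NULL;

	for (i = 0; i < n; i++) {
		base[i].slot = (int)i;
		slots[i].scratch = base + i;	/* not base++ */
	}
	return base;
}
```

Had the loop used base++, the value returned here would point one element past the end of the array, and passing it to free() would be undefined behaviour.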
Query about set_pages_* API
Is the set_pages_* API that replaces change_page_attr described
somewhere? I have been unable to find it with Google.

I'm trying to modify the VirtualBox kernel module to work with
2.6.24-git (and 2.6.25) on x86_64 architecture. The current code has a
value of the third argument of the call (prot) with 3 variants. All
variations have the following bits set: _PAGE_PRESENT, _PAGE_RW,
_PAGE_DIRTY, and _PAGE_ACCESSED. Number 2 adds _PAGE_NX to the above,
and number 3 adds _PAGE_GLOBAL to the bits in variation 1.

From the code in arch/x86/mm/pageattr.c, I figured I need to call
set_pages_wb() unconditionally, and set_pages_nx() if _PAGE_NX is set.
Will these calls be sufficient? I thought about calling set_pages_rw(),
but that entry is not exported.

Thanks,

Larry
Re: how to tell i386 from x86-64 kernel
On Sat, 9 Feb 2008 21:13:43 +0100 (CET) Jan Engelhardt <[EMAIL PROTECTED]> wrote:

>
> On Feb 1 2008 12:53, Alejandro Riveira Fernández wrote:
> >>
> >> # uname -m
> >> I won't tell you.
> >> # linux32 uname -m
> >> i686
> >
> > Ubuntu 7.10 64 bit userland 2.6.24
> >
> >$ uname -m
> >x86_64
> >$ linux32 uname -m
> >i686
>
> What I am saying is that uname(2) does not reliably tell you whether
> you have a 64-bit kernel underneath unless you have other sources of
> information.

that's sort of a rabbit-and-the-frog problem.
The 32 bit emulator tries to look EXACTLY like the 32 bit kernel, and
it really should. If someone wants a method to detect even that... we
would really want to know the exact usecase.. because very likely it's
the wrong answer to some other problem ;-)

--
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
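For reference, this is roughly what "uname -m" reads. The sketch below only wraps uname(2); the helper name is hypothetical. The string it returns is whatever the kernel chooses to report for the calling process, so under linux32 it would be the 32-bit name even on a 64-bit kernel, which is why it cannot settle the question on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Copy the machine field reported by uname(2) into out.
 * Returns 0 on success, -1 if the system call fails.
 */
int reported_machine(char *out, size_t outlen)
{
	struct utsname u;

	assert(out != NULL && outlen > 0);
	if (uname(&u) != 0)
		return -1;

	/* snprintf truncates safely and always terminates the string. */
	snprintf(out, outlen, "%s", u.machine);
	return 0;
}
```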
Re: [PATCH] time: Fix constant size in kernel/timeconst.h
H. Peter Anvin wrote:
> Johann Felix Soden wrote:
> > kernel/timeconst.pl generates only long sized constants in timeconst.pl
> > which gives this warning:
> >
> > kernel/time.c: In function 'msecs_to_jiffies':
> > kernel/time.c:472: warning: integer constant is too large for 'long' type
> >
> > unsigned long long is needed.
> Hm, you've just taken a warning and elevated it to a bug.
>
> According to the C standard, a constant has the shortest type (>= int)
> needed to hold the constant, and the warning above is somewhat bogus in
> that context (what version of gcc is that, anyway?)
>
> ULL is only appropriate to 32-bit machines, or there will be other
> issues downstream. The Right Way[TM] to do this would be to get Linux
> to have the [U]INTxx_C() macros from C99.
>
> -hpa

Sorry for this. Thanks for teaching me about the C standard.

About your question: gcc 4.2.3 gave me this warning.

I'm a little bit surprised, though, because the kernel code is full of
constants with ULL. Is kernel/time.c a special case?

J. F. Soden
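The difference between a bare constant and the C99 UINT64_C() macro mentioned above can be sketched in user space. The value 2^32 and the demo_* names are chosen purely for illustration and are not taken from timeconst.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants only; not the values generated by timeconst.pl. */
#define DEMO_PLAIN	4294967296		/* 2^32, no suffix */
#define DEMO_C99	UINT64_C(4294967296)	/* same value, width stated */

/* Size of the type the compiler picked for the unsuffixed constant. */
size_t demo_plain_size(void)
{
	return sizeof(DEMO_PLAIN);
}

/* Multiply-and-shift in the style of a jiffies conversion. */
uint64_t demo_scale(uint32_t m)
{
	/* UINT64_C yields a type at least 64 bits wide on every target. */
	assert(sizeof(DEMO_C99) >= 8);
	return (DEMO_C99 * m) >> 32;
}
```

In C99 the unsuffixed constant takes the first of int, long and long long that can hold it, so it is 64 bits wide either way; UINT64_C() states that width explicitly instead of relying on a hard-coded ULL suffix.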
Re: [PATCH] scsi: ses fix for len and mem leaking when fail to add intf
On Feb 9, 2008 7:00 AM, James Bottomley <[EMAIL PROTECTED]> wrote:
>
> On Sat, 2008-02-09 at 04:13 -0800, Yinghai Lu wrote:
> > [PATCH] scsi: ses fix for len and mem leaking when fail to add intf
> >
> > change to u32 before left shifting char
>
> This one is a bit unnecessary; C promotion rules guarantee that
> everything is promoted to int (or above) before doing arithmetic. Since
> it's only ever done on 16 bits, signed or unsigned int is adequate for
> the conversion.

Thanks, I just learned that.

[EMAIL PROTECTED]:~/xx/xx/notes> cat ctest.c
#include <stdio.h>

int main(int argc, char *argv[])
{
	unsigned char buf[20];
	int len;

	buf[2] = 0x02;
	buf[3] = 0x03;
	len = (buf[2] << 8) + buf[3];
	printf("len = %x\n", len);
	return 0;
}
[EMAIL PROTECTED]:~/xx/xx/notes> gcc -o ctest ctest.c
[EMAIL PROTECTED]:~/xx/xx/notes> ./ctest
len = 203
[EMAIL PROTECTED]:~/xx/xx/notes>

> > also fix leaking with scomp leaking when failing.
>
> Yes, I see that, thanks! There's also the kmalloc of scomp which should
> be kzalloc if you care to fix that up in the resend.
>
> > - edev = enclosure_find(>host->shost_gendev);
> > + edev = enclosure_find(>host->shost_gendev);
>
> Space cleanups also need mention in the changelog.
>
> > - ses_dev->page1 = buf;
> > - ses_dev->page1_len = len;
> > -
> > result = ses_recv_diag(sdev, 1, buf, len);
> > if (result)
> > goto recv_failed;
> >
> > + ses_dev->page1 = buf;
> > + ses_dev->page1_len = len;
> > +
>
> Neither of us gets this right. By removing the kfree(buf) from the
> err_free path, you cause a leak here. I cause a double free. I think
> putting back the kfree(buf) and keeping this hunk is the fix.

buf has already become ses_dev->page1, ses_dev->page10 or
ses_dev->page2, so it will be freed via them.

> > types = buf[10];
> > len = buf[11];
> >
> > @@ -474,11 +474,12 @@ static int ses_intf_add(struct class_dev
> > components += type_ptr[1];
> > }
> >
> > + buf = NULL;
>
> Yes, prevents double free (but only if buf is freed).

It has already become ses_dev->page1.

> > result = ses_recv_diag(sdev, 2, hdr_buf, INIT_ALLOC_SIZE);
> > if (result)
> > goto recv_failed;
> >
> > @@ -492,11 +493,12 @@ static int ses_intf_add(struct class_dev
> >
> > /* The additional information page --- allows us
> >  * to match up the devices */
> > + buf = NULL;
>
> It's probably better to move these closer to the statements that make
> them necessary (in this case above the comment).

OK

> > if (IS_ERR(edev)) {
> > err = PTR_ERR(edev);
> > + kfree(scomp);
> > goto err_free;
> > }
>
> kfree(scomp) should be in the err_free path just in case someone else
> adds something to this.

ok.

> > /* add 1 for trailing '\0' we'll use */
> > buf = kzalloc(len + 1, GFP_KERNEL);
> > - result = ses_recv_diag(sdev, 7, buf, len);
> > - if (result) {
> > + if (buf)
> > + result = ses_recv_diag(sdev, 7, buf, len);
> > + else
> > + result = 7;
> > +
>
> What exactly is this supposed to be doing, and why 7? If you're
> thinking of conditioning the page 7 receive on the success of the
> allocation, we really need the allocation failure report more than we
> need the driver to attach.

I want to move the label out of the if block later.

> > - addl_desc_ptr += addl_desc_ptr[1] + 2;
> > + addl_desc_ptr += 2 + addl_desc_ptr[1];
>
> This is rather pointless, isn't it?
>
> > err_free:
> > - kfree(buf);
>
> You can't remove this. Also add kfree(scomp) here.
Re: [patch 0/4] make pr_debug() dynamic
On Feb 8 2008 10:52, Jason Baron wrote:
>On Thu, Feb 07, 2008 at 02:42:14PM -0800, Joe Perches wrote:
>> On Thu, 2008-02-07 at 16:03 -0500, Jason Baron wrote:
>> > make the pr_debug() function dependent upon the new immediate
>> > infrastructure.
>>
>> What's wrong with klogd -c 8 or equivalent?
>
>Setting the loglevel higher, will not make pr_debug() calls visible. The only
>way to make them visible right now, is by re-compiling the kernel.

pr_debug() was IMHO meant to be a compile-time optimization to
throw out debug messages most people do not want.
If you want to switch debugging messages on and off at run time, use
printk(KERN_DEBUG) [with klogd -c something] and not pr_debug!
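The distinction drawn here, between messages removed at compile time and messages filtered at run time, can be sketched in user space. Everything below is a stand-in with hypothetical names (demo_printk, demo_pr_debug, DEMO_DEBUG); it only mimics the convention that a message is shown when its level is numerically below the console loglevel, with 7 being the debug level.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEMO_LEVEL_DEBUG 7

/* Run-time threshold: messages with level < threshold are shown. */
static int demo_console_loglevel = 7;
static int demo_shown;			/* number of messages emitted */

/* Run-time filtered logging, in the spirit of printk(KERN_DEBUG ...). */
int demo_printk(int level, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	if (level >= demo_console_loglevel)
		return 0;		/* filtered, but still compiled in */

	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	demo_shown++;
	return n;
}

/* Compile-time switch, in the spirit of pr_debug(). */
#ifdef DEMO_DEBUG
#define demo_pr_debug(...) demo_printk(DEMO_LEVEL_DEBUG, __VA_ARGS__)
#else
#define demo_pr_debug(...) ((void)0)	/* call and string both vanish */
#endif
```

Without DEMO_DEBUG defined, raising demo_console_loglevel to 8 makes demo_printk(DEMO_LEVEL_DEBUG, ...) visible, while demo_pr_debug() stays silent because nothing was compiled in for it.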
Re: wrong cylinders of kingston usb pendrive [intel 82801DB]
On Mon, 4 Feb 2008, Patrick Ringl wrote:

> Hello,
>
> I am suffering from the following (usb-related?) problem:
>
> I have several different mashines - all x86 architecture - just lets
> call them mashineA, mashineB and mashineC.
> Anyway, mashineA has a severe problem with a
> Kingston-USB-pendrive(2gig). I simply cant install anything on it - the
> kernel usually moans with problems like "attempt to access beyond end of
> device" - while it does work fine with several noname usb-pendrives of
> the same size.
> Now, I just tested that kingston pendrive on mashineB and mashineC -
> where it runs fine .. I can install debian to it (same installation
> media) without any problem or kernel errors.
>
> I compared the output of dmesg and fdisk from mashineA and mashineB and
> C .. and the difference is simple: mashineA always shows 248 cylinders -
> while all the other mashines show 228 cylinders.

The number of cylinders is meaningless. What matters is the number of
sectors.

What does "fdisk -l /dev/sdX" (substitute the appropriate letter for X)
display for the pendrive on each of the machines? What messages show
up in the dmesg log when you plug in the pendrive? What version of the
Linux kernel are you using?

Alan Stern
Re: [RFC] Sectionized printk data
On Feb 4 2008 19:07, Sam Ravnborg wrote:
>> The attached patch allows something along the lines:
>>
>> int __init some_function(void)
>> {
>>	[...]
>>	pr_init(KERN_WARNING "failure %s in %s\n", ...);
>>	[...]
>> }
>>
>> Another idea I had was to make printk a macro that figures out the
>> section of the surrounding function and then moves the data
>> automatically when it is a literal, but I couldn't find mechanisms that
>> allow this. Anyone of you got an idea?
>>
>> What do you think in general?
>
>What is the rationale behind this?

To drop strings that are only shown once anyway, such as:

static int __init ebtables_init(void)
{
	int ret;

	mutex_lock(_mutex);
	list_add(_standard_target.list, _targets);
	mutex_unlock(_mutex);
	if ((ret = nf_register_sockopt(_sockopts)) < 0)
		return ret;

->	printk(KERN_INFO "Ebtables v2.0 registered\n");
	return 0;
}

>If you say "saving memory" then please let us know with specific examples
>in what area these savings will really pay off.