date:20070424

Re: [patch] oom: kill all threads that share mm with killed task

2007-04-24 Thread David Rientjes

On Mon, 23 Apr 2007, Christoph Lameter wrote:

 Obvious fix. It was broken by
  
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f2a2a7108aa0039ba7a5fe7a0d2ecef2219a7584
 Dec 7. So its in 2.6.20 and later. Candiate for stable?
 

I agree it's obvious enough that it should be included in stable.  
Otherwise the entire iteration becomes a big no-op and it won't alleviate 
the OOM condition in one call to out_of_memory() because there may be 
outstanding tasks with the shared -mm.

David
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Transparently handle .symbol lookup for kprobes

2007-04-24 Thread Paul Mackerras

Srinivasa Ds writes:

 + } else {\
 + char dot_name[KSYM_NAME_LEN+1]; \
 + dot_name[0] = '.';  \
 + dot_name[1] = '\0'; \
 + strncat(dot_name, name, KSYM_NAME_LEN); \

Assuming the kernel strncat works like the userspace one does, there
is a possibility that dot_name[] won't be properly null-terminated
here.  If strlen(name) = KSYM_NAME_LEN-1, then strncat will set
dot_name[KSYM_NAME_LEN-1] to something non-null and won't touch
dot_name[KSYM_NAME_LEN].

Paul.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] powerpc pseries eeh: Convert to kthread API

2007-04-24 Thread Paul Mackerras

Christoph Hellwig writes:

 The first question is obviously, is this really something we want?
 spawning kernel thread on demand without reaping them properly seems
 quite dangerous.

What specifically has to be done to reap a kernel thread?  Are you
concerned about the number of threads, or about having zombies hanging
around?

Paul.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

SOME STUFF ABOUT REISER4

2007-04-24 Thread lkml777

On Sun, 22 Apr 2007 19:00:46 -0700, Eric Hopper
[EMAIL PROTECTED] said:

 I know that this whole effort has been put in disarray by the
 prosecution of Hans Reiser, but I'm curious as to its status. Is
 Reiser4 going to be going into the Linus kernel anytime soon? Is there
 somewhere I should be looking to find this out without wasting bandwidth
 here?

There was a thread the other day, that talked about Reiser4.

It took a while but I have found it (actually two)

http://lkml.org/lkml/2007/4/5/360
http://lkml.org/lkml/2007/4/9/4

You may want to check them out.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Access your email from home and the web

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 03/25] xen: Add nosegneg capability to the vsyscall page notes

2007-04-24 Thread Jeremy Fitzhardinge

Roland McGrath wrote:
 I have to admit I still don't really understand all this.  Is it
 documented somewhere?
 

 I have explained it in public more than once, but I don't know off hand
 anywhere that was helpfully recorded.
   

Thanks very much.  I'd been poking about, but the closest I came to an
actual description was various patches fixing bugs, so it was a little
incomplete.

 For example, a Xen-enabled kernel can use a single vDSO image (or a single
 pair of int80/sysenter images), containing the nosegneg hwcap note.  When
 there is no need for it (native or hvm or 64-bit hv or whatever), it just
 clears the mask word.  If you actually do this, you'll want to modify the
 NOTE_KERNELCAP_BEGIN macro to define a global label you can use with VDSO_SYM.
   

Thanks for the pointer.  I'd been getting a bit of heat for enabling the
nonegseg flag unconditionally.  If I can make Xen-specific then that
will be one less source of complaints.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Peter Williams


Arjan van de Ven wrote:
Within reason, it's not the number of clients that X has that causes its 
CPU bandwidth use to sky rocket and cause problems.  It's more to to 
with what type of clients they are.  Most GUIs (even ones that are 
constantly updating visual data (e.g. gkrellm -- I can open quite a 
large number of these without increasing X's CPU usage very much)) cause 
very little load on the X server.  The exceptions to this are the 



there is actually 2 and not just 1 X server, and they are VERY VERY
different in behavior.

Case 1: Accelerated driver

If X talks to a decent enough card it supports will with acceleration,
it will be very rare for X itself to spend any kind of significant
amount of CPU time, all the really heavy stuff is done in hardware, and
asynchronously at that. A bit of batching will greatly improve system
performance in this case.

Case 2: Unaccelerated VESA

Some drivers in X, especially the VESA and NV drivers (which are quite
common, vesa is used on all hardware without a special driver nowadays),
have no or not enough acceleration to matter for modern desktops. This
means the CPU is doing all the heavy lifting, in the X program. In this
case even a simple move the window a bit becomes quite a bit of a CPU
hog already.


Mine's a:

SiS 661/741/760 PCI/AGP or 662/761Gx PCIE VGA Display adapter according 
to X's display settings tool.  Which category does that fall into?


It's not a special adapter and is just the one that came with the 
motherboard. It doesn't use much CPU unless I grab a window and wiggle 
it all over the screen or do something like ls -lR / in an xterm.




The cases are fundamentally different in behavior, because in the first
case, X hardly consumes the time it would get in any scheme, while in
the second case X really is CPU bound and will happily consume any CPU
time it can get.


Which still doesn't justify an elaborate points sharing scheme. 
Whichever way you look at that that's just another way of giving X more 
CPU bandwidth and there are simpler ways to give X more CPU if it needs 
it.  However, I think there's something seriously wrong if it needs the 
-19 nice that I've heard mentioned.  You might as well just run it as a 
real time process.


Peter
--
Peter Williams   [EMAIL PROTECTED]

Learning, n. The kind of ignorance distinguishing the studious.
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

NonExecutable Bit in 32Bit

2007-04-24 Thread Cestonaro, Thilo \(external\)

Hey,

is it right, that the NX Bit is not used under i386-Arch but under x86_64-Arch?
When yes, is there a special argument for it not to be used?

Ciao Thilo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] x86_64: Reflect the relocatability of the kernel in the ELF header.

2007-04-24 Thread Vivek Goyal

On Sun, Apr 22, 2007 at 11:12:13PM -0600, Eric W. Biederman wrote:
 
 Currently because vmlinux does not reflect that the kernel is relocatable
 we still have to support CONFIG_PHYSICAL_START.  So this patch adds a small
 c program to do what we cannot do with a linker script, set the elf header
 type to ET_DYN.
 
 This should remove the last obstacle to removing CONFIG_PHYSICAL_START
 on x86_64.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

[Dropping fastboot mailing list from CC as kexec mailing list is new list
 for this discussion]

[..]
 +void file_open(const char *name)
 +{
 + if ((fd = open(name, O_RDWR, 0))  0)
 + die(Unable to open `%s': %m, name);
 +}
 +
 +static void mketrel(void)
 +{
 + unsigned char e_type[2];
 + if (read(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
 + die(Cannot read ELF header: %s\n, strerror(errno));
 +
 + if (memcmp(e_ident, ELFMAG, 4) != 0)
 + die(No ELF magic\n);
 +
 + if ((e_ident[EI_CLASS] != ELFCLASS64) 
 + (e_ident[EI_CLASS] != ELFCLASS32))
 + die(Unrecognized ELF class: %x\n, e_ident[EI_CLASS]);
 + 
 + if ((e_ident[EI_DATA] != ELFDATA2LSB) 
 + (e_ident[EI_DATA] != ELFDATA2MSB))
 + die(Unrecognized ELF data encoding: %x\n, e_ident[EI_DATA]);
 +
 + if (e_ident[EI_VERSION] != EV_CURRENT)
 + die(Unknown ELF version: %d\n, e_ident[EI_VERSION]);
 +
 + if (e_ident[EI_DATA] == ELFDATA2LSB) {
 + e_type[0] = ET_REL  0xff;
 + e_type[1] = ET_REL  8;
 + } else {
 + e_type[1] = ET_REL  0xff;
 + e_type[0] = ET_REL  8;
 + }

Hi Eric,

Should this be ET_REL or ET_DYN? kexec refuses to load this vmlinux
as it does not find it to be executable type.

I am not well versed with various conventions but if I go through Executable
and Linking Format document, this is what it says about various file types.

• A relocatable file holds code and data suitable for linking with other
  object files to create an executable or a shared object file.

• An executable file holds a program suitable for execution.

• A shared object file holds code and data suitable for linking in two
  contexts. First, the link editor may process it with other relocatable and
  shared object files to create another object file. Second, the dynamic
  linker combines it with an executable file and other shared objects
  to create a process image.

So above does not seem to fit in the ET_REL type. We can't relink this
vmlinux? And it does not seem to fit in ET_DYN definition too. We are
not relinking this vmlinux with another executable or other relocatable
files.

I remember once you mentioned the term dynamic executable which can be
loaded at a non-compiled address and let run without requiring any
relocation processing. This vmlinux will fall in that category but can't 
relate it to standard elf file definitions.

Thanks
Vivek
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Peter Williams [EMAIL PROTECTED] wrote:

  The cases are fundamentally different in behavior, because in the 
  first case, X hardly consumes the time it would get in any scheme, 
  while in the second case X really is CPU bound and will happily 
  consume any CPU time it can get.
 
 Which still doesn't justify an elaborate points sharing scheme. 
 Whichever way you look at that that's just another way of giving X 
 more CPU bandwidth and there are simpler ways to give X more CPU if it 
 needs it.  However, I think there's something seriously wrong if it 
 needs the -19 nice that I've heard mentioned.

Gene has done some testing under CFS with X reniced to +10 and the 
desktop still worked smoothly for him. So CFS does not 'need' a reniced 
X. There are simply advantages to negative nice levels: for example 
screen refreshes are smoother on any scheduler i tried. BUT, there is a 
caveat: on non-CFS schedulers i tried X is much more prone to get into 
'overscheduling' scenarios that visibly hurt X's performance, while on 
CFS there's a max of 1000-1500 context switches a second at nice -10. 
(which, considering the cost of a context switch is well under 1% 
overhead.)

So, my point is, the nice level of X for desktop users should not be set 
lower than a low limit suggested by that particular scheduler's author. 
That limit is scheduler-specific. Con i think recommends a nice level of 
-1 for X when using SD [Con, can you confirm?], while my tests show that 
if you want you can go as low as -10 under CFS, without any bad 
side-effects. (-19 was a bit too much)

 [...]  You might as well just run it as a real time process.

hm, that would be a bad idea under any scheduler (including CFS), 
because real time processes can starve other processes indefinitely.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: NonExecutable Bit in 32Bit

2007-04-24 Thread William Heimbigner


On Tue, 24 Apr 2007, Cestonaro, Thilo (external) wrote:


Hey,

is it right, that the NX Bit is not used under i386-Arch but under x86_64-Arch?
When yes, is there a special argument for it not to be used?

Ciao Thilo

I don't think so - some i386 cpus definitely have support for the NX bit.

Would having this be supported in i386 help debugging (and security) 
significantly?


William Heimbigner
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch v2] Fixes and cleanups for earlyprintk aka boot console.

2007-04-24 Thread Andrew Morton

On Thu, 15 Mar 2007 16:46:39 +0100 Gerd Hoffmann [EMAIL PROTECTED] wrote:

 The console subsystem already has an idea of a boot console, using the
 CON_BOOT flag.  The implementation has some flaws though.  The major
 problem is that presence of a boot console makes register_console()
 ignore any other console devices (unless explicitly specified on the
 kernel command line).
 
 This patch fixes the console selection code to *not* consider a boot
 console a full-featured one, so the first non-boot console registering
 will become the default console instead.  This way the unregister call
 for the boot console in the register_console() function actually
 triggers and the handover from the boot console to the real console
 device works smoothly.  Added a printk for the handover, so you know
 which console device the output goes to when the boot console stops
 printing messages.
 
 The disable_early_printk() call is obsolete with that patch, explicitly
 disabling the early console isn't needed any more as it works
 automagically with that patch.
 
 I've walked through the tree, dropped all disable_early_printk()
 instances found below arch/ and tagged the consoles with CON_BOOT if
 needed.  The code is tested on x86, sh (thanks to Paul) and mips
 (thanks to Ralf).
 
 Changes to last version: Rediffed against -rc3, adapted to mips
 cleanups by Ralf, fixed udbg-immortal cmd line arg on powerpc.

I get this, across netconsole:

[17179569.184000] console handover: boot [earlyvga_f_0] - real [tty0]

wanna take a look at why there's cruft in bootconsole-name please?

in grub.conf I have

kernel /boot/bzImage-2.6.21-rc7-mm1 ro root=LABEL=/ rhgb vga=0x263 
[EMAIL PROTECTED]/eth0,[EMAIL PROTECTED]/00:0D:56:C6:C6:CC profile=1 
earlyprintk=vga resume=8:5 time

and I'm using

http://userweb.kernel.org/~akpm/config-sony.txt

Thanks.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Transparently handle .symbol lookup for kprobes

2007-04-24 Thread Srinivasa Ds

Paul Mackerras wrote:
 Srinivasa Ds writes:
 
 +} else {\
 +char dot_name[KSYM_NAME_LEN+1]; \
 +dot_name[0] = '.';  \
 +dot_name[1] = '\0'; \
 +strncat(dot_name, name, KSYM_NAME_LEN); \
 
 Assuming the kernel strncat works like the userspace one does, there
 is a possibility that dot_name[] won't be properly null-terminated
 here.  If strlen(name) = KSYM_NAME_LEN-1, then strncat will set
 dot_name[KSYM_NAME_LEN-1] to something non-null and won't touch
 dot_name[KSYM_NAME_LEN].

Irrespective of length of the string, kernel implementation of
strncat(lib/string.c) ensures that last character of string is set to
null. So dot_name[] is always null terminated.


char *strncat(char *dest, const char *src, size_t count)
{
char *tmp = dest;

if (count) {
while (*dest)
dest++;
while ((*dest++ = *src++) != 0) {
if (--count == 0) {
*dest = '\0';
break;
}
}
}
return tmp;
}
EXPORT_SYMBOL(strncat);
===

Is this OK then ??


Thanks
 Srinivasa DS

 
 Paul.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton

On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

 The softlockup watchdog is currently a nuisance in a virtual machine,
 since the whole system could have the CPU stolen from it for a long
 period of time.  While it would be unlikely for a guest domain to be
 denied timer interrupts for over 10s, it could happen and any softlockup
 message would be completely spurious.
 
 Earlier I proposed that sched_clock() return time in unstolen
 nanoseconds, which is how Xen and VMI currently implement it.  If the
 softlockup watchdog uses sched_clock() to measure time, it would
 automatically ignore stolen time, and therefore only report when the
 guest itself locked up.  When running native, sched_clock() returns
 real-time nanoseconds, so the behaviour would be unchanged.
 
 Note that sched_clock() used this way is inherently per-cpu, so this
 patch makes sure that the per-processor watchdog thread initialized
 its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is rather
clever of it.


Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., 
Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:8
[17179569.184000] ... MAX_LOCK_DEPTH:  30
[17179569.184000] ... MAX_LOCKDEP_KEYS:2048
[17179569.184000] ... CLASSHASH_SIZE:   1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES: 8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS:  16384
[17179569.184000] ... CHAINHASH_SIZE:  8192
[17179569.184000]  memory used by lock dependency info: 992 kB
[17179569.184000]  per task-struct memory footprint: 1200 bytes
[17179569.184000] 
[17179569.184000] | Locking API testsuite:
[17179569.184000] 

[17179569.184000]  | spin |wlock |rlock |mutex 
| wsem | rsem |
[17179569.184000]   
--
[17179569.184000]  A-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184000]  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184000]  A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184001]  A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184002]  A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184003]  A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184004]  A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184005] double unlock:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184006]   initialize held:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184006]  bad unlock order:  ok  |  ok  |  ok  |  ok  
|  ok  |  ok  |
[17179569.184006]   
--
[17179569.184006]   recursive read-lock: |  ok  |   
  |  ok  |
[17179569.184006]recursive read-lock #2: |  ok  |   
  |  ok  |
[17179569.184007] mixed read-write-lock: |  ok  |   
  |  ok  |
[17179569.184007] mixed write-read-lock: |  ok  |   
  |  ok  |
[17179569.184007]   
--
[17179569.184007]  hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]  hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]  soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]sirq-safe-A = hirqs-on/12:  ok  |  ok  |irq event 
stamp: 458
[17179569.184007] hardirqs last  enabled at (458): [c01e4116] 
irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [c01095b9] 
sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [c01e4101] 
irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [c01e408b] 
irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [c0104cf0] dump_trace+0x63/0x1ec
[17179569.184007]  [c0104e93] show_trace_log_lvl+0x1a/0x30
[17179569.184007]  [c01059ec] show_trace+0x12/0x14
[17179569.184007]  [c0105a45] dump_stack+0x16/0x18
[17179569.184007]  [c01e1eb5] dotest+0x6b/0x3d0
[17179569.184007]  [c01eb249] locking_selftest+0x915/0x1a58
[17179569.184007]  [c048c979] start_kernel+0x1d0/0x2a2
[17179569.184007]  ===
[17179569.184007] 
[17179569.184007]sirq-safe-A = hirqs-on/21:irq event stamp: 462

Re: [REPORT] First glitch1 results, 2.6.21-rc7-git6-CFSv5 + SD 0.46

2007-04-24 Thread Ingo Molnar


* Ed Tomlinson [EMAIL PROTECTED] wrote:

  SD 0.46 1-2 FPS
  cfs v5 nice -19 219-233 FPS
  cfs v5 nice 0   1000-1996
cfs v5 nice -10  60-65 FPS

the problem is, the glxgears portion of this test is an _inverse_ 
testcase.

The reason? glxgears on true 3D hardware will _not_ use X, it will 
directly use the 3D driver of the kernel. So by renicing X to -19 you 
give the xterms more chance to show stuff - the performance of the 
glxgears will 'degrade' - but that is what you asked for: glxgears is 
'just another CPU hog' that competes with X, it's not a true X client.

if you are after glxgears performance in this test then you'll get the 
best performance out of this by renicing X to +19 or even SCHED_BATCH.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Jeremy Fitzhardinge

Andrew Morton wrote:
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED]
wrote:

The softlockup watchdog is currently a nuisance in a virtual machine,
since the whole system could have the CPU stolen from it for a long
period of time. While it would be unlikely for a guest domain to be
denied timer interrupts for over 10s, it could happen and any softlockup
message would be completely spurious.

Earlier I proposed that sched_clock() return time in unstolen
nanoseconds, which is how Xen and VMI currently implement it. If the
softlockup watchdog uses sched_clock() to measure time, it would
automatically ignore stolen time, and therefore only report when the
guest itself locked up. When running native, sched_clock() returns
real-time nanoseconds, so the behaviour would be unchanged.

Note that sched_clock() used this way is inherently per-cpu, so this
patch makes sure that the per-processor watchdog thread initialized
its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is rather
clever of it.

Interesting. Which variation of sched_clock do you have in your tree at
the moment?

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett

On Tuesday 24 April 2007, Ingo Molnar wrote:
* Peter Williams [EMAIL PROTECTED] wrote:
  The cases are fundamentally different in behavior, because in the
  first case, X hardly consumes the time it would get in any scheme,
  while in the second case X really is CPU bound and will happily
  consume any CPU time it can get.

 Which still doesn't justify an elaborate points sharing scheme.
 Whichever way you look at that that's just another way of giving X
 more CPU bandwidth and there are simpler ways to give X more CPU if it
 needs it.  However, I think there's something seriously wrong if it
 needs the -19 nice that I've heard mentioned.

Gene has done some testing under CFS with X reniced to +10 and the
desktop still worked smoothly for him.

As a data point here, and probably nothing to do with X, but I did manage to 
lock it up, solid, reset button time tonight, by wanting 'smart' to get done 
with an update session after amanda had started.  I took both smart processes 
I could see in htop all the way to -19, but when it was about done about 3 
minutes later, everything came to an instant, frozen, reset button required 
lockup.  I should have stopped at -17 I guess. :(

So CFS does not 'need' a reniced 
X. There are simply advantages to negative nice levels: for example
screen refreshes are smoother on any scheduler i tried. BUT, there is a
caveat: on non-CFS schedulers i tried X is much more prone to get into
'overscheduling' scenarios that visibly hurt X's performance, while on
CFS there's a max of 1000-1500 context switches a second at nice -10.
(which, considering the cost of a context switch is well under 1%
overhead.)

So, my point is, the nice level of X for desktop users should not be set
lower than a low limit suggested by that particular scheduler's author.
That limit is scheduler-specific. Con i think recommends a nice level of
-1 for X when using SD [Con, can you confirm?], while my tests show that
if you want you can go as low as -10 under CFS, without any bad
side-effects. (-19 was a bit too much)

 [...]  You might as well just run it as a real time process.

hm, that would be a bad idea under any scheduler (including CFS),
because real time processes can starve other processes indefinitely.

   Ingo



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I have discovered that all human evil comes from this, man's being unable
to sit still in a room.
-- Blaise Pascal
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Rogan Dawes


Ingo Molnar wrote:


static void
yield_task_fair(struct rq *rq, struct task_struct *p, struct task_struct *p_to)
{
struct rb_node *curr, *next, *first;
struct task_struct *p_next;

/*
 * yield-to support: if we are on the same runqueue then
 * give half of our wait_runtime (if it's positive) to the other task:
 */
if (p_to  p-wait_runtime  0) {
p-wait_runtime = 1;
p_to-wait_runtime += p-wait_runtime;
}

the above is the basic expression of: charge a positive bank balance. 



[..]

[note, due to the nanoseconds unit there's no rounding loss to worry 
about.]


Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?


Ingo


Rogan
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Gene Heskett [EMAIL PROTECTED] wrote:

  Gene has done some testing under CFS with X reniced to +10 and the 
  desktop still worked smoothly for him.
 
 As a data point here, and probably nothing to do with X, but I did 
 manage to lock it up, solid, reset button time tonight, by wanting 
 'smart' to get done with an update session after amanda had started.  
 I took both smart processes I could see in htop all the way to -19, 
 but when it was about done about 3 minutes later, everything came to 
 an instant, frozen, reset button required lockup.  I should have 
 stopped at -17 I guess. :(

yeah, i guess this has little to do with X. I think in your scenario it 
might have been smarter to either stop, or to renice the workloads that 
took away CPU power from others to _positive_ nice levels. Negative nice 
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the 
SysRq-N [SysRq-Nice] implementation in my tree to not only change 
real-time tasks to SCHED_OTHER, but to also renice negative nice levels 
back to 0 - this will show up in -v6. That way you'd only have had to 
hit SysRq-N to get the system out of the wedge.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra

On Tue, 2007-04-24 at 12:58 +1000, Neil Brown wrote:
 On Friday April 20, [EMAIL PROTECTED] wrote:
  Scale writeback cache per backing device, proportional to its writeout 
  speed.
 
 So it works like this:
 
  We account for writeout in full pages.
  When a page has the Writeback flag cleared, we account that as a
  successfully retired write for the relevant bdi.
  By using floating averages we keep track of how many writes each bdi
  has retired 'recently' where the unit of time in which we understand
  'recently' is a single page written.

That is actually that period I keep referring to. So recently is the
last 'period' number of writeout completions.

  We keep a floating average for each bdi, and a floating average for
  the total writeouts (that 'average' is, of course, 1.)

1 in the sense of unity, yes :-)

  Using these numbers we can calculate what faction of 'recently'
  retired writes were retired by each bdi (get_writeout_scale).
 
  Multiplying this fraction by the system-wide number of pages that are
  allowed to be dirty before write-throttling, we get the number of
  pages that the bdi can have dirty before write-throttling the bdi.
 
  I note that the same fraction is *not* applied to background_thresh.
  Should it be?  I guess not - there would be interesting starting
  transients, as a bdi which had done no writeout would not be allowed
  any dirty pages, so background writeout would start immediately,
  which isn't what you want... or is it?

This is something I have not been able to come to a conclusive answer
yet,... 

  For each bdi we also track the number of (dirty, writeback, unstable)
  pages and do not allow this to exceed the limit set for this bdi.
 
  The calculations involving 'reserve' in get_dirty_limits are a little
  confusing.  It looks like you calculating how much total head-room
  there is for the bdi (pages that the system can still dirty - pages
  this bdi has dirty) and making sure the number returned in pbdi_dirty
  doesn't allow more than that to be used.  

Yes, it limits the earned share of the total dirty limit to the possible
share, ensuring that the total dirty limit is never exceeded.

This is especially relevant when the proportions change faster than the
pages get written out, ie. when the period  total dirty limit.

 This is probably a
  reasonable thing to do but it doesn't feel like the right place.  I
  think get_dirty_limits should return the raw threshold, and
  balance_dirty_pages should do both tests - the bdi-local test and the
  system-wide test.

Ok, that makes sense I guess.

  Currently you have a rather odd situation where
 + if (bdi_nr_reclaimable + bdi_nr_writeback = bdi_thresh)
 + break;
  might included numbers obtained with bdi_stat_sum being compared with
  numbers obtained with bdi_stat.

Yes, I was aware of that. The bdi_thresh is based on bdi_stat() numbers,
whereas the others could be bdi_stat_sum(). I think this is ok, since
the threshold is a 'guess' anyway, we just _need_ to ensure we do not
get trapped by writeouts not arriving (due to getting stuck in the per
cpu deltas).  -- I have all this commented in the new version.

  With these patches, the VM still (I think) assumes that each BDI has
  a reasonable queue limit, so that writeback_inodes will block on a
  full queue.  If a BDI has a very large queue, balance_dirty_pages
  will simply turn lots of DIRTY pages into WRITEBACK pages and then
  think We've done our duty without actually blocking at all.

It will block once we exceed the total number of dirty pages allowed for
that BDI. But yes, this does not take away the need for queue limits.

This work was primarily aimed at allowing multiple queues to not
interfere as much, so they all can make progress and not get starved.

  With the extra accounting that we now have, I would like to see
  balance_dirty_pages dirty pages wait until RECLAIMABLE+WRITEBACK is
  actually less than 'threshold'.  This would probably mean that we
  would need to support per-bdi background_writeout to smooth things
  out.  Maybe that it fodder for another patch-set.

Indeed, I still have to wrap my mind around the background thing. Your
input is appreciated.

  You set:
 + vm_cycle_shift = 1 + ilog2(vm_total_pages);
 
  Can you explain that?

You found the one random knob I hid :-)

   My experience is that scaling dirty limits
  with main memory isn't what we really want.  When you get machines
  with very large memory, the amount that you want to be dirty is more
  a function of the speed of your IO devices, rather than the amount
  of memory, otherwise you can sometimes see large filesystem lags
  ('sync' taking minutes?)
 
  I wonder if it makes sense to try to limit the dirty data for a bdi
  to the amount that it can write out in some period of time - maybe 3
  seconds.  Probably configurable.  You seem to have almost all the
  infrastructure in place to do that, and I think it

Re: [patch 1/4] Ignore stolen time in the softlockup watchdog

2007-04-24 Thread Andrew Morton

On Mon, 23 Apr 2007 23:58:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
wrote:

 Andrew Morton wrote:
  On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge [EMAIL PROTECTED] 
  wrote:
 

  The softlockup watchdog is currently a nuisance in a virtual machine,
  since the whole system could have the CPU stolen from it for a long
  period of time.  While it would be unlikely for a guest domain to be
  denied timer interrupts for over 10s, it could happen and any softlockup
  message would be completely spurious.
 
  Earlier I proposed that sched_clock() return time in unstolen
  nanoseconds, which is how Xen and VMI currently implement it.  If the
  softlockup watchdog uses sched_clock() to measure time, it would
  automatically ignore stolen time, and therefore only report when the
  guest itself locked up.  When running native, sched_clock() returns
  real-time nanoseconds, so the behaviour would be unchanged.
 
  Note that sched_clock() used this way is inherently per-cpu, so this
  patch makes sure that the per-processor watchdog thread initialized
  its own timestamp.
  
 
  This patch
  (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
  causes six failures in the locking self-tests, which I must say is rather
  clever of it.

 
 Interesting.

I'll say.

  Which variation of sched_clock do you have in your tree at
 the moment?

Andi's, plus the below fix.

Sigh.  I thought I was only two more bugs away from a release, then...


[18014389.347124] BUG: unable to handle kernel paging request at virtual 
address 6b6b7193
[18014389.347142]  printing eip:
[18014389.347149] c029a80c
[18014389.347156] *pde = 
[18014389.347166] Oops:  [#1]
[18014389.347174] Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp 
l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 
xt_state nf_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 
cpufreq_ondemand video sbs button battery asus_acpi ac nvram ohci1394 ieee1394 
ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm 
sr_mod cdrom snd_timer ieee80211 i2c_i801 piix ieee80211_crypt i2c_core generic 
snd soundcore snd_page_alloc ext3 jbd ide_disk ide_core
[18014389.347520] CPU:0
[18014389.347521] EIP:0060:[c029a80c]Tainted: G  D VLI
[18014389.347522] EFLAGS: 00010296   (2.6.21-rc7-mm1 #35)
[18014389.347547] EIP is at input_release_device+0x8/0x4e
[18014389.347555] eax: c99709a8   ebx: 6b6b6b6b   ecx: 0286   edx: 
[18014389.347563] esi: 6b6b6b6b   edi: c99709cc   ebp: c21e3d40   esp: c21e3d38
[18014389.347571] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
[18014389.347580] Process khubd (pid: 159, ti=c21e2000 task=c20a62f0 
task.ti=c21e2000)
[18014389.347588] Stack: 6b6b6b6b c99709a8 c21e3d60 c029b489 c2014ec8 c9182000 
c96b167c c9970954 
[18014389.347655]c9970954 c99709cc c21e3d80 c029d401 c9977a6c c96b1000 
c21e3d90 c9970954 
[18014389.347708]c99709a8 c9164000 c21e3d90 c029d4b5 c96b1000 c9970564 
c21e3db0 c029c50b 
[18014389.347771] Call Trace:
[18014389.347792]  [c029b489] input_close_device+0x13/0x51
[18014389.347810]  [c029d401] mousedev_destroy+0x29/0x7e
[18014389.347827]  [c029d4b5] mousedev_disconnect+0x5f/0x63
[18014389.347842]  [c029c50b] input_unregister_device+0x6a/0x100
[18014389.347858]  [c02abf9c] hidinput_disconnect+0x24/0x41
[18014389.347874]  [c02aef29] hid_disconnect+0x79/0xc9
[18014389.347889]  [c028e1db] usb_unbind_interface+0x47/0x8f
[18014389.347916]  [c0256852] __device_release_driver+0x74/0x90
[18014389.347933]  [c0256c5f] device_release_driver+0x37/0x4e
[18014389.347957]  [c02561c6] bus_remove_device+0x73/0x82
[18014389.347977]  [c02547c1] device_del+0x214/0x28c
[18014389.348132]  [c028bb72] usb_disable_device+0x62/0xc2
[18014389.348148]  [c0288893] usb_disconnect+0x99/0x126
[18014389.348163]  [c0288d2c] hub_thread+0x3a5/0xb07
[18014389.348178]  [c012cbe5] kthread+0x6e/0x79
[18014389.348194]  [c0104917] kernel_thread_helper+0x7/0x10
[18014389.348210]  ===
[18014389.348218] INFO: lockdep is turned off.
[18014389.348224] Code: 5b 5d c3 55 b9 f0 ff ff ff 8b 50 0c 89 e5 83 ba 28 06 
00 00 00 75 08 89 82 28 06 00 00 31 c9 5d 89 c8 c3 55 89 e5 56 53 8b 70 0c 39 
86 28 06 00 00 75 3a 8b 9e e4 08 00 00 c7 86 28 06 00 00 00 

I dunno.  I'll keep plugging for another couple hours then I'll shove
out what I have as a -mm snapshot whatsit.

Things are just ridiculous.  I'm thinking of having a hard-disk crash and
accidentally losing everything.



From: Andrew Morton [EMAIL PROTECTED]

WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to 
.init.text:sc_cpu_event from .data between 'sc_cpu_notifier' (at offset 0x2110) 
and 'mcelog'

Use hotcpu_notifier().  This takes care of making sure that the unused code

How do you send a reply to an email you have deleted.

2007-04-24 Thread lkml777

How do you send a reply to an email you have deleted?
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - I mean, what is it about a decent email service?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Gene Heskett

On Tuesday 24 April 2007, Ingo Molnar wrote:
* Gene Heskett [EMAIL PROTECTED] wrote:
  Gene has done some testing under CFS with X reniced to +10 and the
  desktop still worked smoothly for him.

 As a data point here, and probably nothing to do with X, but I did
 manage to lock it up, solid, reset button time tonight, by wanting
 'smart' to get done with an update session after amanda had started.
 I took both smart processes I could see in htop all the way to -19,
 but when it was about done about 3 minutes later, everything came to
 an instant, frozen, reset button required lockup.  I should have
 stopped at -17 I guess. :(

yeah, i guess this has little to do with X. I think in your scenario it
might have been smarter to either stop, or to renice the workloads that
took away CPU power from others to _positive_ nice levels. Negative nice
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the
SysRq-N [SysRq-Nice] implementation in my tree to not only change
real-time tasks to SCHED_OTHER, but to also renice negative nice levels
back to 0 - this will show up in -v6. That way you'd only have had to
hit SysRq-N to get the system out of the wedge.)

   Ingo

That sounds handy, particularly with idiots like me at the wheel...


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
When a Banker jumps out of a window, jump after him--that's where the money 
is.
-- Robespierre
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Gene Heskett [EMAIL PROTECTED] wrote:

  (Btw., to protect against such mishaps in the future i have changed 
  the SysRq-N [SysRq-Nice] implementation in my tree to not only 
  change real-time tasks to SCHED_OTHER, but to also renice negative 
  nice levels back to 0 - this will show up in -v6. That way you'd 
  only have had to hit SysRq-N to get the system out of the wedge.)
 
 That sounds handy, particularly with idiots like me at the wheel...

by that standard i guess we tinkerers are all idiots ;)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread David Lang


On Tue, 24 Apr 2007, Ingo Molnar wrote:


* Gene Heskett [EMAIL PROTECTED] wrote:


Gene has done some testing under CFS with X reniced to +10 and the
desktop still worked smoothly for him.


As a data point here, and probably nothing to do with X, but I did
manage to lock it up, solid, reset button time tonight, by wanting
'smart' to get done with an update session after amanda had started.
I took both smart processes I could see in htop all the way to -19,
but when it was about done about 3 minutes later, everything came to
an instant, frozen, reset button required lockup.  I should have
stopped at -17 I guess. :(


yeah, i guess this has little to do with X. I think in your scenario it
might have been smarter to either stop, or to renice the workloads that
took away CPU power from others to _positive_ nice levels. Negative nice
levels can indeed be dangerous.

(Btw., to protect against such mishaps in the future i have changed the
SysRq-N [SysRq-Nice] implementation in my tree to not only change
real-time tasks to SCHED_OTHER, but to also renice negative nice levels
back to 0 - this will show up in -v6. That way you'd only have had to
hit SysRq-N to get the system out of the wedge.)


if you are trying to unwedge a system it may be a good idea to renice all tasks 
to 0, it could be that a task at +19 is holding a lock that something else is 
waiting for.


David Lang
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] x86_64: Reflect the relocatability of the kernel in the ELF header.

2007-04-24 Thread Eric W. Biederman

Vivek Goyal [EMAIL PROTECTED] writes:

 On Sun, Apr 22, 2007 at 11:12:13PM -0600, Eric W. Biederman wrote:
 
 Currently because vmlinux does not reflect that the kernel is relocatable
 we still have to support CONFIG_PHYSICAL_START.  So this patch adds a small
 c program to do what we cannot do with a linker script, set the elf header
 type to ET_DYN.
 
 This should remove the last obstacle to removing CONFIG_PHYSICAL_START
 on x86_64.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

 [Dropping fastboot mailing list from CC as kexec mailing list is new list
  for this discussion]

 [..]
 +void file_open(const char *name)
 +{
 +if ((fd = open(name, O_RDWR, 0))  0)
 +die(Unable to open `%s': %m, name);
 +}
 +
 +static void mketrel(void)
 +{
 +unsigned char e_type[2];
 +if (read(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
 +die(Cannot read ELF header: %s\n, strerror(errno));
 +
 +if (memcmp(e_ident, ELFMAG, 4) != 0)
 +die(No ELF magic\n);
 +
 +if ((e_ident[EI_CLASS] != ELFCLASS64) 
 +(e_ident[EI_CLASS] != ELFCLASS32))
 +die(Unrecognized ELF class: %x\n, e_ident[EI_CLASS]);
 +
 +if ((e_ident[EI_DATA] != ELFDATA2LSB) 
 +(e_ident[EI_DATA] != ELFDATA2MSB))
 +die(Unrecognized ELF data encoding: %x\n, e_ident[EI_DATA]);
 +
 +if (e_ident[EI_VERSION] != EV_CURRENT)
 +die(Unknown ELF version: %d\n, e_ident[EI_VERSION]);
 +
 +if (e_ident[EI_DATA] == ELFDATA2LSB) {
 +e_type[0] = ET_REL  0xff;
 +e_type[1] = ET_REL  8;
 +} else {
 +e_type[1] = ET_REL  0xff;
 +e_type[0] = ET_REL  8;
 +}

 Hi Eric,

 Should this be ET_REL or ET_DYN? kexec refuses to load this vmlinux
 as it does not find it to be executable type.

Doh.  It should be ET_DYN.  I had relocatable much to much on the brain,
and so I stuffed in the wrong type.

 I am not well versed with various conventions but if I go through Executable
 and Linking Format document, this is what it says about various file types.

 • A relocatable file holds code and data suitable for linking with other
   object files to create an executable or a shared object file.

 • An executable file holds a program suitable for execution.

 • A shared object file holds code and data suitable for linking in two
   contexts. First, the link editor may process it with other relocatable and
   shared object files to create another object file. Second, the dynamic
   linker combines it with an executable file and other shared objects
   to create a process image.

 So above does not seem to fit in the ET_REL type. We can't relink this
 vmlinux? And it does not seem to fit in ET_DYN definition too. We are
 not relinking this vmlinux with another executable or other relocatable
 files.

 I remember once you mentioned the term dynamic executable which can be
 loaded at a non-compiled address and let run without requiring any
 relocation processing. This vmlinux will fall in that category but can't 
 relate it to standard elf file definitions.

Sorry about that.  

ET_DYN without a PT_DYNAMIC segment, without a PT_INTERP segment,
and with a valid entry point is exactly that.  Loaders never perform
relocation processing on a ET_DYN executable but they are allowed to
shift all of the addresses by a single delta so long as all of the
alignment restrictions are honored.

Relocation processing when it happens comes from the dynamic linker,
which is set in PT_INTERP and the dynamic linker looks a PT_DYNAMIC
to figure out what relocations are available for processing.

The basic issue is that ld don't really comprehend what we are doing
since we are building a position independent executable in a way
that the normal tools don't allow, so we have to poke the header.

If we had compiled with -fPIC we could have specified -pie or
--pic-executable to ld and it would have done the right thing.
But as it is our executable only changes physical addresses and
not virtual addresses something completely foreign to ld.

Eric
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* David Lang [EMAIL PROTECTED] wrote:

  (Btw., to protect against such mishaps in the future i have changed 
  the SysRq-N [SysRq-Nice] implementation in my tree to not only 
  change real-time tasks to SCHED_OTHER, but to also renice negative 
  nice levels back to 0 - this will show up in -v6. That way you'd 
  only have had to hit SysRq-N to get the system out of the wedge.)
 
 if you are trying to unwedge a system it may be a good idea to renice 
 all tasks to 0, it could be that a task at +19 is holding a lock that 
 something else is waiting for.

Yeah, that's possible too, but +19 tasks are getting a small but 
guaranteed share of the CPU so eventually it ought to release it. It's 
still a possibility, but i think i'll wait for a specific incident to 
happen first, and then react to that incident :-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

 yeah, i guess this has little to do with X. I think in your scenario 
 it might have been smarter to either stop, or to renice the workloads 
 that took away CPU power from others to _positive_ nice levels. 
 Negative nice levels can indeed be dangerous.

btw., was X itself at nice 0 or nice -10 when the lockup happened?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Rogan Dawes [EMAIL PROTECTED] wrote:

 if (p_to  p-wait_runtime  0) {
 p-wait_runtime = 1;
 p_to-wait_runtime += p-wait_runtime;
 }
 
 the above is the basic expression of: charge a positive bank balance. 
 
 
 [..]
 
  [note, due to the nanoseconds unit there's no rounding loss to worry 
  about.]
 
 Surely if you divide 5 nanoseconds by 2, you'll get a rounding loss?

yes. But not that we'll only truly have to worry about that when we'll 
have context-switching performance in that range - currently it's at 
least 2-3 orders of magnitude above that. Microseconds seemed to me to 
be too coarse already, that's why i picked nanoseconds and 64-bit 
arithmetics for CFS.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH]Fix parsing kernelcore boot option for ia64

2007-04-24 Thread Yasunori Goto

Mel-san.

I tested your patch (Thanks!). It worked. But..

 In my understanding, why ia64 doesn't use early_param() macro for mem= at el. 
 is that 
 it has to use mem= option at efi handling which is called before 
 parse_early_param().
 
 Current ia64's boot path is
  setup_arch()
 - efi handling - parse_early_param() - numa handling - pgdat/zone init
 
 kernelcore= option is just used at pgdat/zone initialization. (no arch 
 dependent part...)
 
 So I think just adding
 ==
 early_param(kernelcore,cmpdline_parse_kernelcore)
 ==
 to ia64 is ok.

Then, it can be common code.
How is this patch? I confirmed this can work well too.



When kernelcore boot option is specified, kernel can't boot up
on ia64. It is cause of eternal loop.
In addition, its code can be common code. This is fix for it.
I tested this patch on my ia64 box.


Signed-off-by: Yasunori Goto [EMAIL PROTECTED]

-

 arch/i386/kernel/setup.c   |1 -
 arch/ia64/kernel/efi.c |2 --
 arch/powerpc/kernel/prom.c |1 -
 arch/ppc/mm/init.c |2 --
 arch/x86_64/kernel/e820.c  |1 -
 include/linux/mm.h |1 -
 mm/page_alloc.c|3 +++
 7 files changed, 3 insertions(+), 8 deletions(-)

Index: kernelcore/arch/ia64/kernel/efi.c
===
--- kernelcore.orig/arch/ia64/kernel/efi.c  2007-04-24 15:09:37.0 
+0900
+++ kernelcore/arch/ia64/kernel/efi.c   2007-04-24 15:25:22.0 +0900
@@ -423,8 +423,6 @@ efi_init (void)
mem_limit = memparse(cp + 4, cp);
} else if (memcmp(cp, max_addr=, 9) == 0) {
max_addr = GRANULEROUNDDOWN(memparse(cp + 9, cp));
-   } else if (memcmp(cp, kernelcore=,11) == 0) {
-   cmdline_parse_kernelcore(cp+11);
} else if (memcmp(cp, min_addr=, 9) == 0) {
min_addr = GRANULEROUNDDOWN(memparse(cp + 9, cp));
} else {
Index: kernelcore/arch/i386/kernel/setup.c
===
--- kernelcore.orig/arch/i386/kernel/setup.c2007-04-24 15:29:20.0 
+0900
+++ kernelcore/arch/i386/kernel/setup.c 2007-04-24 15:29:39.0 +0900
@@ -195,7 +195,6 @@ static int __init parse_mem(char *arg)
return 0;
 }
 early_param(mem, parse_mem);
-early_param(kernelcore, cmdline_parse_kernelcore);
 
 #ifdef CONFIG_PROC_VMCORE
 /* elfcorehdr= specifies the location of elf core header
Index: kernelcore/arch/powerpc/kernel/prom.c
===
--- kernelcore.orig/arch/powerpc/kernel/prom.c  2007-04-24 15:04:47.0 
+0900
+++ kernelcore/arch/powerpc/kernel/prom.c   2007-04-24 15:30:25.0 
+0900
@@ -431,7 +431,6 @@ static int __init early_parse_mem(char *
return 0;
 }
 early_param(mem, early_parse_mem);
-early_param(kernelcore, cmdline_parse_kernelcore);
 
 /*
  * The device tree may be allocated below our memory limit, or inside the
Index: kernelcore/arch/ppc/mm/init.c
===
--- kernelcore.orig/arch/ppc/mm/init.c  2007-04-24 15:04:47.0 +0900
+++ kernelcore/arch/ppc/mm/init.c   2007-04-24 15:30:56.0 +0900
@@ -214,8 +214,6 @@ void MMU_setup(void)
}
 }
 
-early_param(kernelcore, cmdline_parse_kernelcore);
-
 /*
  * MMU_init sets up the basic memory mappings for the kernel,
  * including both RAM and possibly some I/O regions,
Index: kernelcore/arch/x86_64/kernel/e820.c
===
--- kernelcore.orig/arch/x86_64/kernel/e820.c   2007-04-24 15:04:47.0 
+0900
+++ kernelcore/arch/x86_64/kernel/e820.c2007-04-24 15:34:02.0 
+0900
@@ -604,7 +604,6 @@ static int __init parse_memopt(char *p)
return 0;
 } 
 early_param(mem, parse_memopt);
-early_param(kernelcore, cmdline_parse_kernelcore);
 
 static int userdef __initdata;
 
Index: kernelcore/include/linux/mm.h
===
--- kernelcore.orig/include/linux/mm.h  2007-04-24 15:09:37.0 +0900
+++ kernelcore/include/linux/mm.h   2007-04-24 15:35:52.0 +0900
@@ -1051,7 +1051,6 @@ extern unsigned long find_max_pfn_with_a
 extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
-extern int cmdline_parse_kernelcore(char *p);
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
 extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
Index: kernelcore/mm/page_alloc.c
===
--- kernelcore.orig/mm/page_alloc.c 2007-04-24 15:09:37.0 +0900
+++ kernelcore/mm/page_alloc.c  2007-04-24 16:00:21.0 +0900
@@

Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-24 Thread Ingo Molnar


* Ingo Molnar [EMAIL PROTECTED] wrote:

 [...] That way you'd only have had to hit SysRq-N to get the system 
 out of the wedge.)

small correction: Alt-SysRq-N.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] i802.11: fixed memory leak on multicasts

2007-04-24 Thread Markus Pietrek


Hi,

socket buffers were not always freed when receiving multicasts

Bye,
--
Markus Pietrek
Lead Software Engineer
Phone: +49-7667-908-501, Fax: +49-7667-908-200
mailto:[EMAIL PROTECTED]

FS Forth-Systeme GmbH
A Digi International Company
Kueferstr. 8, 79206 Breisach, Germany
Tax: 07008/12000 / VAT: DE142208834 / Reg. Amtsgericht Freiburg HRB 290212
Directors: Klaus Flesch, Subramanian Krishnan, Dieter Vesper
http://www.digi.com
Index: net/ieee80211/ieee80211_rx.c
===
RCS file: 
/data/vcs/cvs/fsforth_products/LxNETES/linux/net/ieee80211/ieee80211_rx.c,v
retrieving revision 1.5
retrieving revision 1.6
diff -c -r1.5 -r1.6
*** net/ieee80211/ieee80211_rx.c13 Apr 2007 12:39:38 -  1.5
--- net/ieee80211/ieee80211_rx.c23 Apr 2007 15:51:28 -  1.6
***
*** 860,868 
break;
}
  
!   if (is_packet_for_us)
if (!ieee80211_rx(ieee, skb, stats))
dev_kfree_skb_irq(skb);
return;
  
  drop_free:
--- 860,871 
break;
}
  
!   if (is_packet_for_us) {
if (!ieee80211_rx(ieee, skb, stats))
dev_kfree_skb_irq(skb);
+ } else
+ dev_kfree_skb_irq(skb);
+ 
return;
  
  drop_free:

cfs works fine for me

2007-04-24 Thread Hemmann, Volker Armin

Hello,

I have tried the cfs patches with 2.6.20.7 in the last days.

I am using KDE 3.5.6, gentoo unstable and have a dual core AMD64 system with 
1GB ram and a nvidia card (using the closed source drivers, yes I suck, but I 
love playing 3d games once in a while).

I don't have interactivity problems with plain kernel.org kernels (except when 
swapping a lot, swapping really sucks)
My system works well and is stable.

With the cfs patches, my system continues to work well. I have not seen any 
regressions, desktop is snappy, emerge'ing stuff (niced to +19), does not 
hurt and unreal tournament 2004 is as fast (or slow, depends on the 
situation) as always. It even looks like FPS under heavy stress (like 
onslaught torlan when lots of bots and me are fighting at a powernode), don't 
go down as low as with the mainline scheduler. Not a big difference, but it 
is there (20-25 with plain kernel.org kernel in extrem situations compared to 
30 with the cfs patches). Maybe I did not hit the worst case, playing is a 
little bit restricted at the moment - my wrist and ellbow hate me, but it 
looks promising. Apart from the worst case scenrios, FPS are more or less the 
same.

My usage consisted of surfing the web with konqueror, watching videos with 
xine and mplayer, using kmail (with tens of thousands of mails in different 
folders), looking at pictures with kuickshow, installing XFCE, asorted 
updates, typing lots and lots of stuff in kate and web forums, listening to 
mp3/ogg with amarok, playing pysol/kpat/lgeneral/wesnoth/ut2004/freecol, a 
lot of that parallel (not ut2004... I don't want to hurt my precious fps...).

Again, my system worked fine with the 'normal' scheduler, from the stuff I 
read in the lkml archives I must be some special kind of guy, so there was no 
improvement on the 'feels snappy or not' front, but there are also no 
regressions. So from my point of view, everything is fine with cfs and I 
would not mind having it as default scheduler. 

If you want specs of my hardware, my kernel config or any other information, 
just send me an email. I am not subscribed to lkml, nor can I read any of its 
archives in the next couple of days, which is one reason why I don't answer 
to one at the existing threads (I don't even know if there are some at the 
moment), so in case of an answer cc'ing me would be nice.

Glück Auf
Volker
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[REPORT] cfs-v5 vs sd-0.46

2007-04-24 Thread Michael Gerdau

Hi list,

with cfs-v5 finally booting on my machine I have run my daily
numbercrunching jobs on both cfs-v5 and sd-0.46, 2.6.21-v7 on
top of a stock openSUSE 10.2 (X86_64). Config for both kernel
is the same except for the X boost option in cfs-v5 which on
my system didn't work (X still was @ -19; I understand this will
be fixed in -v6). HZ is 250 in both.

System is a Dell XPS M1710, Intel Core2 2.33GHz, 4GB,
NVIDIA GeForce Go 7950 GTX with proprietary driver 1.0-9755

I'm running three single threaded perl scripts that do double
precision floating point math with little i/o after initially
loading the data.

Both cfs and sd showed very similar behavior when monitored in top.
I'll show more or less representative excerpt from a 10 minutes
log, delay 3sec.

sd-0.46
top - 00:14:24 up  1:17,  9 users,  load average: 4.79, 4.95, 4.80
Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.8%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.0%si,  0.0%st
Mem:   3348628k total,  1648560k used,  1700068k free,64392k buffers
Swap:  2097144k total,0k used,  2097144k free,   828204k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

 
 6671 mgd   33   0 95508  22m 3652 R  100  0.7  44:28.11 perl   

 
 6669 mgd   31   0 95176  22m 3652 R   50  0.7  43:50.02 perl   

 
 6674
 mgd   31   0 95368  22m 3652 R   50  0.7  47:55.29 perl



cfs-v5
top - 08:07:50 up 21 min,  9 users,  load average: 4.13, 4.16, 3.23
Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.5%us,  0.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   3348624k total,  1193500k used,  2155124k free,32516k buffers
Swap:  2097144k total,0k used,  2097144k free,   545568k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND

 
 6357 mgd   20   0 92024  19m 3652 R  100  0.6   8:54.21 perl   

 
 6356 mgd   20   0 91652  18m 3652 R   50  0.6  10:35.52 perl   

 
 6359 mgd   20   0 91700  18m 3652 R   50  0.6   8:47.32 perl   

 

What did surprise me is that cpu utilization had been spread 100/50/50
(round robin) most of the time. I did expect 66/66/66 or so.

What I also don't understand is the difference in load average, sd
constantly had higher values, the above figures are representative
for the whole log. I don't know which is better though.


Here are excerpts from a concurrently run vmstat 3 200:

sd-0.46
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 5  0  0 1702928  63664 82787600 067  458 1350 100  0  0  0
 3  0  0 1702928  63684 82787600 089  468 1362 100  0  0  0
 5  0  0 1702680  63696 82787600 0   132  461 1598 99  1  0  0
 8  0  0 1702680  63712 82789200 080  465 1180 99  1  0  0
 3  0  0 1702712  63732 82788400 067  453 1005 100  0  0  0
 4  0  0 1702792  63744 82792000 041  461 1138 100  0  0  0
 3  0  0 1702792  63760 82791600 057  456 1073 100  0  0  0
 3  0  0 1702808  63776 82792800 0   111  473 1095 100  0  0  0
 3  0  0 1702808  63788 82792800 081  461 1092 99  1  0  0
 3  0  0 1702188  63808 82792800 0   160  463 1437 99  1  0  0
 3  0  0 1702064  63884 82790000 0   229  479 1125 99  0  0  0
 4  0  0 1702064  63912 82797200 177  460 1108 100  0  0  0
 7  0  0 1702032  63920 82800000 040  463 1068 100  0  0  0
 4  0  0 1702048  63928 82800800 068  454 1114 100  0  0  0
11  0  0 1702048  63928 82800800 0 0  458 1001 100  0  0  0
 3  0  0 1701500  63960 82802000 0

Re: [PATCH] powerpc pseries eeh: Convert to kthread API

2007-04-24 Thread Cornelia Huck

On Tue, 24 Apr 2007 15:00:42 +1000,
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

 Like anything else, modules should have separated the entrypoints for
 
  - Initiating a removal request
  - Releasing the module
 
 The former is use did rmmod, can unregister things from subsystems,
 etc... (and can file if the driver decides to refuse removal requests
 when it's busy doing things or whatever policy that module wants to
 implement).
 
 The later is called when all references to the modules have been
 dropped, it's a bit like the kref release (and could be implemented as
 one).

That sounds quite similar to the problems we have with kobject
refcounting vs. module unloading. The patchset I posted at
http://marc.info/?l=linux-kernelm=117679014404994w=2 exposes the
refcount of the kobject embedded in the module. Maybe the kthread code
could use that reference as well?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: NonExecutable Bit in 32Bit

2007-04-24 Thread Tuncer Ayaz


On 4/24/07, William Heimbigner [EMAIL PROTECTED] wrote:

On Tue, 24 Apr 2007, Cestonaro, Thilo (external) wrote:

 Hey,

 is it right, that the NX Bit is not used under i386-Arch but
 under x86_64-Arch?
 When yes, is there a special argument for it not to be used?

 Ciao Thilo
I don't think so - some i386 cpus definitely have support for
the NX bit.



In detail:
1) if your CPU has NX support (some 32bit Xeons do)
2) it is not disabled in the BIOS
3) you see 'nx' in the 'flags' line in /proc/cpuinfo
4) and you have a kernel with the following config options
CONFIG_HIGHMEM64G=y
CONFIG_HIGHMEM=y
CONFIG_X86_PAE=y

NX should just work.

[snip]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ofa-general] [PATCH] eHCA: Add Modify Port verb

2007-04-24 Thread Christoph Raisch


Hi Hal,
you are correct,
with the current firmware version it will fail later.

Christoph R.

[EMAIL PROTECTED] wrote on 23.04.2007 18:55:59:

 Hi Joachim,

 On Mon, 2007-04-23 at 12:23, Joachim Fenkes wrote:
  Add Modify Port verb support to eHCA driver.
  ib_cm needs this to initialize properly.

 I didn't think IB_PORT_SM was allowed (as QP0 is not exposed) or does
 this just fail later when it is attempted to be actually set ?

 -- Hal

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-24 Thread Martin Schwidefsky

On Mon, 2007-04-23 at 10:45 -0700, Andrew Morton wrote:
  Andrew: I plan to add patches 1-5 to the for-andrew branch of the
  git390 repository if that is fine with you. The only thing that will
  be missing in the tree is the patch that disables wireless for s390.
  The code does compile but without hardware it is mute to have the
  config options. I'll wait until the git-wireless.patch is upstream.
  Patches 7-9 depend on patches found in -mm.
  
 
 umm, OK.  If it's Ok I think I'll duck it for now: -mm is full.
 
 Over-full, really: I've been working basically continuously since Friday
 getting the current dungpile to compile and boot, and it's still miles away
 from that.

I understand. I'll wait until -mm is a little bit smaller again. It is
just that someday I want to finish with the Kconfig cleanup, it has been
sitting on my harddriver for ages now.

-- 
blue skies,  IBM Deutschland Entwicklung GmbH
   MartinVorsitzender des Aufsichtsrats: Johann Weihen
 Geschäftsführung: Herbert Kircher
Martin Schwidefsky   Sitz der Gesellschaft: Böblingen
Linux on zSeries Registergericht: Amtsgericht Stuttgart,
   Development   HRB 243294

Reality continues to ruin my life. - Calvin.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [REPORT] cfs-v5 vs sd-0.46

2007-04-24 Thread Ingo Molnar


* Michael Gerdau [EMAIL PROTECTED] wrote:

 I'm running three single threaded perl scripts that do double 
 precision floating point math with little i/o after initially loading 
 the data.

thanks for the testing!

 What I also don't understand is the difference in load average, sd 
 constantly had higher values, the above figures are representative for 
 the whole log. I don't know which is better though.

hm, it's hard from here to tell that. What load average does the vanilla 
kernel report? I'd take that as a reference.

 Here are excerpts from a concurrently run vmstat 3 200:
 
 sd-0.46
 procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
  5  0  0 1702928  63664 82787600 067  458 1350 100  0  0   0
  3  0  0 1702928  63684 82787600 089  468 1362 100  0  0   0
  5  0  0 1702680  63696 82787600 0   132  461 1598 99  1  0  0
  8  0  0 1702680  63712 82789200 080  465 1180 99  1  0  0

 cfs-v5
 procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
  6  0  0 2157728  31816 54523600 0   103  543  748 100  0  0   0
  4  0  0 2157780  31828 54525600 063  435  752 100  0  0   0
  4  0  0 2157928  31852 54525600 0   105  424  770 100  0  0   0
  4  0  0 2157928  31868 54526800 0   261  457  763 100  0  0   0

interesting - CFS has half the context-switch rate of SD. That is 
probably because on your workload CFS defaults to longer 'timeslices' 
than SD. You can influence the 'timeslice length' under SD via 
/proc/sys/kernel/rr_interval (milliseconds units) and under CFS via 
/proc/sys/kernel/sched_granularity_ns. On CFS the value is not 
necessarily the timeslice length you will observe - for example in your 
workload above the granularity is set to 5 msec, but your rescheduling 
rate is 13 msecs. SD default to a rr_interval value of 8 msecs, which in 
your workload produces a timeslice length of 6-7 msecs.

so to be totally 'fair' and get the same rescheduling 'granularity' you 
should probably lower CFS's sched_granularity_ns to 2 msecs.

 Last not least I'd like to add that at least on my system having X 
 niced to -19 does result in kind of erratic (for lack of a better 
 word) desktop behavior. I'll will reevaluate this with -v6 but for now 
 IMO nicing X to -19 is a regression at least on my machine despite the 
 claim that cfs doesn't suffer from it.

indeed with -19 the rescheduling limit is so high under CFS that it does 
not throttle X's scheduling rate enough and so it will make CFS behave 
as badly as other schedulers.

I retested this with -10 and it should work better with that. In -v6 i 
changed the default to -10 too.

 PS: Only learning how to test these things I'm happy to get pointed 
 out the shortcomings of what I tested above. Of course suggestions for 
 improvements are welcome.

your report was perfectly fine and useful. no visible regressions is 
valuable feedback too. [ In fact, such type of feedback is the one i 
find the easiest to resolve ;-) ]

Since you are running number-crunchers you might be able to give 
performacne feedback too: do you have any reliable 'performance metric' 
available for your number cruncher jobs (ops per minute, runtime, etc.) 
so that it would be possible to compare number-crunching performance of 
mainline to SD and to CFS as well? If that value is easy to get and 
reliable/stable enough to be meaningful. (And it would be nice to also 
establish some ballpark figure about how much noise there is in any 
performance metric, so that we can see whether any differences between 
schedulers are systematic or not.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

cpufreq default governor

2007-04-24 Thread William Heimbigner

Question: is there some reason that kconfig does not allow for default 
governors of conservative/ondemand/powersave?
I'm not aware of any reason why one of those governors could not be used 
as default.


William Heimbigner
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-04-24 Thread Jiri Kosina

On Tue, 24 Apr 2007, Herbert Xu wrote:

  Hmm, *sigh*. I guess the patch below fixes the problem, but it is a 
  masterpiece in the field of ugliness. And I am not sure whether it is 
  completely correct either. Are there any immediate ideas for better 
  solution with respect to how struct sock locking works?
 Please cc such patches to netdev.  Thanks.

Hi Herbert,

well it's pretty much bluetooth-specific, and bluez-devel was CCed, but 
OK.

  diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
  index 71f5cfb..c5c93cd 100644
  --- a/net/bluetooth/hci_sock.c
  +++ b/net/bluetooth/hci_sock.c
  @@ -656,7 +656,10 @@ static int hci_sock_dev_event(struct notifier_block 
  *this, unsigned long event,
 /* Detach sockets from device */
 read_lock(hci_sk_list.lock);
 sk_for_each(sk, node, hci_sk_list.head) {
  -   lock_sock(sk);
  +   if (in_atomic())
  +   bh_lock_sock(sk);
  +   else
  +   lock_sock(sk);
 
 This doesn't do what you think it does.  bh_lock_sock can still succeed
 even with lock_sock held by someone else.

I know, this was precisely the reason why I converted the bh_lock_sock() 
to lock_sock() here some time ago (as it was racy with 
l2cap_connect_cfm()).

 Does this need to occur immediately when an event occurs? If not I'd
 suggest moving this into a workqueue.

Will have to check whether this will be processed properly in time when 
going to suspend.

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/7] libata: check for AN support

2007-04-24 Thread Tejun Heo

Hello,

Kristen Carlson Accardi wrote:
  static unsigned int ata_print_id = 1;
 @@ -1744,6 +1745,23 @@ int ata_dev_configure(struct ata_device 
   }
   dev-cdb_len = (unsigned int) rc;
  
 + /*
 +  * check to see if this ATAPI device supports
 +  * Asynchronous Notification
 +  */
 + if ((ap-flags  ATA_FLAG_AN)  ata_id_has_AN(id))
 + {
 + /* issue SET feature command to turn this on */
 + rc = ata_dev_set_AN(dev);

Please don't store err_mask into int rc.  Please store it to a separate
err_mask variable and report it when printing error message.

 + if (rc) {
 + ata_dev_printk(dev, KERN_ERR,
 + unable to set AN\n);
 + rc = -EINVAL;

Wouldn't -EIO be more appropriate?

 + goto err_out_nosup;
 + }
 + dev-flags |= ATA_DFLAG_AN;
 + }
 +

Not NACKing.  Just notes for future improvements.  We need to be more
careful here.  ATA/ATAPI world is filled with braindamaged devices and I
bet there are devices which advertises it can do AN but chokes when AN
is enabled.

This should be handled similarly to ACPI failure.  Currently ACPI does
the following.

1. try once, if fail, record that ACPI failed.  return error to trigger
retry.
2. try again, if fail again, ignore error if possible (!FROZEN) and turn
off ACPI.

This fallback mechanism for optional features can probably be
generalized and used for both ACPI and AN.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 1/7] libata: check for AN support

2007-04-24 Thread Alan Cox

 + /*
 +  * check to see if this ATAPI device supports
 +  * Asynchronous Notification
 +  */
 + if ((ap-flags  ATA_FLAG_AN)  ata_id_has_AN(id))
 + {

Bracketing police ^^^

 + /* issue SET feature command to turn this on */
 + rc = ata_dev_set_AN(dev);
 + if (rc) {
 + ata_dev_printk(dev, KERN_ERR,
 + unable to set AN\n);
 + rc = -EINVAL;
 + goto err_out_nosup;

How fatal is this - do we need to ignore the device at this point or
should we just pretend (possibly correctly) that the device itself does
not support notification. 

 @@ -299,6 +305,8 @@ struct ata_taskfile {
  #define ata_id_queue_depth(id)   (((id)[75]  0x1f) + 1)
  #define ata_id_removeable(id)((id)[0]  (1  7))
  #define ata_id_has_dword_io(id)  ((id)[50]  (1  0))
 +#define ata_id_has_AN(id)\
 + ((id[76]  (~id[76]))  ((id)[78]  (1  5)))

Might be nice to check ATA version as well to be paranoid but this all
looks ok as its a reserved field since way back when.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/7] genhd: expose AN to user space

2007-04-24 Thread Tejun Heo

Kristen Carlson Accardi wrote:
 +static struct disk_attribute disk_attr_capability = {
 + .attr = {.name = capability_flags, .mode = S_IRUGO },
 + .show   = disk_capability_read
 +};

How about just capability?  I think that would be more consistent with
other attributes.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 7/7] libata: send event when AN received

2007-04-24 Thread Alan Cox

 + /* check the 'N' bit in word 0 of the FIS */
 + if (f[0]  (1  15)) {
 + int port_addr =  ((f[0]  0x0f00)  8);
 + struct ata_device *adev = ap-device[port_addr];

You can't be sure that the port_addr returned will be in range if a
device is malfunctioning...

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [mmc] alternative TI FM MMC/SD driver for 2.6.21-rc7

2007-04-24 Thread Sergey Yanovich


Hi,

If you add support for let's say [tifm_8xx2] in the future, which
would have port offsets different that [tifm_7xx1], you would also need a
completely new modules for slots (sd, ms, etc).



Does not this constitutes an unbounded speculation?

Only time will tell :)

And then, what would you propose to do with
adapters that have SD support disabled? There are quite a few of those in the 
wild, as of right
now (SD support is provided by bundled SDHCI on such systems, if at all). 
Similar argument goes
for other media types as well - many controllers have xD support disabled too 
(I think you have
one of those - Sony really values its customers). After all, it is not healthy 
to have dead code
in the kernel.


A typical kernel config is an allmconfig, which has tones of dead
code: just see a 'General setup' part of your distro '.config'.
There are item like 'SMP' selected by default for 686+ CPUs. And
this is far more overhead that a single check of card type on
insert.

To allow customization, boolean module options that disable certain
card type may suffice.

And again, you are doing a great work with the driver.

--
Sergey Yanovich
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -mm take2] 64bit-futex - provide new commands instead of new syscall

2007-04-24 Thread Pierre Peiffer


Ulrich Drepper a écrit :


It looks mostly good.  I wouldn't use the high bit to differentiate
the 64-bit operations, though.  Since we do not allow to apply it to
all operations the only effect will be that the compiler has a harder
time generating the code for the switch statement.  If you use
continuous values a simple jump table can be used and no conditionals.
Smaller and faster.



Something like that may be...

Signed-off-by: Pierre Peiffer [EMAIL PROTECTED]


--
Pierre
---
 include/asm-ia64/futex.h|8 -
 include/asm-powerpc/futex.h |6 -
 include/asm-s390/futex.h|8 -
 include/asm-sparc64/futex.h |8 -
 include/asm-um/futex.h  |9 -
 include/asm-x86_64/futex.h  |   86 --
 include/asm-x86_64/unistd.h |2 
 include/linux/futex.h   |6 +
 include/linux/syscalls.h|3 
 kernel/futex.c  |  203 ++--
 kernel/futex_compat.c   |2 
 kernel/sys_ni.c |1 
 12 files changed, 95 insertions(+), 247 deletions(-)

Index: b/include/asm-ia64/futex.h
===
--- a/include/asm-ia64/futex.h
+++ b/include/asm-ia64/futex.h
@@ -124,13 +124,7 @@ futex_atomic_cmpxchg_inatomic(int __user
 static inline u64
 futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
 {
-	return 0;
-}
-
-static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	return 0;
+	return -ENOSYS;
 }
 
 #endif /* _ASM_FUTEX_H */
Index: b/include/asm-powerpc/futex.h
===
--- a/include/asm-powerpc/futex.h
+++ b/include/asm-powerpc/futex.h
@@ -119,11 +119,5 @@ futex_atomic_cmpxchg_inatomic64(u64 __us
 	return 0;
 }
 
-static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	return 0;
-}
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_FUTEX_H */
Index: b/include/asm-s390/futex.h
===
--- a/include/asm-s390/futex.h
+++ b/include/asm-s390/futex.h
@@ -51,13 +51,7 @@ static inline int futex_atomic_cmpxchg_i
 static inline u64
 futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
 {
-	return 0;
-}
-
-static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	return 0;
+	return -ENOSYS;
 }
 
 #endif /* __KERNEL__ */
Index: b/include/asm-sparc64/futex.h
===
--- a/include/asm-sparc64/futex.h
+++ b/include/asm-sparc64/futex.h
@@ -108,13 +108,7 @@ futex_atomic_cmpxchg_inatomic(int __user
 static inline u64
 futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
 {
-	return 0;
-}
-
-static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	return 0;
+	return -ENOSYS;
 }
 
 #endif /* !(_SPARC64_FUTEX_H) */
Index: b/include/asm-um/futex.h
===
--- a/include/asm-um/futex.h
+++ b/include/asm-um/futex.h
@@ -6,14 +6,7 @@
 static inline u64
 futex_atomic_cmpxchg_inatomic64(u64 __user *uaddr, u64 oldval, u64 newval)
 {
-	return 0;
+	return -ENOSYS;
 }
 
-static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	return 0;
-}
-
-
 #endif
Index: b/include/asm-x86_64/futex.h
===
--- a/include/asm-x86_64/futex.h
+++ b/include/asm-x86_64/futex.h
@@ -41,38 +41,6 @@
 	  =r (tem)		\
 	: r (oparg), i (-EFAULT), m (*uaddr), 1 (0))
 
-#define __futex_atomic_op1_64(insn, ret, oldval, uaddr, oparg) \
-  __asm__ __volatile (		\
-1:	 insn \n		\
-2:	.section .fixup,\ax\\n\
-3:	movq	%3, %1\n\
-	jmp	2b\n\
-	.previous\n\
-	.section __ex_table,\a\\n\
-	.align	8\n\
-	.quad	1b,3b\n\
-	.previous		\
-	: =r (oldval), =r (ret), =m (*uaddr)		\
-	: i (-EFAULT), m (*uaddr), 0 (oparg), 1 (0))
-
-#define __futex_atomic_op2_64(insn, ret, oldval, uaddr, oparg) \
-  __asm__ __volatile (		\
-1:	movq	%2, %0\n\
-	movq	%0, %3\n	\
-	insn \n		\
-2:	 LOCK_PREFIX cmpxchgq %3, %2\n\
-	jnz	1b\n\
-3:	.section .fixup,\ax\\n\
-4:	movq	%5, %1\n\
-	jmp	3b\n\
-	.previous\n\
-	.section __ex_table,\a\\n\
-	.align	8\n\
-	.quad	1b,4b,2b,4b\n\
-	.previous		\
-	: =a (oldval), =r (ret), =m (*uaddr),		\
-	  =r (tem)		\
-	: r (oparg), i (-EFAULT), m (*uaddr), 1 (0))
 
 static inline int
 futex_atomic_op_inuser (int encoded_op, int __user *uaddr)
@@ -128,60 +96,6 @@ futex_atomic_op_inuser (int encoded_op, 
 }
 
 static inline int
-futex_atomic_op_inuser64 (int encoded_op, u64 __user *uaddr)
-{
-	int op = (encoded_op  28)  7;
-	int cmp = (encoded_op  24)  15;
-	u64 oparg = (encoded_op  8)  20;
-	u64 cmparg = (encoded_op  20)  20;
-	u64 oldval = 0, ret, tem;
-
-	if (encoded_op  (FUTEX_OP_OPARG_SHIFT  28))
-		oparg = 1  oparg;
-
-	if (! access_ok (VERIFY_WRITE, uaddr, sizeof(u64)))
-		return -EFAULT;
-
-

Re: 2.6.21-rc6-mm1

2007-04-24 Thread J.A. Magallón

On Sun, 8 Apr 2007 14:35:59 -0700, Andrew Morton [EMAIL PROTECTED] wrote:

 
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/
 
 
 - Lots of x86 updates
 

Has somthing related with PTY's changed in this kernel ?
I have to enable legacy PTY handling in a couple boxes to get ssh working.
If not, I had openpty() errors and nor sshd nor virtual terminals (aterm) were
able to get a terminal.

User space (udev) is the same in three boxes and one works and two fail.
I had /dev/ptmx everywhere and /dev/pts mounted

Any idea ?
TIA

--
J.A. Magallon jamagallon()ono!com \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP 
PREEMPT
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: PageLRU can be non-atomic bit operation

2007-04-24 Thread Hisashi Hifumi



At 11:47 07/04/24, Nick Piggin wrote:

As Hugh points out, we must have atomic ops here, so changing the generic
code to use the __ version is wrong. However if there is a faster way that
i386 can perform the atomic variant, then doing so will speed up the generic
code without breaking other architectures.


Do you mean writing page-flags.h specific for i386 so improving generic code
and without breaking other architectures ?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH -mm take4 2/6] support multiple logging

2007-04-24 Thread Keiichi KII


On Fri, 20 Apr 2007 18:51:13 +0900
Keiichi KII [EMAIL PROTECTED] wrote:


I started to do some cleanups and fixups here, but abandoned it when it was
all getting a bit large.

Here are some fixes against this patch:
I'm going to fix my patches by following your reviews and send new patches 
on the LKML and the netdev ML in a few days.




Well..  before you can finish this work we need to decide upon what the
interface to userspace will be.

- The miscdev isn't appropriate



Why isn't miscdev appropriate? 
We just shouldn't use miscdev for networking conventionally?


--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]





-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC][PATCH -mm take4 2/6] support multiple logging

2007-04-24 Thread Keiichi KII


We don't really have anything that corresponds to netpoll's
connections at higher levels.

I'm tempted to say we should make this work more like the dummy
network device. ie:

modprobe netconsole -o netcon1 [params]
modprobe netconsole -o netcon2 [params]


The configuration of netconsole's looks like the configuration of routes.
Granted you probably have more routes than netconsoles, but the interface
issues are similar.  Netlink with a small application wouldn't be nice.
And having /proc/net/netconsole (read-only) would be good for the netlink
impaired.


Do you say that we had better use procfs instead of sysfs to show the 
configurations of netconsole?


If so, I have a question.
I thought that procfs use things related to process as far as possible.
Is it no problem to use procfs here? 


--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 911 matches

Mail list logo