Re: [PATCH 1/2] [RESEND] PCI: read revision ID by default

2007-06-25 Thread Greg KH
On Sun, Jun 24, 2007 at 08:19:18PM -0700, Auke Kok wrote:
> Currently there are 97 occurrences where drivers need the pci
> revision ID. We can do this once for all devices. Even the pci
> subsystem needs the revision several times for quirks. The extra
> u8 member pads out nicely in the pci_dev struct.
> 
> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

Thanks, I've updated both of these in my tree.
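
The idea in the patch description can be sketched in userspace as follows. This is only an illustration: `fake_pci_dev` and both functions are hypothetical stand-ins, not the kernel's struct pci_dev or its API. The one fixed fact used is that the revision byte lives at offset 0x08 of PCI configuration space.

```c
#include <stdint.h>

#define PCI_REVISION_ID 0x08	/* config-space offset of the revision byte */

/* Hypothetical stand-in for struct pci_dev, not the kernel's definition. */
struct fake_pci_dev {
	const uint8_t *config;	/* raw configuration space of the device */
	uint8_t revision;	/* cached copy, filled in once at setup */
};

/* Read the revision once when the device is set up. */
static void fake_pci_setup_device(struct fake_pci_dev *dev, const uint8_t *config)
{
	dev->config = config;
	dev->revision = config[PCI_REVISION_ID];
}

/* Drivers and quirks then use the cached byte instead of re-reading it. */
static uint8_t fake_pci_revision(const struct fake_pci_dev *dev)
{
	return dev->revision;
}
```

Caching costs one byte per device and removes a config-space read from every caller that only wants the revision.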

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -Os versus -O2

2007-06-25 Thread Willy Tarreau
On Mon, Jun 25, 2007 at 09:08:23AM +0200, Segher Boessenkool wrote:
> >In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.
> 
> On what CPU?  The effect of different optimisations varies
> hugely between different CPUs (and architectures).

x86

> >It was not only because of cache considerations, but because gcc used
> >different tricks to avoid poor optimizations, and at the end, the CPU
> >ended executing the alternative code faster.
> 
> -Os is "as fast as you can without bloating the code size",
> so that is the expected result for CPUs that don't need
> special hand-holding around certain performance pitfalls.
> 
> >With gcc-3.3, -Os show roughly the same performance as -O2 for me on
> >various programs. However, with gcc-3.4, I noticed a slow down with
> >-Os. And with gcc-4, using -Os optimizes only for size, even if the
> >output code is slow as hell. I've had programs whose speed dropped
> >by 70% using -Os on gcc-4.
> 
> Well you better report those!  

No, -Os is for size only :

   -Os Optimize for size.  -Os enables all -O2 optimizations
   that do not typically increase code size.  It also
   performs further optimizations designed to reduce code
   size.

So it is expected that speed can be reduced using -Os. I won't report
a thing which is already documented !

> >But in some situations, it's desirable to have the smallest possible
> >kernel whatever its performance. This goes for installation CDs for
> >instance.
> 
> There are much better ways to achieve that.

Optimizing is not a matter of choosing *one* way, but of combining
everything you have. For instance, on a smart boot loader, I have
a kernel which is about 300 kB, or 700 kB with the initramfs. Among
the tricks I used:
  - -Os
  - -march=i386
  - align everything to 0
  - replace gzip with p7zip

Even if each of them reduces overall size by 5%, the net result is
0.95^4 = 0.81 = 19% gain, for the same set of features. This is
something to consider.
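
The compounding above can be checked with a few lines of C. This is a sketch that assumes every trick gives the same independent saving; `remaining_fraction` is a made-up helper name.

```c
/*
 * Fraction of the original size left after applying `steps` independent
 * reductions, each shrinking the image by `saving` (0.05 means 5%).
 */
static double remaining_fraction(double saving, int steps)
{
	double ratio = 1.0;

	for (int i = 0; i < steps; i++)
		ratio *= 1.0 - saving;
	return ratio;
}
```

remaining_fraction(0.05, 4) is about 0.8145, i.e. roughly the 19% gain quoted above.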

Regards,
Willy



Re: Linux v2.6.22-rc6

2007-06-25 Thread Jan Engelhardt
Hi,

On Jun 24 2007 23:12, Linus Torvalds wrote:
>
>So nothing really too exciting here, but hopefully we're getting closer to 
>a real 2.6.22 release. Please *do* test it, and in particular people who 
>have been involved with regressions, please check that the ones that 
>should be fixed are really fixed, and remind people about anything that is 
>still pending.

This one, http://lkml.org/lkml/2007/6/23/244 , would really be needed.


Jan
-- 


[patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang

2007-06-25 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> hm, restoring nmi.c to the v2.6.21 state does not fix the 
> nmi_watchdog=2 hang. I'll do a bisection run.

and after spending an hour on 15 bisection steps:

 git-bisect start
 git-bisect good d1be341dba5521506d9e6dccfd66179080705bea
 git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69
 git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5
 git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3
 git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195
 git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e
 git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607
 git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add
 git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27
 git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8
 git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52
 git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d
 git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d
 git-bisect good 42c24fa22e86365055fc931d833f26165e687c19

the winner is ...

 f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
 commit f8822f42019eceed19cc6c0f985a489e17796ed8
 Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
 Date:   Wed May 2 19:27:14 2007 +0200

[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make 
them patchable

... our wonderful paravirt subsystem, honed to eternal perfection by the 
testing-machine x86_64 tree.

reverting -git-curr's paravirt.c, paravirt.h, smp.c and tlbflush.h to 
before the bad commit makes the NMI watchdog work again. Patch against 
-rc6 is below.
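
For readers new to bisection, the search above can be sketched as a binary search. The sketch assumes a linear history in which every commit after the first bad one is also bad; real git-bisect walks a commit graph, and `first_bad_commit`, `is_bad_fn` and `bad_from_index` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical test callback: nonzero if the commit at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first bad commit among `n` commits, assuming commit 0 is known
 * good and commit n-1 is known bad. `steps` receives the number of commits
 * that had to be built and tested.
 */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
			       unsigned *steps)
{
	size_t good = 0, bad = n - 1;

	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*steps)++;
		if (is_bad(mid, ctx))
			bad = mid;	/* bug already present: look earlier */
		else
			good = mid;	/* still fine: look later */
	}
	return bad;
}

/* Example callback: every commit at or after *ctx is bad. */
static int bad_from_index(size_t index, void *ctx)
{
	return index >= *(const size_t *)ctx;
}
```

The number of test boots grows with log2 of the number of commits, which is why a range of many thousands of commits needs only about 15 steps.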

Ingo

>
Subject: [patch, 2.6.22-rc6] fix nmi_watchdog=2 bootup hang
From: Ingo Molnar <[EMAIL PROTECTED]>

nmi_watchdog=2 hangs on i386:

 Calling initcall 0xc06cc620: check_nmi_watchdog+0x0/0x1f0()
 Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
 CPU#1: NMI appears to be stuck (0->0)!
 initcall 0xc06cc620: check_nmi_watchdog+0x0/0x1f0() returned -1.
 initcall 0xc06cc620 ran for 27 msecs: check_nmi_watchdog+0x0/0x1f0()
 initcall at 0xc06cc620: check_nmi_watchdog+0x0/0x1f0(): returned with
 error code -1
 Calling initcall 0xc06ccbb0: io_apic_bug_finalize+0x0/0x20()
 initcall 0xc06ccbb0: io_apic_bug_finalize+0x0/0x20() returned 0.
 initcall 0xc06ccbb0 ran for 0 msecs: io_apic_bug_finalize+0x0/0x20()
 Calling initcall 0xc06ccd00: balanced_irq_init+0x0/0x1e0()
 Starting balanced_irq
 [hard hang]

bisected it down to:

 git-bisect start
 git-bisect good d1be341dba5521506d9e6dccfd66179080705bea
 git-bisect bad a06381fec77bf88ec6c5eb6324457cb04e9ffd69
 git-bisect bad 794543a236074f49a8af89ef08ef6a753e4777e5
 git-bisect good 24a77daf3d80bddcece044e6dc3675e427eef3f3
 git-bisect bad ea62ccd00fd0b6720b033adfc9984f31130ce195
 git-bisect good 7e20ef030dde0e52dd5a57220ee82fa9facbea4e
 git-bisect bad f19cccf366a07e05703c90038704a3a5ffcb0607
 git-bisect good 0d08e0d3a97cce22ebf80b54785e00d9b94e1add
 git-bisect bad 856f44ff4af6e57fdc39a8b2bec498c88438bd27
 git-bisect bad f8822f42019eceed19cc6c0f985a489e17796ed8
 git-bisect good 1c3d99c11c47c8a1a9ed6a46555dbf6520683c52
 git-bisect good b239fb2501117bf3aeb4dd6926edd855be92333d
 git-bisect good 98de032b681d8a7532d44dfc66aa5c0c1c755a9d
 git-bisect good 42c24fa22e86365055fc931d833f26165e687c19

 f8822f42019eceed19cc6c0f985a489e17796ed8 is first bad commit
 commit f8822f42019eceed19cc6c0f985a489e17796ed8
 Author: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
 Date:   Wed May 2 19:27:14 2007 +0200

[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make 
them patchable

this patch reverts the code back to the last known booting version.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/i386/kernel/paravirt.c |  174 ++
 arch/i386/kernel/smp.c  |   93 +++--
 include/asm-i386/paravirt.h |  718 +---
 include/asm-i386/tlbflush.h |   21 -
 4 files changed, 245 insertions(+), 761 deletions(-)

Index: linux-2.6-git/arch/i386/kernel/paravirt.c
===
--- linux-2.6-git.orig/arch/i386/kernel/paravirt.c
+++ linux-2.6-git/arch/i386/kernel/paravirt.c
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -54,142 +54,40 @@ char *memory_setup(void)
 #define DEF_NATIVE(name, code) \
extern const char start_##name[], end_##name[]; \
asm("start_" #name ": " code "; end_" #name ":")
-
-DEF_NATIVE(irq_disable, "cli");
-DEF_NATIVE(irq_enable, "sti");
-DEF_NATIVE(restore_fl, "push %eax; popf");
-DEF_NATIVE(save_fl, "pushf; pop %eax");
+DEF_NATIVE(cli, "cli");
+DEF_NATIVE(sti, "sti");
+DEF_NATIVE(popf, "push %eax; popf");
+DEF_NATIVE(pushf, "pushf; pop %eax");
 DEF_NATIVE(iret, "iret");
-DEF_NATIVE(irq_enable_sysexit, "sti; sysexit");
-DEF_NATIVE(read_cr2, "mov %cr2, %eax");
-DEF_NATIVE(write_cr3, "mov %eax, %cr3");
-DEF_NATIVE(read_cr3, "mov %cr3, 

Re: [patch 1/3] add the fsblock layer

2007-06-25 Thread Nick Piggin

Neil Brown wrote:
> On Sunday June 24, [EMAIL PROTECTED] wrote:
>
> > +#define PG_blocks  20  /* Page has block mappings */
> > +
>
> I've only had a very quick look, but this line looks *very* wrong.
> You should be using PG_private.
>
> There should never be any confusion about whether ->private has
> buffers or blocks attached as the only routines that ever look in
> ->private are address_space operations  (or should be.  I think 'NULL'
> is sometimes special cased, as in try_to_release_page.  It would be
> good to do some preliminary work and tidy all that up).

There is a lot of confusion, actually :)
But as you see in the patch, I added a couple more aops APIs, and
am working toward decoupling it as much as possible. It's pretty
close after the fsblock patch... however:

> Why do you think you need PG_blocks?


Block device pagecache (buffer cache) has to be able to accept
attachment of either buffers or blocks for filesystem metadata,
and call into either buffer.c or fsblock.c based on that.

If the page flag is really important, we can do some awful hack
like assuming the first long of the private data is flags, and
those flags will tell us whether the structure is a buffer_head
or fsblock ;) But for now it is just easier to use a page flag.
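
The "awful hack" could look roughly like the sketch below. Everything in it is hypothetical: the two miniature structures, the tag bit and `private_kind` are illustrations only, and the real buffer_head and fsblock layouts are not assumed to match.

```c
#include <stddef.h>

#define PRIV_IS_FSBLOCK (1UL << 0)	/* hypothetical tag bit in the flags word */

/* Hypothetical miniature structures; both begin with a flags word. */
struct mini_buffer_head {
	unsigned long flags;
	int size;
};

struct mini_fsblock {
	unsigned long flags;
	int count;
};

enum priv_kind { PRIV_NONE, PRIV_BUFFER, PRIV_FSBLOCK };

/* Decide what hangs off ->private by looking only at the first long. */
static enum priv_kind private_kind(const void *private_data)
{
	unsigned long flags;

	if (!private_data)
		return PRIV_NONE;
	/* Both layouts start with unsigned long flags, so read just that. */
	flags = *(const unsigned long *)private_data;
	return (flags & PRIV_IS_FSBLOCK) ? PRIV_FSBLOCK : PRIV_BUFFER;
}
```

This works only while both structures keep the flags word first and agree on the tag bit, which is the coupling a separate page flag avoids.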

--
SUSE Labs, Novell Inc.


Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

> > -Os is "as fast as you can without bloating the code size",
> > so that is the expected result for CPUs that don't need
> > special hand-holding around certain performance pitfalls.
>
> this sounds like you are saying that people wanting performance should 
> pick -Os.

That is true on most CPUs.  Some CPUs really really need
some of the things that -Os disables (compared to -O2) for
decent performance though (branch target alignment...)

> what should people pick who care more about code size than anything 
> else? (examples being embedded development where you may be willing to 
> sacrifice speed to avoid having to add additional chips to the design)


-Os and tune some options.  There has been extensive work
over the last few years to make GCC more suitable for
embedded targets btw.  But -O1/-O2/-O3/-Os gives you
four choices only; I hope it's not so hard to understand
that for more specific goals you need to add more
specific options?


Segher



Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

> > Also note that whether or not it is profitable to unroll
> > a particular loop depends largely on how "hot" that loop
> > is, and GCC doesn't know much about that if you don't feed
> > it profiling information (it can guess a bit, sure, but it
> > can guess wrong too).
>
> actually, what you are saying is that the compiler can't know enough 
> to figure out how to optimize for speed. it will just do what you tell 
> it to, either unroll loops or not.

It bases its optimisation decisions on the options you give
it, the profile feedback information you did or did not give
it, and a whole bunch of heuristics.

> this argues that both O2 and Os are incorrect for a project to use and 
> instead the project needs to make its own decisions on this.


For optimal performance, you need to fine-tune options yes,
per file (or per function even!)
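
As an illustration of per-function tuning: later GCC releases (4.4 and newer, so not the compilers discussed in this thread) accept an `optimize` function attribute. The sketch below assumes such a compiler and falls back to no attribute elsewhere; the macro and function names are made up, and GCC's documentation treats the attribute as a tuning aid rather than a substitute for command-line options.

```c
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_FOR_SPEED __attribute__((optimize("O3")))
#define OPT_FOR_SIZE  __attribute__((optimize("Os")))
#else
#define OPT_FOR_SPEED	/* attribute not available: use the file's options */
#define OPT_FOR_SIZE
#endif

/* Hot path: let the compiler spend code size on speed here. */
static OPT_FOR_SPEED unsigned long sum_hot(const unsigned *v, unsigned long n)
{
	unsigned long s = 0;

	for (unsigned long i = 0; i < n; i++)
		s += v[i];
	return s;
}

/* Rarely executed: keep the generated code small. */
static OPT_FOR_SIZE unsigned long sum_cold(const unsigned *v, unsigned long n)
{
	unsigned long s = 0;

	for (unsigned long i = 0; i < n; i++)
		s += v[i];
	return s;
}
```

Both functions compute the same result; only the code the compiler emits for them is expected to differ.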

> if this is the true feeling of the gcc team I'm very disappointed, it 
> feels like a huge step backwards.


I speak only for myself.  However this is the only way it _can_
be, the compiler isn't clairvoyant.  Some of the heuristics sure
could use some tuning, but they stay heuristics.


Segher



Problems with mounting flash partition with jffs2

2007-06-25 Thread gshan

Hey Guys,

Today I ran into a strange problem. When I tried to mount the last 2 
flash partitions, the following errors appeared. Any ideas are appreciated.


# cat /proc/mtd
dev:size   erasesize  name
mtd0: 0010 0001 "boot"
mtd1: 0020 0001 "ro"
mtd2: 0010 0001 "diag-var-log"
mtd3: 0010 0001 "mlba"
mtd4: 0010 0001 "rw"
mtd5: 0020 0001 "sarsu"
mtd6: 00c0 0001 "backup"
mtd7: 00c0 0001 "kdi"
# mknod /dev/mtd.boot b 31 0
# mknod /dev/mtd.ro b 31 1
# mknod /dev/mtd.diag-var-log b 31 2
# mknod /dev/mtd.mlba b 31 3
# mknod /dev/mtd.rw b 31 4
# mknod /dev/mtd.sarsu b 31 5
# mknod /dev/mtd.backup b 31 6
# mknod /dev/mtd.kdi b 31 7
#
# mount -t jffs2 /dev/mtd.boot /mnt
# umount /mnt
# mount -t jffs2 /dev/mtd.ro /mnt
Inode #3 was a directory with children - removing those 
too... <<< Error here

# mount -t jffs2 /dev/mtd.diag-var-log /mnt
# umount /mnt
# mount -t jffs2 /dev/mtd.mlba /mnt
Inode #4 was a directory with children - removing those 
too... <<< Error here

# mount -t jffs2 /dev/mtd.rw /mnt
# umount /mnt
# mount -t jffs2 /dev/mtd.sarsu /mnt
# umount /mnt
# mount -t jffs2 /dev/mtd.backup /mnt
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0044: 
0x5599 instead   <<< Error here
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0048: 
0x0c00 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400010: 
0x0c80 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400018: 
0x0c80 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0040001c: 
0x2020 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400020: 
0x0c80 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400024: 
0x80e4 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400028: 
0x0c00 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400030: 
0x0c00 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400038: 
0x0c00 instead

Further such events for this erase block will not be printed
Empty flash at 0x004065e4 ends at 0x004065e8
Empty flash at 0x00406670 ends at 0x00406674
Empty flash at 0x00406684 ends at 0x00406688
Empty flash at 0x00406818 ends at 0x0040681c
Empty flash at 0x00406854 ends at 0x00406858
Empty flash at 0x00406868 ends at 0x0040686c
Empty flash at 0x0040687c ends at 0x00406880
Empty flash at 0x004068a4 ends at 0x004068a8
Empty flash at 0x004069a8 ends at 0x004069ac
:
:
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x005502d8: 
0x0200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0055071c: 
0x0400 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550720: 
0x8e00 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550748: 
0x8200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550770: 
0xa200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550794: 
0x0100 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550798: 
0x8200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x005507c0: 
0x8200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x005507e8: 
0xa200 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00550810: 
0x8200 instead

Further such events for this erase block will not be printed
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0056: 
0x2000 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560004: 
0x0021 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0056000c: 
0x0028 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560010: 
0x2101 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560018: 
0x6001 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0056001c: 
0x0202 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560020: 
0x0002 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560024: 
0x1112 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00560028: 
0x8080 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0056002c: 
0x0802 instead

Further such events for this erase block will not be printed
Cowardly refusing to erase blocks on filesystem with no valid JFFS2 nodes
empty_blocks 169, bad_blocks 0, c->nr_blocks 192
mount: mounting /dev/mtd.backup on /mnt failed
#
#
#
# mount -t jffs2 /dev/mtd.kdi /mnt
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0044: 
0x5599 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0048: 
0x0c00 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00400010: 
0x0c80 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x0

Re: [patch] CFS scheduler, -v18

2007-06-25 Thread Antonino Ingargiola

2007/6/24, Ingo Molnar <[EMAIL PROTECTED]>:


> * Antonino Ingargiola <[EMAIL PROTECTED]> wrote:
>
> > Anyway, I've discovered with great pleasure that CFS has also the
> > SCHED_ISO priority. I may have missed something, but I don't remember
> > to have read this in any of the CFS release notes :). For me this is a
> > really useful feature. Thanks.
>
> well, it's only a hack and emulated: SCHED_ISO in CFS is recognized as a
> policy but it falls back to SCHED_NORMAL. Could you check how well this
> (i.e. SCHED_NORMAL) works for your workload, compared to SD's SCHED_ISO?


To be fair, my workload is not really "critical". I have been running
my audio player with SCHED_ISO for a long time, so I'm used to
skip-free audio listening (no matter what). Even in mainline the skips
aren't so frequent, but they are still annoying. I'm using SCHED_ISO
for the confidence it gives in providing skip-free audio.

For my modest needs CFS SCHED_NORMAL has also been just fine (over the
last few days). I'll report if I can find a more critical workload that
can possibly stress CFS SCHED_NORMAL.


Regards,

   ~ Antonio


2.6.22-rc5-yesterdaygit with VM debug: BUG in mm/rmap.c:66: anon_vma_link ?

2007-06-25 Thread Petr Vandrovec

Hello,
  to catch a memory corruption bug in our code I've modified malloc 
to do mmap + mprotect, which has the unfortunate effect of creating 
thousands and thousands of VMAs.  Everything works (though rather slowly 
on a kernel with CONFIG_DEBUG_VM) until the application calls fork(). The 
kernel then crashes because copy_process()'s anon_vma_link complains that 
it could not find the anon vma after walking through 100000 elements of 
the anon list. That seems strange, as I did not touch the system-wide 
limit (which is 65536 vmas), and mprotect()s started failing after 
creating 65536 vmas, as expected.
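
A minimal sketch of the kind of debugging allocator described above, assuming Linux and allocations of at most one page. `guarded_alloc` and `guarded_free` are hypothetical names, not the code that triggered the crash. Each allocation maps two pages and turns the second into a PROT_NONE guard, so an overrun faults immediately, and the address space ends up as alternating read-write and inaccessible areas.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return `size` bytes placed so that the first byte past the buffer lies in
 * an inaccessible guard page. The returned pointer is not aligned beyond
 * what `size` happens to give.
 */
static void *guarded_alloc(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *base;

	if (size == 0 || size > page)
		return NULL;
	base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base + page, page, PROT_NONE) != 0) {
		munmap(base, 2 * page);
		return NULL;
	}
	return base + page - size;	/* buffer ends where the guard begins */
}

static void guarded_free(void *ptr)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	/* Round down to the start of the two-page mapping. */
	uintptr_t base = (uintptr_t)ptr & ~(uintptr_t)(page - 1);

	munmap((void *)base, 2 * page);
}
```
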


Full output of test program and full kernel dmesg are at 
http://buk.vc.cvut.cz/linux/rmap.

Thanks,
Petr Vandrovec


/* The archive stripped the header names; these four cover every call below. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

#define TRY_REGIONS 131072

int main(void) {
	unsigned char* ptr[TRY_REGIONS];
	int i;
	int fd;
	int badmprot = 0;
	char buf[16384];
	ssize_t l;

	printf("PID=%u\n", getpid());
	for (i = 0; i < TRY_REGIONS; i++) {
		/* two pages per region: one usable, one turned into a guard */
		ptr[i] = mmap(0, 8192, PROT_READ | PROT_WRITE,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (ptr[i] == MAP_FAILED) {
			break;
		}
		if (mprotect(ptr[i] + 4096, 4096, PROT_NONE)) {
			badmprot++;
		}
	}
	printf("Allocated %u regions, %u mprotects failed\n", i, badmprot);
	fflush(stdout);
	/* dump the resulting VMA layout */
	fd = open("/proc/self/maps", O_RDONLY);
	while ((l = read(fd, buf, sizeof buf)) > 0) {
		write(1, buf, l);
	}
	close(fd);
	/* the kernel BUG triggers here */
	fork();
	return 0;
}

PID=6101
Allocated 131072 regions, 98310 mprotects failed
08048000-08049000 r-xp  08:05 1163513 
 /root/test
08049000-0804a000 rw-p  08:05 1163513 
 /root/test

b7e37000-e7e44000 rw-p b7e37000 00:00 0
e7e44000-e7e45000 ---p e7e44000 00:00 0
e7e45000-e7e46000 rw-p e7e45000 00:00 0
[65525 lines removed]
f7f7b000-f7f7c000 ---p f7f7b000 00:00 0
f7f7c000-f7f7f000 rw-p f7f7c000 00:00 0
f7f7f000-f7f9a000 r-xp  08:05 15581230 
 /lib/ld-2.5.so
f7f9a000-f7f9c000 rw-p 0001b000 08:05 15581230 
 /lib/ld-2.5.so
ff869000-ff8ef000 rw-p ff869000 00:00 0 
 [stack]
e000-f000 r-xp e000 00:00 0 
 [vdso]


[ cut here ]
kernel BUG at /usr/src/linus/linux-2.6.22-rc5-7515/mm/rmap.c:66!
invalid opcode:  [1] PREEMPT SMP
CPU 0
Modules linked in: binfmt_misc rfcomm l2cap nfs nfsd exportfs lockd 
nfs_acl sunrpc ipx p8022 psnap llc p8023 ppdev lp af_packet aoe deflate 
zlib_deflate zlib_inflate twofish twofish_common camellia serpent 
blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null hmac 
crypto_hash cryptomgr af_key nls_utf8 nls_iso8859_2 ntfs fuse sbp2 loop 
hci_usb raw1394 dv1394 bluetooth usb_storage usbhid libusual 
snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer firewire_ohci 
firewire_core sg snd crc_itu_t parport_pc parport sky2 k8temp 8250_pnp 
8250 serial_core sr_mod serio_raw hwmon sata_sil24 ohci1394 ieee1394 
ohci_hcd ehci_hcd cdrom snd_page_alloc usbcore i2c_nforce2

Pid: 6101, comm: test Not tainted 2.6.22-rc5-7515-64 #1
RIP: 0010:[]  [] anon_vma_link+0x8b/0xa0
RSP: 0018:810111d4fd88  EFLAGS: 00010202
RAX: 8101109fb9e0 RBX: 8101109fb978 RCX: 8101109fb978
RDX: 810112fccf10 RSI: 000186a1 RDI: 
RBP: 810111d4fd98 R08: 810112fccf10 R09: 
R10: 8029874b R11:  R12: 810112fccee0
R13: 01200011 R14: 810111590080 R15: 810111590080
FS:  () GS:80652000(0063) knlGS:f7e196c0
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f7eaa7c0 CR3: 000111c17000 CR4: 06e0
Process test (pid: 6101, threadinfo 810111d4e000, task 810112fcf080)
Stack:  0001 8101109fb978 810111d4fe78 8023817c
 8061d688 8101140e00e0 8101115901b8 81012560d148
 8101115906a0 810111ebd760 f7e19708 
Call Trace:
 [] copy_process+0xb9c/0x1760
 [] alloc_pid+0x212/0x320
 [] do_fork+0xa3/0x290
 [] _spin_unlock+0x30/0x60
 [] __fput+0x176/0x1c0
 [] sys32_clone+0x27/0x30
 [] ia32_ptregs_common+0x25/0x50


Code: 0f 0b eb fe 0f 0b eb fe 66 0f 1f 44 00 00 0f 1f 80 00 00 00
RIP  [] anon_vma_link+0x8b/0xa0
 RSP 
note: test[6101] exited with preempt_count 1


Re: [patch 1/3] add the fsblock layer

2007-06-25 Thread Nick Piggin

Andi Kleen wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>
> [haven't read everything, just commenting on something that caught my eye]
>
> > +struct fsblock {
> > +   atomic_t count;
> > +   union {
> > +   struct {
> > +   unsigned long   flags; /* XXX: flags could be int for better packing */
>
> int is not supported by many architectures, but works on x86 at least.

Yeah, that would be nice. We could actually use this for buffer_head as well,
but saving 4% there isn't so important as saving 20% for fsblock :)

> Hmm, could define a macro DECLARE_ATOMIC_BITMAP(maxbit) that expands to the
> smallest possible type for each architecture. And a couple of ugly casts for
> set_bit et al., but those could also be hidden in macros. Should be
> relatively easy to do.


Cool. It would probably be useful for other things as well.
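
One way such a macro might look in portable userspace C, using C11 atomics instead of the kernel's set_bit family. This is a sketch under that assumption: the macro takes an extra name argument, and the helpers and `struct packed_flags` are hypothetical, not an existing kernel interface.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical macro: a byte array just large enough for bits 0..maxbit. */
#define DECLARE_ATOMIC_BITMAP(name, maxbit) \
	atomic_uchar name[(maxbit) / 8 + 1]

static void bitmap_set_bit(atomic_uchar *map, unsigned bit)
{
	atomic_fetch_or(&map[bit / 8], (unsigned char)(1u << (bit % 8)));
}

static void bitmap_clear_bit(atomic_uchar *map, unsigned bit)
{
	atomic_fetch_and(&map[bit / 8], (unsigned char)~(1u << (bit % 8)));
}

static bool bitmap_test_bit(atomic_uchar *map, unsigned bit)
{
	return (atomic_load(&map[bit / 8]) & (1u << (bit % 8))) != 0;
}

/* Example user: 20 flag bits in 3 bytes rather than a full unsigned long. */
struct packed_flags {
	DECLARE_ATOMIC_BITMAP(flags, 19);
};
```

The casts Andi mentions end up inside the helpers, so callers only see bit numbers.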

--
SUSE Labs, Novell Inc.


Re: [patch -rss] Make RSS accounting display more user friendly

2007-06-25 Thread Paul Menage

On 6/22/07, Balbir Singh <[EMAIL PROTECTED]> wrote:


> The problem with input in bytes is that the user will have to ensure
> that the input is a multiple of page size, which implies that she would
> need to use the calculator every time.



Having input in bytes seems pretty natural to me. Why not just have
the RSS controller round the input to the nearest page (or whatever
granularity of memory the controller is able to limit at)?
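
The rounding suggested above is a one-liner. A sketch, assuming the page size is a power of two; `round_to_nearest_page` is a made-up name and overflow near the top of the 64-bit range is ignored.

```c
#include <stdint.h>

/*
 * Round `bytes` to the nearest multiple of `page_size`, which must be a
 * power of two. Exact halves round up.
 */
static uint64_t round_to_nearest_page(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size / 2) & ~(page_size - 1);
}
```

With 4096-byte pages, a limit written as 6144 becomes 8192 and one written as 4097 becomes 4096, so no calculator is needed.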

Paul


Re: [RFC] fsblock

2007-06-25 Thread Nick Piggin

Andi Kleen wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
>
> > - Structure packing. A page gets a number of buffer heads that are
> >   allocated in a linked list. fsblocks are allocated contiguously, so
> >   cacheline footprint is smaller in the above situation.
>
> It would be interesting to test if that makes a difference for
> database benchmarks running over file systems. Databases
> eat a lot of cache so in theory any cache improvements
> in the kernel which often runs cache cold then should be beneficial.
>
> But I guess it would need at least ext2 to test; Minix is probably not
> good enough.

Yeah, you are right. ext2 would be cool to port as it would be
a reasonable platform for basic performance testing and comparisons.

> In general have you benchmarked the CPU overhead of old vs new code?
> e.g. when we went to BIO scalability went up, but CPU costs
> of a single request also went up. It would be nice to not continue
> or better reverse that trend.


At the moment there are still a few silly things in the code, such
as always calling the insert_mapping indirect function (which is
the get_block equivalent). And it does a bit more RMWing than it
should still.

Also, it always goes to the pagecache radix-tree to find fsblocks,
whereas the buffer layer has a per-CPU cache front-end... so in
that regard, fsblock is really designed with lockless pagecache
in mind, where find_get_page is much faster even in the serial case
(though fsblock shouldn't exactly be slow with the current pagecache).

However, I don't think there are any fundamental performance
problems with fsblock. It even uses one less layer of locking to
do regular IO compared with buffer.c, so in theory it might even
have some advantage.

Single threaded performance of request submission is something I
will definitely try to keep optimal.



> > - Large block support. I can mount and run an 8K block size minix3 fs on
> >   my 4K page system and it didn't require anything special in the fs. We
> >   can go up to about 32MB blocks now, and gigabyte+ blocks would only
> >   require one more bit in the fsblock flags. fsblock_superpage blocks
> >   are > PAGE_CACHE_SIZE, midpage ==, and subpage <.
>
> Can it be cleanly ifdefed or optimized away?


Yeah, it pretty well stays out of the way when using <= PAGE_CACHE_SIZE
size blocks, generally just a single test and branch of an already-used
cacheline. It can be optimised away completely by commenting out
#define BLOCK_SUPERPAGE_SUPPORT from fsblock.h.



> Unless the fragmentation
> problem is not solved it would seem rather pointless to me. Also I personally
> still think the right way to approach this is larger softpage size.


It does not suffer from a fragmentation problem. It will do scatter
gather IO if the pagecache of that block is not contiguous. My naming
may be a little confusing: fsblock_superpage (which is a function that
returns true if the given fsblock is larger than PAGE_CACHE_SIZE) is
just named as to whether the fsblock is larger than a page, rather than
having a connection to VM superpages.
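
The naming can be summarised in a few lines. A sketch assuming 4K pages; the enum, `classify_block` and `pages_per_block` are illustrative names, not the fsblock API.

```c
#define ASSUMED_PAGE_CACHE_SIZE 4096u	/* assumption: 4K pages */

enum fsblock_kind { FSBLOCK_SUBPAGE, FSBLOCK_MIDPAGE, FSBLOCK_SUPERPAGE };

/* Classify a block purely by its size relative to the page size. */
static enum fsblock_kind classify_block(unsigned int block_size)
{
	if (block_size < ASSUMED_PAGE_CACHE_SIZE)
		return FSBLOCK_SUBPAGE;
	if (block_size == ASSUMED_PAGE_CACHE_SIZE)
		return FSBLOCK_MIDPAGE;
	return FSBLOCK_SUPERPAGE;
}

/* Pagecache pages covered by one block; 1 for subpage and midpage blocks. */
static unsigned int pages_per_block(unsigned int block_size)
{
	return (block_size + ASSUMED_PAGE_CACHE_SIZE - 1) / ASSUMED_PAGE_CACHE_SIZE;
}
```

An 8K block on this system is a "superpage" block covering two pages, which need not be physically contiguous.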

Don't get me wrong, I think soft page size is a good idea for other
reasons as well (less page metadata and page operations), and that
8 or 16K would probably be a good sweet spot for today's x86 systems.

--
SUSE Labs, Novell Inc.


Re: -Os versus -O2

2007-06-25 Thread david

On Mon, 25 Jun 2007, Segher Boessenkool wrote:


> > In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.
>
> On what CPU?  The effect of different optimisations varies
> hugely between different CPUs (and architectures).
>
> > It was not only because of cache considerations, but because gcc used
> > different tricks to avoid poor optimizations, and at the end, the CPU
> > ended executing the alternative code faster.
>
> -Os is "as fast as you can without bloating the code size",
> so that is the expected result for CPUs that don't need
> special hand-holding around certain performance pitfalls.

this sounds like you are saying that people wanting performance should 
pick -Os.

what should people pick who care more about code size than anything else? 
(examples being embedded development where you may be willing to sacrifice 
speed to avoid having to add additional chips to the design)


David Lang


Re: -Os versus -O2

2007-06-25 Thread david

On Mon, 25 Jun 2007, Segher Boessenkool wrote:


> > then do we need a new option 'optimize for best overall performance' that
> > goes for size (and the corresponding wins there) most of the time, but is
> > ignored where it makes a huge difference?
>
> That's -Os mostly.  Some awful CPUs really need higher
> loop/label/function alignment though to get any
> performance; you could add -falign-xxx options for those.
>
> > in reality this was a flaw in gcc that on modern CPU's with the larger
> > difference between CPU speed and memory speed it still preferred to unroll
> > loops (eating more memory and blowing out the cpu cache) when it shouldn't
> > have.
>
> You told it to unroll loops, so it did.  No flaw.  If you
> feel the optimisations enabled by -O2 should depend on the
> CPU tuning selected, please file a PR.
>
> Also note that whether or not it is profitable to unroll
> a particular loop depends largely on how "hot" that loop
> is, and GCC doesn't know much about that if you don't feed
> it profiling information (it can guess a bit, sure, but it
> can guess wrong too).

actually, what you are saying is that the compiler can't know enough to 
figure out how to optimize for speed. it will just do what you tell it to, 
either unroll loops or not.

this argues that both O2 and Os are incorrect for a project to use and 
instead the project needs to make its own decisions on this.


if this is the true feeling of the gcc team I'm very disappointed, it 
feels like a huge step backwards.


David Lang


Re: [PATCH] get_random_long() and AT_ENTROPY for auxv, kernel 2.6.21.5

2007-06-25 Thread Jakub Jelinek
On Sun, Jun 24, 2007 at 09:43:03PM -0700, Arjan van de Ven wrote:
> > - something to do with aux vector headers
> 
> the primary goal is to pass a random value to userspace at process
> start; this to save glibc from having to open /dev/urandom on every
> program start (which it does now for all apps compiled with
> -fstack-protector, which in various distros is "everything").

There are 2 ways to compile -fstack-protector supporting glibc actually,
only one opens /dev/urandom on every program initialization, the other
computes the stack guard from some bits of the stack address (so indirectly
depends on get_random_int() in stack randomization).
Nevertheless, having one random long (32-bit for 32-bit arches, 64-bit
otherwise) in aux vector would be useful.

Jakub
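
The two approaches described above can be sketched roughly as follows. This is not glibc's actual code: guard_from_address() is a made-up mixer, and all three function names are hypothetical. It only shows the shape of the trade-off, one variant paying for an open/read/close on every start, the other reusing whatever randomness the kernel already put into the stack address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mixer: spread the address bits around and clear the low
 * byte (an assumption here, so the guard always contains a zero byte). */
uintptr_t guard_from_address(uintptr_t addr)
{
    uintptr_t g = addr ^ (addr >> 7) ^ (addr << 11);

    return g & ~(uintptr_t)0xff;
}

/* Variant 1: read the guard from /dev/urandom at start-up.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
int guard_from_urandom(uintptr_t *out)
{
    FILE *f;
    size_t n;

    assert(out != NULL);
    f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    n = fread(out, sizeof *out, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Variant 2: derive the guard from the address of a local variable, so
 * its quality depends on the kernel's stack randomization. */
uintptr_t guard_from_stack(void)
{
    volatile int probe = 0;

    return guard_from_address((uintptr_t)&probe);
}
```

A random long delivered in the aux vector would make both variants unnecessary: the startup code would read the value that is already in its address space.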


Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool

> In my experience, -Os produced faster code on gcc-2.95 than -O2 or -O3.

On what CPU?  The effect of different optimisations varies
hugely between different CPUs (and architectures).

> It was not only because of cache considerations, but because gcc used
> different tricks to avoid poor optimizations, and at the end, the CPU
> ended executing the alternative code faster.

-Os is "as fast as you can without bloating the code size",
so that is the expected result for CPUs that don't need
special hand-holding around certain performance pitfalls.

> With gcc-3.3, -Os show roughly the same performance as -O2 for me on
> various programs. However, with gcc-3.4, I noticed a slow down with
> -Os. And with gcc-4, using -Os optimizes only for size, even if the
> output code is slow as hell. I've had programs whose speed dropped
> by 70% using -Os on gcc-4.

Well you better report those!


> But in some situations, it's desirable to have the smallest possible
> kernel whatever its performance. This goes for installation CDs for
> instance.

There are much better ways to achieve that.


Segher



Re: -Os versus -O2

2007-06-25 Thread Segher Boessenkool
> then do we need a new option 'optimize for best overall performance'
> that goes for size (and the corresponding wins there) most of the
> time, but is ignored where it makes a huge difference?

That's -Os mostly.  Some awful CPUs really need higher
loop/label/function alignment though to get any
performance; you could add -falign-xxx options for those.

> in reality this was a flaw in gcc that on modern CPU's with the larger
> difference between CPU speed and memory speed it still preferred to
> unroll loops (eating more memory and blowing out the cpu cache) when
> it shouldn't have.

You told it to unroll loops, so it did.  No flaw.  If you
feel the optimisations enabled by -O2 should depend on the
CPU tuning selected, please file a PR.

Also note that whether or not it is profitable to unroll
a particular loop depends largely on how "hot" that loop
is, and GCC doesn't know much about that if you don't feed
it profiling information (it can guess a bit, sure, but it
can guess wrong too).


Segher
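
To make the unrolling trade-off concrete, here is a small sketch that is not taken from this thread: the same reduction written rolled and hand-unrolled by four. The function names are hypothetical. The unrolled version executes fewer branches but its loop body is roughly four times larger, which only pays off if the loop is hot. The comment shows one way to give GCC measured counts through its -fprofile-generate and -fprofile-use options (prog.c is a placeholder file name).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Possible profile-guided build, assuming the code lives in prog.c:
 *
 *   gcc -O2 -fprofile-generate prog.c -o prog   # instrumented build
 *   ./prog                                      # run a typical workload
 *   gcc -O2 -fprofile-use prog.c -o prog        # rebuild using the counts
 */

/* Rolled loop: small code, one add and one branch per element. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hand-unrolled by four: one branch per four elements, larger body. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;

    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s += v[i];
    return s;
}
```

Both functions return the same result for any input; which one is faster depends on the CPU, the cache pressure from surrounding code, and how often the loop actually runs.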



Re: 2.6.22-rc3 nmi watchdog hang

2007-06-25 Thread Ingo Molnar

hm, restoring nmi.c to the v2.6.21 state does not fix the nmi_watchdog=2 
hang. I'll do a bisection run.

Ingo


